
[IROS'21] SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning
https://med-air.github.io/SurRoL/
MIT License

SurRoL RL setting merge #4

Closed Choirong closed 1 year ago

Choirong commented 1 year ago

Hello,

First of all, I was deeply impressed by your SurRoL project. However, I am having trouble setting up an RL environment in SurRoL.

Here is my problem: I can run the 'needle_pick' and 'needle_regrasp' tasks without any issue. Now I want to merge them into a single task and train it with reinforcement learning.

I divided the needle into 18 segments that can be grasped. (This is in the PyBullet environment; I pressed the 'w' and 'a' keys while it was running.)

What I want to build is the following:

  1. PSM1 grasps the needle at a random one of the 18 segments (a rough sketch of what I mean is below).
  2. PSM2 regrasps the needle at a random point, trained with RL using stable-baselines3 (because the OpenAI Baselines repository on GitHub is no longer maintained).
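
Roughly, what I have in mind for step 1 looks like the sketch below. The RandomSegmentNeedlePick class and the grasp_segment attribute are placeholders I made up, not part of SurRoL; how the segment index is turned into an actual grasp pose is exactly the part I am unsure about.

import numpy as np
from surrol.tasks.needle_pick import NeedlePick

# Placeholder wrapper: on every reset, pick one of the 18 needle segments
# as the grasp target for PSM1.
class RandomSegmentNeedlePick(NeedlePick):
    NUM_SEGMENTS = 18  # the needle was split into 18 graspable parts

    def reset(self):
        # choose which segment the grasp should target in this episode
        self.grasp_segment = np.random.randint(self.NUM_SEGMENTS)
        return super().reset()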

I would appreciate any advice you can give. I have also sent you an email with more details.

Best regards, Donghyeon Choi

TaoHuang13 commented 1 year ago

Hi Choi. Thanks for your interest in SurRoL.

I am sorry, but I cannot quite follow the problem you are facing. Is your trouble with the RL training or with building the environment?

Choirong commented 1 year ago

Thank you @TaoHuang13.

I tried to use OpenAI Baselines, but its build is currently failing. OpenAI has noted that the repository should only expect bug fixes and minor updates. Here is the OpenAI Baselines GitHub: https://github.com/openai/baselines

Instead, I used stable-baselines3 for RL training with SurRoL. Here is the stable-baselines3 GitHub: https://github.com/DLR-RM/stable-baselines3

I started RL training with the PPO algorithm in the NeedlePick environment. Here is the code:

from surrol.tasks.needle_pick import NeedlePick
from stable_baselines3 import PPO

# Goal-conditioned NeedlePick task with GUI rendering
env = NeedlePick(render_mode='human')

# Dict observations (observation / achieved_goal / desired_goal) need MultiInputPolicy
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=100000)
# model.save("needle_pick")

# Roll out the trained policy for a few steps
obs = env.reset()
for i in range(10):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()

env.close()

When model.learn() starts, I expected the environment to reset at every episode. PSM1 (one of the dVRK arms) moves and tries to find the needle, but the needle never moves. (In your code, the needle is given a random position from a uniform random function on reset.) So it looks like one single long episode.
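
As a quick sanity check, I can print the goal over a few resets to see whether it is re-sampled (this assumes the task returns the usual Fetch-style dict observation with a 'desired_goal' key):

from surrol.tasks.needle_pick import NeedlePick

env = NeedlePick(render_mode='human')
# If reset randomization works, the printed goals should differ between episodes.
for _ in range(5):
    obs = env.reset()
    print(obs['desired_goal'])
env.close()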

  1. How was your RL training with OpenAI Baselines? Did it work well?
  2. Should I change the compute_reward() function? In this code the reward is just 0 or 1.
  3. While training PSM1 with RL, it does not pick up the needle when it reaches it; it just hits the needle.

Could you give me a hand?

TaoHuang13 commented 1 year ago

Hi @Choirong. Thanks for providing us with the details.

  1. OpenAI Baselines works well in most tasks; please see Figure 5 in the SurRoL paper. We did not encounter any major bugs when using Baselines. We have also tested Stable Baselines3 (SB3). It has two drawbacks compared with Baselines: 1) it does not implement the HER+DEMO algorithm, which we adopt to solve complex tasks like PegTransfer; 2) it implements some DRL tricks differently, such as state normalization. These differences make its algorithms less sample-efficient in some tasks: for example, HER from SB3 requires ~30 epochs to learn NeedleReach, while HER from Baselines only needs ~5 epochs. If you do stay with SB3, see the sketch after this list.

  2. Classic DRL algorithms like PPO and SAC fail on the PSM tasks because they need a very large number of samples to learn a good policy under the sparse reward function (the 0/1 reward you mentioned). That is why we use the HER and HER+DEMO algorithms to solve PSM tasks. If you intend to use PPO, designing a suitable dense reward function is a promising direction. However, we currently do not do this in SurRoL because it requires heavy hand-crafted effort, which only grows as the tasks become more complicated.

  3. See point 2: PPO alone cannot learn such sophisticated skills due to the sparse reward. We highly recommend using HER and HER+DEMO from Baselines to solve the complex tasks.
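
If you do stay with SB3, a minimal HER setup on NeedlePick would look roughly like the sketch below. Treat it as a starting point rather than the configuration used in the paper: it is not the Baselines HER+DEMO setup, and the exact HerReplayBuffer keyword arguments vary between SB3 versions. HER requires an off-policy algorithm, hence SAC rather than PPO.

from surrol.tasks.needle_pick import NeedlePick
from stable_baselines3 import SAC, HerReplayBuffer

env = NeedlePick(render_mode='human')

# Off-policy SAC with hindsight experience replay; the dict observations
# (observation / achieved_goal / desired_goal) are handled by MultiInputPolicy.
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,              # relabel 4 virtual goals per transition
        goal_selection_strategy="future",
    ),
    verbose=1,
)
model.learn(total_timesteps=100000)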

Please feel free to let us know if you encounter any further issues.

Choirong commented 1 year ago

Thank you @TaoHuang13.

I read the SurRoL paper to check what you said. I am also going to try OpenAI Baselines and the HER+DEMO algorithm (thanks for recommending them for solving complex tasks).

I appreciate your prompt reply.

Best regards, Donghyeon Choi