shangdongyang opened this issue 1 year ago
@Juno-T any idea?
I believe that Gymnasium-Robotics' FetchPickAndPlace-v2 is not exactly the task used in the paper.
From the HER paper:

> We consider 3 different tasks: ...
> - Pick-and-place. This task is similar to pushing but the target position is in the air and the fingers are not locked. To make exploration in this task easier we recorded a single state in which the box is grasped and start half of the training episodes from this state [2]. ...
>
> [2] This was necessary because we could not successfully train any policies for this task without using the demonstration state. We have later discovered that training is possible without this trick if only the goal position is sometimes on the table and sometimes in the air.
If you want to replicate the paper, you might need to modify the initial state of this task.
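The footnote's alternative trick (goals sometimes on the table, sometimes in the air) can be sketched as a standalone goal sampler. This is only an illustration of the idea; `TABLE_HEIGHT`, `AIR_PROB`, and the sampling ranges are made-up placeholder values, not the ones used internally by Gymnasium-Robotics.

```python
import numpy as np

# Hypothetical goal sampler: with probability AIR_PROB the goal is lifted
# into the air, otherwise it stays at table height. All constants here are
# illustrative placeholders.
TABLE_HEIGHT = 0.42
AIR_PROB = 0.5  # fraction of goals placed above the table

def sample_goal(rng: np.random.Generator) -> np.ndarray:
    xy = rng.uniform(-0.15, 0.15, size=2)  # offset around the table center
    z = TABLE_HEIGHT
    if rng.random() < AIR_PROB:
        z += rng.uniform(0.0, 0.45)  # lift the goal into the air
    return np.array([xy[0], xy[1], z])

rng = np.random.default_rng(0)
goals = np.stack([sample_goal(rng) for _ in range(1000)])
on_table = np.isclose(goals[:, 2], TABLE_HEIGHT).mean()
print(f"{on_table:.0%} of goals lie on the table")
```

If the environment exposes its goal-sampling routine, overriding it with a mixed sampler like this is one way to reproduce the paper's setup.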
If you want to solve this task, I would suggest trying FetchPickAndPlaceDense-v2
to see whether it is actually an exploration problem (it probably is, imo). Then try combining HER with some exploration algorithm, e.g. ICM/RND.
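If it does turn out to be an exploration problem, the RND idea mentioned above can be sketched independently of any RL library. A frozen random "target" network embeds observations, a "predictor" is trained to match it, and the prediction error is the intrinsic reward: it shrinks for familiar states and stays high for novel ones. The networks here are single linear layers and every size and rate is an illustrative choice, not a tuned value.

```python
import numpy as np

# Minimal Random Network Distillation (RND) sketch. Illustrative only.
rng = np.random.default_rng(0)
OBS_DIM, EMB_DIM, LR = 8, 16, 0.01

W_target = rng.normal(size=(OBS_DIM, EMB_DIM))  # frozen random network
W_pred = np.zeros((OBS_DIM, EMB_DIM))           # trained predictor

def intrinsic_reward(obs: np.ndarray) -> float:
    # Mean squared prediction error = exploration bonus.
    err = obs @ W_pred - obs @ W_target
    return float((err ** 2).mean())

def update_predictor(obs: np.ndarray) -> None:
    # One gradient step on the MSE between predictor and target embeddings.
    global W_pred
    err = obs @ W_pred - obs @ W_target
    W_pred = W_pred - LR * np.outer(obs, err)

# Repeatedly visiting the same state drives its bonus toward zero,
# while an unseen state keeps a comparatively large bonus.
seen = rng.normal(size=OBS_DIM)
before = intrinsic_reward(seen)
for _ in range(300):
    update_predictor(seen)
after = intrinsic_reward(seen)
novel = rng.normal(size=OBS_DIM)
print(before, after, intrinsic_reward(novel))
```

In practice this bonus would be added to the environment reward before the transition enters the (HER) replay buffer.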
Thank you very much for your reply. I switched to MuJoCo Pro 1.50 and mujoco-py==1.50.1.68, set the Gymnasium-Robotics task to FetchPickAndPlace-v1, and used SAC+HER and DDPG+HER from Tianshou 0.5.1, but the optimal policy still cannot be learned. According to this PR ( https://github.com/thu-ml/tianshou/pull/510 ), the HER implementation seems to be able to learn the optimal policy on FetchPickAndPlace-v1.
Hi! I ran into the same issue when trying to replicate the paper in the FetchPickAndPlaceDense-v2
env, i.e., the training process does not converge.
@shangdongyang Did your training eventually converge?
@Juno-T I'm trying to follow your instructions, but I don't quite get some of your points.
What does

> you might need to modify the initial state of this task

mean? From the code of FetchPickAndPlaceDense-v2, it seems that the goal is always sampled at a random height above the table. Should I change the sampling process so that some goals lie on the table?
> And then try combining HER with some exploration algorithm, e.g. ICM/RND etc.

Also, are there any examples of combining these algorithms, or of getting the training process to converge, with the Tianshou API?
I'm still at the beginning stage of learning RL. Thanks to both of you for any help!
Related to #935
tianshou 0.5.1, gymnasium 0.28.1, torch 1.12.0+cu116, numpy 1.21.6, Python 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0], linux, GPU available: True
Thank you very much for adding HER support among the off-policy RL algorithms and for providing the DDPG+HER example code for the FetchReach environment. However, when I switched the code to FetchPickAndPlace-v2 and ran DDPG+HER, the algorithm had already run 4M steps without converging, which is inconsistent with the roughly 2M-step convergence reported in the original 2017 HER paper. I don't know where the problem is.
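When debugging a non-converging HER run, it can help to check the relabeling logic in isolation. Below is a toy sketch of HER's "final" strategy, separate from any Tianshou internals: each transition's desired goal is replaced by the goal actually achieved at the end of the episode, so a failed episode yields at least one successful transition. The dict keys and the 5 cm sparse-reward threshold are illustrative (the threshold mimics the Fetch tasks' style), not Tianshou's implementation.

```python
import numpy as np

def her_relabel_final(episode):
    """Relabel an episode with HER's 'final' strategy.

    episode: list of dicts with 'achieved_goal' and 'desired_goal' arrays.
    Returns copies whose desired goal is the episode's final achieved goal,
    with a Fetch-style sparse reward recomputed against that goal.
    """
    final_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for t in episode:
        new_t = dict(t)
        new_t["desired_goal"] = final_goal
        # Sparse reward: 0 if within 5 cm of the (substituted) goal, else -1.
        dist = np.linalg.norm(t["achieved_goal"] - final_goal)
        new_t["reward"] = 0.0 if dist < 0.05 else -1.0
        relabeled.append(new_t)
    return relabeled

# A failed episode: the box never reaches the original desired goal...
episode = [
    {"achieved_goal": np.array([0.0, 0.0, 0.0]),
     "desired_goal": np.array([1.0, 1.0, 1.0]), "reward": -1.0},
    {"achieved_goal": np.array([0.2, 0.0, 0.0]),
     "desired_goal": np.array([1.0, 1.0, 1.0]), "reward": -1.0},
]
# ...but after relabeling, the final step is a success for the new goal.
print(her_relabel_final(episode)[-1]["reward"])  # 0.0
```

If relabeled transitions like these never show a success signal in your buffer, the problem is upstream of the learning algorithm itself.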