shangdongyang opened this issue 1 year ago
@Juno-T any idea?
I believe that Gymnasium-Robotics' FetchPickAndPlace-v2 is not exactly the task used in the paper.
From the HER paper:

> We consider 3 different tasks: ...
> - Pick-and-place. This task is similar to pushing but the target position is in the air and the fingers are not locked. To make exploration in this task easier we recorded a single state in which the box is grasped and start half of the training episodes from this state [2]. ...
>
> [2] This was necessary because we could not successfully train any policies for this task without using the demonstration state. We have later discovered that training is possible without this trick if only the goal position is sometimes on the table and sometimes in the air.
If you want to replicate the paper, you might need to modify the initial state of this task.
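The footnote's alternative trick (goals sometimes on the table, sometimes in the air) can be sketched as a standalone goal sampler. This is only an illustration of the idea; `TABLE_HEIGHT`, `AIR_PROB`, and the sampling ranges are made-up placeholder values, not the ones used internally by Gymnasium-Robotics.

```python
import numpy as np

# Hypothetical goal sampler: with probability AIR_PROB the goal is lifted
# into the air, otherwise it stays at table height. All constants here are
# illustrative placeholders.
TABLE_HEIGHT = 0.42
AIR_PROB = 0.5  # fraction of goals placed above the table

def sample_goal(rng: np.random.Generator) -> np.ndarray:
    xy = rng.uniform(-0.15, 0.15, size=2)  # offset around the table center
    z = TABLE_HEIGHT
    if rng.random() < AIR_PROB:
        z += rng.uniform(0.0, 0.45)  # lift the goal into the air
    return np.array([xy[0], xy[1], z])

rng = np.random.default_rng(0)
goals = np.stack([sample_goal(rng) for _ in range(1000)])
on_table = np.isclose(goals[:, 2], TABLE_HEIGHT).mean()
print(f"{on_table:.0%} of goals lie on the table")
```

If the environment exposes its goal-sampling routine, overriding it with a mixed sampler like this is one way to reproduce the paper's setup.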
If you want to solve this task, I would suggest trying FetchPickAndPlaceDense-v2
to see whether it is actually an exploration problem (it probably is, imo). Then try combining HER with some exploration algorithm, e.g. ICM/RND.
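If it does turn out to be an exploration problem, the RND idea mentioned above can be sketched independently of any RL library. A frozen random "target" network embeds observations, a "predictor" is trained to match it, and the prediction error is the intrinsic reward: it shrinks for familiar states and stays high for novel ones. The networks here are single linear layers and every size and rate is an illustrative choice, not a tuned value.

```python
import numpy as np

# Minimal Random Network Distillation (RND) sketch. Illustrative only.
rng = np.random.default_rng(0)
OBS_DIM, EMB_DIM, LR = 8, 16, 0.01

W_target = rng.normal(size=(OBS_DIM, EMB_DIM))  # frozen random network
W_pred = np.zeros((OBS_DIM, EMB_DIM))           # trained predictor

def intrinsic_reward(obs: np.ndarray) -> float:
    # Mean squared prediction error = exploration bonus.
    err = obs @ W_pred - obs @ W_target
    return float((err ** 2).mean())

def update_predictor(obs: np.ndarray) -> None:
    # One gradient step on the MSE between predictor and target embeddings.
    global W_pred
    err = obs @ W_pred - obs @ W_target
    W_pred = W_pred - LR * np.outer(obs, err)

# Repeatedly visiting the same state drives its bonus toward zero,
# while an unseen state keeps a comparatively large bonus.
seen = rng.normal(size=OBS_DIM)
before = intrinsic_reward(seen)
for _ in range(300):
    update_predictor(seen)
after = intrinsic_reward(seen)
novel = rng.normal(size=OBS_DIM)
print(before, after, intrinsic_reward(novel))
```

In practice this bonus would be added to the environment reward before the transition enters the (HER) replay buffer.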
Thank you very much for your reply. I switched to MuJoCo Pro 1.50 and mujoco-py==1.50.1.68, set the Gymnasium-Robotics task to FetchPickAndPlace-v1, and used SAC+HER and DDPG+HER from Tianshou 0.5.1, but the optimal policy still cannot be learned. According to this PR ( https://github.com/thu-ml/tianshou/pull/510 ), the HER implementation seems to be able to learn the optimal policy on FetchPickAndPlace-v1.
Hi! I ran into the same issue when trying to replicate the paper in the FetchPickAndPlaceDense-v2
env, i.e., the training process does not converge.
@shangdongyang Did your training eventually converge?
@Juno-T I'm trying to follow your instructions, but I don't quite get some of your points.
What does

> you might need to modify the initial state of this task

mean? From the code of FetchPickAndPlaceDense-v2, it seems that the goal is always sampled at a random height above the table. Should I change the sampling process so that some goals lie on the table?
> And then try combining HER with some exploration algorithm, e.g. ICM/RND etc.

Also, are there any examples of combining these algorithms, or of getting the training process to converge, with the Tianshou API?
I'm still at the beginning stage of learning RL. Thanks to both of you for any help!
Related to #935
tianshou 0.5.1, gymnasium 0.28.1, torch 1.12.0+cu116, numpy 1.21.6, Python 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0], linux, GPU available: True
Thank you very much for adding HER support among the off-policy RL algorithms and for providing the DDPG+HER example code for the FetchReach environment. However, when I switched the code to FetchPickAndPlace-v2 and ran DDPG+HER, the algorithm had already run 4M steps without converging, which is inconsistent with the roughly 2M-step convergence reported in the original 2017 HER paper. I don't know where the problem is.
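When debugging a non-converging HER run, it can help to check the relabeling logic in isolation. Below is a toy sketch of HER's "final" strategy, separate from any Tianshou internals: each transition's desired goal is replaced by the goal actually achieved at the end of the episode, so a failed episode yields at least one successful transition. The dict keys and the 5 cm sparse-reward threshold are illustrative (the threshold mimics the Fetch tasks' style), not Tianshou's implementation.

```python
import numpy as np

def her_relabel_final(episode):
    """Relabel an episode with HER's 'final' strategy.

    episode: list of dicts with 'achieved_goal' and 'desired_goal' arrays.
    Returns copies whose desired goal is the episode's final achieved goal,
    with a Fetch-style sparse reward recomputed against that goal.
    """
    final_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for t in episode:
        new_t = dict(t)
        new_t["desired_goal"] = final_goal
        # Sparse reward: 0 if within 5 cm of the (substituted) goal, else -1.
        dist = np.linalg.norm(t["achieved_goal"] - final_goal)
        new_t["reward"] = 0.0 if dist < 0.05 else -1.0
        relabeled.append(new_t)
    return relabeled

# A failed episode: the box never reaches the original desired goal...
episode = [
    {"achieved_goal": np.array([0.0, 0.0, 0.0]),
     "desired_goal": np.array([1.0, 1.0, 1.0]), "reward": -1.0},
    {"achieved_goal": np.array([0.2, 0.0, 0.0]),
     "desired_goal": np.array([1.0, 1.0, 1.0]), "reward": -1.0},
]
# ...but after relabeling, the final step is a success for the new goal.
print(her_relabel_final(episode)[-1]["reward"])  # 0.0
```

If relabeled transitions like these never show a success signal in your buffer, the problem is upstream of the learning algorithm itself.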