Closed fisherxue closed 3 years ago
Hello, the main trick is the number of workers; how many did you use?
NOTE: from the documentation " If you want to reproduce results from the paper, please use the rl baselines zoo in order to have the correct hyperparameters and at least 8 MPI workers with DDPG."
I'm using 8 workers on an 8-core machine, but it's still not learning (success rate 0.02).
I'm running with: mpirun -np 8 python train.py --algo her --env FetchPickAndPlace-v1 using rl-baselines-zoo.
OK, so it seems that is not enough for the harder envs; OpenAI used 19 workers to produce the results in their paper...
From a previous version of the doc (before refactoring): https://github.com/hill-a/stable-baselines/blob/026e0528d968fafdbe2f017676c99cc3d0fbd10d/docs/modules/her.rst
" In order to reproduce the results from Plappert et al. (2018) [..] This will require a machine with sufficient amount of physical CPU cores. In our experiments, we used Azure's D15v2 instances, which have 20 physical cores. We only scheduled the experiment on 19 of those to leave some head-room on the system. "
Thanks, I'll try it with more workers!
However, I also tried it using baselines with this command:
python -m baselines.run --num_env 2 --alg=her --env=FetchPickAndPlace-v1 --num_timesteps=5.0e6
This gave me a success rate of 1:
---------------------------------
| epoch | 677 |
| stats_g/mean | 0.851 |
| stats_g/std | 0.107 |
| stats_o/mean | 0.205 |
| stats_o/std | 0.115 |
| test/episode | 1.36e+04 |
| test/mean_Q | -1.32 |
| test/success_rate | 1 |
| train/episode | 6.78e+04 |
| train/success_rate | 0.59 |
---------------------------------
Meanwhile, with:
mpirun -np 8 python train.py --algo her --env FetchPickAndPlace-v1
I get a 1% success rate.
I'm wondering whether the issue comes from different hyperparameter settings. From the paper, they use:
Actor and critic networks: 3 layers with 256 units each and ReLU non-linearities
Adam optimizer (Kingma and Ba, 2014) with learning rate 1·10^-3 for training both the actor and the critic
Buffer size: 10^6 transitions
Polyak-averaging coefficient: 0.95 ***
Action L2 norm coefficient: 1.0
Observation clipping: [−200, 200]
Batch size: 256
Rollouts per MPI worker: 2 ***
Number of MPI workers: 19
Cycles per epoch: 50
Batches per cycle: 40
Test rollouts per epoch: 10
Probability of random actions: 0.3
Scale of additive Gaussian noise: 0.2
Probability of HER experience replay: 0.8
Normalized clipping: [−5, 5]
The items marked *** are the ones I'm not sure how to set in stable-baselines.
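For what it's worth, here is a rough sketch of how the paper's hyperparameters might be translated into stable-baselines (v2) DDPG/HER keyword arguments. The kwarg names on the right are real stable-baselines parameters, but the correspondence itself is my own guess (in particular the Polyak-to-tau mapping), not something verified against OpenAI's implementation:

```python
def her_replay_prob(n_sampled_goal):
    """Fraction of sampled transitions that get HER-relabeled goals
    for a given n_sampled_goal (the HER wrapper's `k`)."""
    return 1.0 - 1.0 / (1.0 + n_sampled_goal)

def ddpg_kwargs_from_paper():
    """Hypothetical mapping from Plappert et al. (2018) hyperparameters
    to stable-baselines DDPG kwargs -- a sketch, not a verified config."""
    polyak = 0.95
    return dict(
        tau=1.0 - polyak,        # assumption: SB's tau is the soft-update rate, i.e. 1 - polyak
        actor_lr=1e-3,
        critic_lr=1e-3,
        buffer_size=int(1e6),
        batch_size=256,
        random_exploration=0.3,  # probability of a uniform random action
        policy_kwargs=dict(layers=[256, 256, 256]),
    )

# n_sampled_goal=4 on the HER wrapper gives her_replay_prob(4) == 0.8,
# matching the paper's HER replay probability of 0.8.
# "Rollouts per MPI worker: 2" has, as far as I can tell, no direct
# stable-baselines equivalent.
```

These kwargs would then be passed through the `HER` wrapper alongside `DDPG` as the model class; the action L2 penalty has no kwarg at all in SB's DDPG, which may matter here.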
Yes, they used a custom DDPG version and it's hard to make the correspondence (cf. the issue you mentioned at the beginning). However, if you find a bug, or the trick that makes it work with fewer workers, we would be happy to add that.
Edit: the success rate displayed for SB is the train one; the test one should be higher.
Update: trained with 20 workers, but still no luck (-50 reward, fails >99% of the time during test). Any tips?
I suspect either a subtle bug in the implementation (even though it still works on other envs) or the additional tricks/hyperparameters of OpenAI that make it work. One trick that was not implemented is the L2 penalty on the action. They also use a different formulation of HER, creating the new (relabeled) transitions only at sampling time.
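To illustrate that last point, here is a minimal sketch of the sampling-time HER formulation: goals are relabeled with a "future" achieved goal when a minibatch is drawn, instead of storing extra transitions in the buffer. The episode key names (`obs`, `ag`, `g`, `acts`) and the 0.05 distance threshold for the sparse reward are illustrative assumptions based on the Fetch tasks, not the exact OpenAI code:

```python
import numpy as np

def sample_her_transitions(episode, batch_size, replay_k=4, rng=np.random):
    """Sketch of sampling-time HER with the 'future' strategy.

    `episode` is a dict of arrays: 'obs' and 'ag' (achieved goals) of
    length T+1, 'g' (desired goals) and 'acts' of length T.
    """
    T = len(episode['acts'])
    future_p = 1.0 - 1.0 / (1.0 + replay_k)      # P(relabel) = 0.8 for k=4
    t = rng.randint(T, size=batch_size)           # transition indices
    relabel = rng.uniform(size=batch_size) < future_p
    # pick a future timestep in the same episode for relabeled goals
    future_t = t + 1 + (rng.uniform(size=batch_size) * (T - t)).astype(int)
    goals = episode['g'][t].copy()
    goals[relabel] = episode['ag'][future_t[relabel]]
    # sparse reward recomputed against the (possibly new) goal;
    # 0.05 is the assumed success distance threshold
    dist = np.linalg.norm(episode['ag'][t + 1] - goals, axis=-1)
    rewards = -(dist > 0.05).astype(np.float32)
    return dict(obs=episode['obs'][t], acts=episode['acts'][t],
                goals=goals, rewards=rewards)
```

The key difference from the wrapper-style HER is that the buffer only ever stores the original episodes; relabeling happens on every minibatch draw, so the same transition can be paired with many different goals over training.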
If you have some time, the best way would be to remove each trick from the OpenAI baselines repo until it breaks (unfortunately, I don't have the resources to do that on my own...)
Will do, I'll get back to you in a few weeks (hardware I would run it on is currently occupied) :)
@araffin quick question, why is the test success rate higher than the train? Doesn't this mean there is sampling bias in your test?
> Edit: the success rate displayed for sb is the train one, the test one should be higher
Thank you in advance!
During testing, all the exploration noise is removed and we use a deterministic policy, hence the difference.
I'm curious whether anyone ever found the solution to this. I'm working on a very similar environment to Fetch and getting about 80% success with the regular baselines library, but 0% with stable-baselines. I was wondering whether it has to do with VecNormalize not being implemented for HER, or something else entirely.
An update here: it seems that HER + SAC is working (with only one worker) on Fetch Pick and Place (and others). You can find trained agents in the rl zoo; see PR https://github.com/araffin/rl-baselines-zoo/pull/53
Fixed in SB3, results and hyperparameters are available in the zoo: https://github.com/DLR-RM/rl-baselines3-zoo
I am trying to train FetchPickAndPlace as per https://arxiv.org/pdf/1802.09464.pdf using DDPG+HER; however, regardless of how long I train, the agent fails to learn anything. I saw that #198 mentioned that OpenAI used a number of tricks to get it to work. Has anyone had any luck doing so in stable baselines? Thanks!
FetchReach and FetchPush both train fine.
My current hyperparameters: