toshikwa / gail-airl-ppo.pytorch

PyTorch implementation of GAIL and AIRL based on PPO.
MIT License

Need suggestions for parameter settings #5

Closed. nicholas0717 closed this issue 2 years ago

nicholas0717 commented 2 years ago

Hello Watanabe! I appreciate your great work, which has helped me a lot. However, I have run into some trouble and would like to ask for your suggestions.

I trained SAC on Walker2d-v3 for 10 million steps and obtained the model file actor.pth (logs/Walker2d-v3/sac/seed0-20220526-2134/model/step10000000/actor.pth). It worked perfectly, and the final return was about 6000.

Then I ran collect_demo.py with --weights actor.pth, --buffer_size 1000000, --std 0.01 and --p_rand 0.0, which produced size1000000_std0.01_prand0.0.pth.

However, when I ran train_imitation.py, the result was poor. The return kept fluctuating around 300 even after 4 million steps, and it did not seem likely to rise any further.

Were my parameter settings wrong? (nums_steps 10 million, eval_interval 5000, rollout_length 50000; other parameters unchanged)
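For a sense of scale, the settings above can be turned into rough counts. The sketch below assumes the policy is updated once per rollout_length environment steps, which is a common way to organize on-policy PPO training. The variable names are illustrative, and this is not code from the repository.

```python
# Rough bookkeeping for the settings quoted above.
# Assumption: one policy-update round per `rollout_length` environment steps.
steps_so_far = 4_000_000    # steps trained while the return stayed around 300
num_steps = 10_000_000      # total step budget used in this run
rollout_length = 50_000
eval_interval = 5_000

update_rounds_so_far = steps_so_far // rollout_length
update_rounds_total = num_steps // rollout_length
evals_per_rollout = rollout_length // eval_interval

print(update_rounds_so_far)  # 80
print(update_rounds_total)   # 200
print(evals_per_rollout)     # 10
```

Under that assumption, 4 million steps amounts to only 80 update rounds, and about ten evaluations fall between consecutive updates, so a return curve that looks flat over many evaluations may reflect relatively few policy changes.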

I'm puzzled and would appreciate your suggestions. Thank you very much!

toshikwa commented 2 years ago

Hi @nicholas0717

The default hyperparameters in this repo are set for Hopper, not for Walker. Please read the paper and its supplementary material to make sure your experimental setup matches the paper's (e.g. nums_steps should be 25 million according to the paper).

Also, adding noise to the demonstrations may help with this behavior. The paper does not say much about how its demonstrations were generated, so you will have to tune this yourself.
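As a minimal sketch of what adding noise to demonstrations can look like, the helper below perturbs an expert action in two ways. It assumes that --std is the standard deviation of Gaussian noise added to the action and that --p_rand is the probability of substituting a uniformly random action. The function noisy_action and its arguments are hypothetical and are not taken from collect_demo.py.

```python
import numpy as np

def noisy_action(expert_action, action_low, action_high, std, p_rand, rng):
    """Return a perturbed copy of an expert action (hypothetical helper)."""
    expert_action = np.asarray(expert_action, dtype=np.float64)
    if rng.random() < p_rand:
        # With probability p_rand, replace the expert action with a uniform random one.
        return rng.uniform(action_low, action_high, size=expert_action.shape)
    # Otherwise add zero-mean Gaussian noise and clip back into the valid range.
    noise = rng.normal(0.0, std, size=expert_action.shape)
    return np.clip(expert_action + noise, action_low, action_high)

rng = np.random.default_rng(0)
expert = np.array([0.5, -0.2, 0.9])

# Settings from this thread: std=0.01 and p_rand=0.0 give nearly noise-free actions.
almost_clean = noisy_action(expert, -1.0, 1.0, std=0.01, p_rand=0.0, rng=rng)

# A noisier setting, chosen here only for comparison (not a recommended value).
noisier = noisy_action(expert, -1.0, 1.0, std=0.1, p_rand=0.05, rng=rng)
```

With std 0.01 and p_rand 0.0 the recorded actions stay very close to the expert's, so the demonstration buffer covers few off-trajectory states. Larger values widen that coverage at the cost of lower demonstration returns.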

Thanks.

nicholas0717 commented 2 years ago

I'll try it later and hope it works. Thank you for your reply!

gunnxx commented 2 years ago

@nicholas0717 How did it go? I am curious about the result since I am working on something similar right now.

nicholas0717 commented 2 years ago

> @nicholas0717 How did it go? I am curious about the result since I am working on something similar right now.

I tried changing the seed and the rollout length and adding noise to the demonstrations. Unfortunately, the total return in Walker2d only reached about 3000, whereas the demonstrations' total return was about 6000. I think other hyperparameters also need to be tuned, which is a time-consuming job.

Maybe you have some ideas? We can discuss it.