Closed: satpreetsingh closed this issue 3 years ago
Hey Satpreet,
Thanks for stumbling onto this. I typically use PPO to train recurrent agents, so I haven't touched the off-policy algorithms implemented here in over a year (hence the errors), but I appreciate the opportunity to fix them up again. I've fixed the typos you described and have run into even more issues along the way; I'll let you know once those are fixed too. Thank you for your interest in r2l!
Thanks for taking the issue up!
Try these hyperparameters:
python3 r2l.py ddpg --arch 'lstm' --env 'Pendulum-v0' --save_actor pendulum.pt --layers 32 --buffer 1000000 --timesteps 1000000 --batch_size 16 --eval_every 5 --updates 10 --iterations 10000 --c_lr 5e-4 --a_lr 3e-4
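For context, the kind of recurrent actor the --arch 'lstm' flag selects can be sketched in PyTorch roughly as follows. This is a minimal illustration under stated assumptions, not r2l's actual implementation; the class name, layer sizes, and tanh output squashing are hypothetical choices for the sketch.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Minimal recurrent actor: maps an observation sequence to actions."""
    def __init__(self, obs_dim, act_dim, hidden=32):
        super().__init__()
        # batch_first=True so inputs are (batch, time, obs_dim)
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, hidden_state=None):
        features, hidden_state = self.lstm(obs_seq, hidden_state)
        # tanh bounds actions to [-1, 1]; scale externally for the env's range
        return torch.tanh(self.out(features)), hidden_state

# Pendulum-v0 has 3-dim observations and a 1-dim continuous action
actor = LSTMActor(obs_dim=3, act_dim=1)
actions, h = actor(torch.zeros(16, 10, 3))  # batch of 16, 10 timesteps
```

The returned hidden state lets the actor be stepped one observation at a time at rollout, which is what makes replaying stored sequences (rather than single transitions) necessary in recurrent off-policy training.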
Hi, I was trying out your implementation of RDPG and couldn't get it to work on Pendulum-v0.
I invoked it with the following command on Ubuntu with PyTorch 1.4.1:
and it produced:
I also had to add the following two lines to off_policy.py:
The same behavior occurs with --arch lstm.
Thanks for looking into this!