siekmanj / r2l

Recurrent continuous reinforcement learning algorithms implemented in PyTorch.

Is RDPG working? #3

Closed: satpreetsingh closed this issue 3 years ago

satpreetsingh commented 3 years ago

Hi, I was trying out your implementation of RDPG and couldn't get it to work on Pendulum-v0.

Invoked using the following command on Ubuntu with PyTorch 1.4.1:

python3 r2l.py ddpg --env 'Pendulum-v0' --arch gru --save_actor pendulum.pt --layers 32 --timesteps 10000

and it produced

Recurrent Reinforcement Learning for Robotics.
Deep Deterministic Policy Gradient
        env:            Pendulum-v0
        seed:           0
        timesteps:      10,000
        actor_lr:       1e-05
        critic_lr:      0.0001
        discount:       0.99
        tau:            0.01
        batch_size:     64
        warmup period:  10,000

[... some warnings ...]

Logging to ./logs/ddpg/Pendulum-v0/3a716b-seed0
Buffer full.

I also had to add the following two lines to off_policy.py:

  from policies.critic import GRU_Q
  from policies.actor import GRU_Actor

Same behavior with --arch lstm
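
For context, here is a rough sketch of the kind of arch dispatch those imports presumably support in off_policy.py; everything beyond the GRU_Q/GRU_Actor names and the --arch flag is an assumption on my part, not the repo's actual API:

  from policies.critic import GRU_Q
  from policies.actor import GRU_Actor

  def make_networks(arch, obs_dim, action_dim, layers):
      # Assumed constructor signatures; the real classes in r2l may take
      # different arguments. An analogous branch would cover --arch lstm.
      if arch == 'gru':
          actor  = GRU_Actor(obs_dim, action_dim, layers=layers)
          critic = GRU_Q(obs_dim, action_dim, layers=layers)
      else:
          raise NotImplementedError("only 'gru' is sketched here")
      return actor, critic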

Thanks for looking into this!

siekmanj commented 3 years ago

Hey Satpreet,

Thanks for stumbling onto this. I typically use PPO to train recurrent agents, so I haven't touched the off-policy algorithms implemented here in over a year (hence the errors), but I appreciate the opportunity to fix them up again. I've fixed the typos you described and have run into a few more issues along the way; I'll let you know once those are fixed. Thank you for your interest in r2l!

satpreetsingh commented 3 years ago

Thanks for taking the issue up!

siekmanj commented 3 years ago

Try these hyperparameters:

python3 r2l.py ddpg --arch 'lstm' --env 'Pendulum-v0' --save_actor pendulum.pt --layers 32 --buffer 1000000 --timesteps 1000000 --batch_size 16 --eval_every 5 --updates 10 --iterations 10000 --c_lr 5e-4 --a_lr 3e-4
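
If you want to sanity-check the saved actor afterwards, a minimal loop along these lines should work (this sketch assumes pendulum.pt is a full module saved with torch.save; the init_hidden_state() call is guarded since the exact policy interface may differ):

  import torch
  import gym

  env = gym.make('Pendulum-v0')
  actor = torch.load('pendulum.pt')

  with torch.no_grad():
      for episode in range(5):
          state = env.reset()
          if hasattr(actor, 'init_hidden_state'):
              actor.init_hidden_state()  # clear the recurrent state between episodes
          done, ep_return = False, 0.0
          while not done:
              action = actor(torch.as_tensor(state, dtype=torch.float32))
              state, reward, done, _ = env.step(action.numpy())
              ep_return += reward
          print('episode %d return: %.1f' % (episode, ep_return))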