sebascuri / rllib

MIT License

low training return of the example agents #2

Closed: zsano1 closed this issue 3 years ago

zsano1 commented 3 years ago

Hi, thanks for your PyTorch version of the rllib code! However, when I run your SAC example, e.g. python examples/run.py agent SAC --agent-config ./examples/config/agents/sac.yaml --env-config ./examples/config/envs/half-cheetah.yaml --num-train 300, the performance is low (return lower than 1000). The return is also low when training the PPO agent and when training in other MuJoCo environments (I tested Ant, Hopper, and HalfCheetah).

zsano1 commented 3 years ago

Also, for hopper.yaml and walker2d.yaml in your example, I guess the max_steps should be 1000?
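For reference, a quick way to check the default episode limits is to query the registered Gym specs. This is a sketch assuming the examples wrap the standard Gym MuJoCo environments (the v2 IDs below are an assumption; the repo may register its own variants):

```python
import gym  # assumes the standard OpenAI Gym MuJoCo environments are installed

# The registered spec carries the default time limit per episode.
for env_id in ["Hopper-v2", "Walker2d-v2", "HalfCheetah-v2"]:
    spec = gym.spec(env_id)
    print(env_id, spec.max_episode_steps)  # prints 1000 for all three
```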

sebascuri commented 3 years ago

Hi, thanks for checking that out. Yeah, I will fix max_steps in hopper and walker. What about the default implementation, i.e. python examples/run.py agent SAC --env-config ./examples/config/envs/half-cheetah.yaml --num-train 300?

I am looking into the agent config because the default SAC performs well. Could you double-check? Thanks!

zsano1 commented 3 years ago

Thank you for your reply! I double-checked and found that the default SAC agent works well in the HalfCheetah environment! But it still gives a low training return in the Hopper and Ant environments. Could you suggest settings that work for these tasks, or other locomotion tasks in which the default agents perform well? Thanks!

sebascuri commented 3 years ago

Hi, that is good to know. I'm not sure; maybe check this other repo to see the parameters they use there. Also, I believe 300 episodes is not enough for these environments because of early termination. Usually you want something larger, > 10000 episodes or so.
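To illustrate the early-termination point, here is a minimal sketch (assuming the standard Gym Hopper-v2 environment and the classic 4-tuple step API): with an untrained or random policy, Hopper episodes end after only a handful of steps because the agent falls, so each of the 300 training episodes collects far fewer transitions than in HalfCheetah, which always runs the full 1000 steps.

```python
import gym
import numpy as np

# Hopper terminates as soon as the torso falls, so a random policy
# produces very short episodes; HalfCheetah never terminates early.
env = gym.make("Hopper-v2")
lengths = []
for _ in range(20):
    env.reset()
    done, steps = False, 0
    while not done:
        _, _, done, _ = env.step(env.action_space.sample())  # classic Gym API
        steps += 1
    lengths.append(steps)
print("mean episode length under a random policy:", np.mean(lengths))  # typically well below 1000
```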

zsano1 commented 3 years ago

Thanks! I'll give it a try.