Closed: zsano1 closed this issue 3 years ago
Also, for hopper.yaml and walker2d.yaml in your example, I guess the max_steps
should be 1000?
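For reference, a minimal sketch of what the corrected env config might look like, assuming the yaml exposes a top-level max_steps key (the key names and the env id below are guesses based on this thread, not the repo's actual schema):

```yaml
# examples/config/envs/hopper.yaml (sketch; field names are assumed)
env: Hopper-v2      # assumed gym environment id
max_steps: 1000     # MuJoCo Hopper/Walker2d episodes are capped at 1000 steps
```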
Hi, thanks for checking that out. Yes, I will fix max_steps in hopper and walker. What about the default implementation, i.e.
python examples/run.py agent SAC --env-config ./examples/config/envs/half-cheetah.yaml --num-train 300
I'm looking into the agent config, since the default SAC performs well on my end. Could you double-check? Thanks.
Thank you for your reply! I double-checked, and the default SAC agent does work well in the HalfCheetah environment. But it still gives low training returns in the Hopper and Ant environments. Could you suggest settings that work on these tasks, or other locomotion tasks where the default agents perform well? Thanks!
Hi, that's good to know. I'm not sure; maybe check this other repo to see the parameters they use. Also, I believe 300 episodes is too few for these environments because of early termination. Usually you want something much larger, say more than 10000 episodes.
Thanks! I'll give it a try.
Hi, thanks for your PyTorch version of the rllib code! However, when I run your SAC example by, e.g.,
python examples/run.py agent SAC --agent-config ./examples/config/agents/sac.yaml --env-config ./examples/config/envs/half-cheetah.yaml --num-train 300
the performance is low (return lower than 1000). The return is also low when training a PPO agent and when training in other MuJoCo environments (I tested Ant, Hopper, and HalfCheetah).