Are these arguments correct? Or is it very hard to train MountainCar-v0 by PPO?

vwxyzjn / ppo-implementation-details

The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization


Open alanyuwenche opened 10 months ago

alanyuwenche commented 10 months ago

I used this code to train an agent on MountainCar-v0, but training still had not succeeded after 10 million timesteps. I ran the following command:

!python ppo.py --gym-id MountainCar-v0 --total-timesteps 10000000
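For context, here is a minimal sketch of why the return curve can stay flat on this task. It is a simplified pure-Python re-implementation of the MountainCar dynamics as I understand them from the Gym source, not the actual environment class. The constants, the 200-step limit, the helper names (`step`, `run_episode`), and the two toy policies are assumptions made for illustration. The reward is -1 on every step, so an agent that never reaches the flag always gets -200 and sees no difference between one trajectory and another.

```python
import math
import random


def step(position, velocity, action):
    """One transition of the (assumed) MountainCar dynamics.

    action: 0 = push left, 1 = no push, 2 = push right.
    """
    velocity += (action - 1) * 0.001 - 0.0025 * math.cos(3 * position)
    velocity = max(-0.07, min(0.07, velocity))
    position += velocity
    position = max(-1.2, min(0.6, position))
    # The car stops when it hits the left wall.
    if position == -1.2 and velocity < 0:
        velocity = 0.0
    return position, velocity


def run_episode(policy, rng, max_steps=200):
    """Run one episode and return the undiscounted return."""
    position, velocity = rng.uniform(-0.6, -0.4), 0.0
    episode_return = 0.0
    for _ in range(max_steps):
        action = policy(position, velocity, rng)
        position, velocity = step(position, velocity, action)
        episode_return -= 1.0  # reward is -1 on every step
        if position >= 0.5:    # reached the flag (simplified goal check)
            break
    return episode_return


def random_policy(position, velocity, rng):
    """Uniform random action, roughly what an untrained policy does."""
    return rng.randrange(3)


def momentum_policy(position, velocity, rng):
    """Hand-written heuristic: always push in the direction of motion."""
    return 2 if velocity >= 0 else 0


if __name__ == "__main__":
    rng = random.Random(0)
    print("random  :", [run_episode(random_policy, rng) for _ in range(5)])
    print("momentum:", [run_episode(momentum_policy, rng) for _ in range(5)])
```

In this sketch the random policy sits at the -200 floor while the momentum heuristic finishes earlier, which suggests the difficulty is exploration under a sparse reward rather than the command-line arguments themselves.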


Are these arguments correct, or is MountainCar-v0 simply very hard to train with PPO?