tpbarron / pytorch-ppo

Proximal Policy Optimization in PyTorch
MIT License
38 stars 11 forks

tried on Pendulum-v0 #1

Closed freddycct closed 7 years ago

freddycct commented 7 years ago

doesn't work...

freddycct commented 7 years ago

after a long time, it finally converged 👍

tpbarron commented 7 years ago

Right now only one update step is taken for each batch of data. The baselines code does 10 epochs per batch, I believe. Doing the same here should make training much faster. I'll probably get to that in the next few days.
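For illustration, here is a rough sketch of what "several epochs per batch" could look like as a schedule. This is not the code in this repo or in baselines. The helper names `epoch_minibatches` and `run_updates`, the minibatch size of 64, and the fixed seed are all hypothetical. `update_fn` stands in for whatever performs one gradient step on the selected samples.

```python
import random


def epoch_minibatches(batch_size, minibatch_size, num_epochs, seed=0):
    """Yield (epoch, indices) pairs that sweep the same batch num_epochs times.

    Hypothetical helper: the name, arguments, and seed are assumptions.
    """
    rng = random.Random(seed)
    indices = list(range(batch_size))
    for epoch in range(num_epochs):
        # Reshuffle so each epoch splits the batch into different minibatches.
        rng.shuffle(indices)
        for start in range(0, batch_size, minibatch_size):
            yield epoch, indices[start:start + minibatch_size]


def run_updates(batch_size, update_fn, minibatch_size=64, num_epochs=10):
    """Call update_fn(indices) once per minibatch; return the number of calls.

    update_fn is a placeholder for one optimizer step on the given samples.
    """
    count = 0
    for _, idx in epoch_minibatches(batch_size, minibatch_size, num_epochs):
        update_fn(idx)
        count += 1
    return count
```

With a batch of 256 samples, minibatches of 64, and 10 epochs, this would call `update_fn` 40 times on the same collected data, instead of once per batch.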