vwxyzjn / ppo-implementation-details

The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"
https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
Other
637 stars, 99 forks

Using the PPO implementation in a custom environment #2

Open · chaubeyniha opened this issue 1 year ago

chaubeyniha commented 1 year ago

Hi, thank you for writing this code; I found it extremely helpful as a beginner. I have been using this implementation in a custom environment and have a general question.

One of the hyperparameters is n_steps, the number of steps to run in each environment per policy update. I was wondering whether there is an inherent issue if my custom environment has a maximum of 250 steps per episode and the agent loses reward as time passes (a per-step time penalty).

Could this create a conflict, so that the agent does not learn as well? I hope my question makes sense; please let me know.
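To make the n_steps question concrete, here is a minimal pure-Python sketch (not the repository's code) of how a rollout of n_steps transitions can cross an episode boundary. It assumes the common convention, used in the generalized advantage estimation (GAE) loop the linked blog post discusses, that each stored `done` flag marks an observation that begins a new episode, so the advantage computation stops bootstrapping across the reset. The helpers `compute_gae` and `simulate_time_penalty_rollout` are hypothetical names for illustration, and the toy environment simply gives a reward of -1 per step and ends after a fixed number of steps, standing in for the 250-step limit with a time penalty. The sketch treats the time limit as a true terminal state; whether that is right for a given environment is a separate design choice.

```python
from typing import List, Tuple


def simulate_time_penalty_rollout(
    n_steps: int, max_episode_steps: int, time_penalty: float = -1.0
) -> Tuple[List[float], List[float], float]:
    """Hypothetical toy env: constant per-step penalty, episode capped at max_episode_steps.

    Returns (rewards, dones, next_done), where dones[t] == 1.0 means obs[t]
    is the first observation after an episode ended (auto-reset convention).
    """
    rewards: List[float] = []
    dones: List[float] = []
    episode_step = 0
    next_done = 0.0
    for _ in range(n_steps):
        dones.append(next_done)       # flag stored alongside the obs it precedes
        rewards.append(time_penalty)  # agent "loses reward" every step
        episode_step += 1
        if episode_step == max_episode_steps:
            next_done = 1.0
            episode_step = 0          # auto-reset, as vectorized envs typically do
        else:
            next_done = 0.0
    return rewards, dones, next_done


def compute_gae(
    rewards: List[float],
    values: List[float],
    dones: List[float],
    next_value: float,
    next_done: float,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """GAE for one environment's rollout; episode boundaries cut the recursion."""
    n_steps = len(rewards)
    advantages = [0.0] * n_steps
    lastgaelam = 0.0
    for t in reversed(range(n_steps)):
        if t == n_steps - 1:
            # Bootstrap from the value of the observation after the rollout.
            nextnonterminal = 1.0 - next_done
            nextvalue = next_value
        else:
            nextnonterminal = 1.0 - dones[t + 1]
            nextvalue = values[t + 1]
        # nextnonterminal == 0 stops value and advantage leaking across episodes.
        delta = rewards[t] + gamma * nextvalue * nextnonterminal - values[t]
        lastgaelam = delta + gamma * gae_lambda * nextnonterminal * lastgaelam
        advantages[t] = lastgaelam
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


if __name__ == "__main__":
    # Small numbers stand in for n_steps and the 250-step episode cap.
    rewards, dones, next_done = simulate_time_penalty_rollout(n_steps=8, max_episode_steps=5)
    adv, ret = compute_gae(rewards, [0.0] * 8, dones, next_value=0.0,
                           next_done=next_done, gamma=1.0, gae_lambda=1.0)
    print(dones)  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
    print(adv)    # [-5.0, -4.0, -3.0, -2.0, -1.0, -3.0, -2.0, -1.0]
```

With zero value estimates, gamma=1 and lambda=1, the advantages restart at index 5, where the second episode begins, instead of adding the second episode's penalties to the first. The final, unfinished episode is bootstrapped from `next_value`. Under these assumptions, n_steps does not need to match or divide the episode length, because the done mask separates the episodes inside each rollout.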