using PPO implementation in custom environment

Hi, thank you for writing this code. I found it extremely helpful as a beginner. I have been using this implementation in a custom environment and have a general question.

One of the hyperparameters is `n_steps`, the number of steps to run in each environment per update. I was wondering whether there is an inherent issue if episodes in my custom environment last at most 250 steps and the agent loses reward as time passes.
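
To make the setup concrete, here is a minimal sketch of how I understand the interaction. It is not the repository's code: `ToyTimePenaltyEnv`, the `-0.01` per-step reward, `N_STEPS = 128`, the discount settings, and the zero-valued placeholder critic are all assumptions for illustration. The rollout always collects `N_STEPS` transitions, the environment resets whenever the 250-step cap is hit, and the `done` flags stop the advantage estimate from bootstrapping across the episode boundary.

```python
import random

MAX_EPISODE_STEPS = 250          # episode length cap in my environment
N_STEPS = 128                    # assumed value of the n_steps hyperparameter
GAMMA, GAE_LAMBDA = 0.99, 0.95   # assumed discount settings


class ToyTimePenaltyEnv:
    """Hypothetical stand-in: small negative reward per step, ends after 250 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= MAX_EPISODE_STEPS
        return self.t, -0.01, done


def collect_rollout(env, n_steps, start_obs):
    """Run exactly n_steps, resetting the env whenever an episode ends mid-rollout."""
    rewards, dones = [], []
    obs = start_obs
    for _ in range(n_steps):
        action = random.choice([0, 1])  # random policy, for illustration only
        obs, reward, done = env.step(action)
        rewards.append(reward)
        dones.append(done)
        if done:
            obs = env.reset()
    return rewards, dones, obs


def gae(rewards, dones, values, last_value, gamma=GAMMA, lam=GAE_LAMBDA):
    """GAE where dones[t] (episode ended at step t) blocks bootstrapping from t+1."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages


env = ToyTimePenaltyEnv()
obs = env.reset()
episode_end_steps = []  # global step indices at which an episode ended
for update in range(4):
    rewards, dones, obs = collect_rollout(env, N_STEPS, obs)
    values = [0.0] * N_STEPS  # placeholder critic, not a trained value function
    advantages = gae(rewards, dones, values, last_value=0.0)
    episode_end_steps.extend(update * N_STEPS + t for t, d in enumerate(dones) if d)
```

In this sketch the episode boundaries do not line up with the rollout boundaries, since 250 is not a multiple of 128, so one rollout can contain the end of one episode and the start of the next.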

Could this create a conflict, so that the agent does not learn as well? I hope my question makes sense. Please let me know.

vwxyzjn / ppo-implementation-details

using PPO implementation in custom environment #2