slDeng1003 opened 4 months ago
Hello! I think the training code is logically the same as OpenAI's.
Maybe you were misled by these two similar loops: https://github.com/openai/spinningup/blob/038665d62d569055401d91856abb287263096178/spinup/algos/pytorch/ppo/ppo.py#L299 and https://github.com/nikhilbarhate99/PPO-PyTorch/blob/728cce83d7ab628fe2634eabcdf3239997eb81dd/train.py#L173 In the former (OpenAI's) implementation, the loop spans more than one episode: it calls reset() when an episode is done, but it does not break out of the loop. In the latter (this repo's) implementation, the loop covers exactly one episode: when the episode is done, it breaks out of the loop, and the env is reset before the next episode begins.
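To make the contrast concrete, here is a minimal sketch of the two loop structures described above. `DummyEnv` and both loop functions are hypothetical stand-ins written for illustration, not code from either repository; they only mimic the old Gym `reset()`/`step()` interface.

```python
class DummyEnv:
    """Toy stand-in environment whose episodes last exactly `ep_len` steps."""
    def __init__(self, ep_len=3):
        self.ep_len = ep_len
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.ep_len
        return 0, 1.0, done, {}  # obs, reward, done, info


def spinningup_style(env, steps_per_epoch=10):
    """OpenAI-style loop: runs a fixed number of steps, which may span
    several episodes. On `done` it resets in place and keeps looping."""
    episodes_finished = 0
    obs = env.reset()
    for _ in range(steps_per_epoch):
        obs, reward, done, _ = env.step(0)
        if done:
            episodes_finished += 1
            obs = env.reset()  # reset, but do NOT break out of the loop
    return episodes_finished


def this_repo_style(env, max_ep_len=10):
    """This repo's style: the loop covers exactly one episode.
    On `done` it breaks; reset() happens before the next episode begins."""
    obs = env.reset()
    steps = 0
    for _ in range(max_ep_len):
        obs, reward, done, _ = env.step(0)
        steps += 1
        if done:
            break  # one episode per loop iteration of the outer training loop
    return steps


# With 3-step episodes, 10 steps of the first style complete 3 episodes,
# while the second style stops after the single 3-step episode.
print(spinningup_style(DummyEnv(ep_len=3), steps_per_epoch=10))
print(this_repo_style(DummyEnv(ep_len=3), max_ep_len=10))
```

Both styles collect the same kind of trajectory data; they just differ in whether the inner loop is bounded by a step budget (resetting inline) or by episode termination (resetting outside the loop).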
Hope it makes sense to you!
Dear Huang, I appreciate your reply. I have checked the code and found that you are right. Thank you again for your help! 👍 @ZheruiHuang
[Existing code:] The environment is reset only at the beginning of the training loop, i.e., env.reset() is called only at the first epoch. [Possibly correct training paradigm:] I checked OpenAI Spinning Up's implementation of PPO https://github.com/openai/spinningup/blob/master/spinup/algos/pytorch/ppo/ppo.py, and they reset the env at the end of each epoch (which is the same as resetting it at the beginning of each epoch).
Correct me if I'm wrong :)
P.S.: It's still nice code!