ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
3.57k stars, 829 forks

Environment not being reset in main.py #218

Closed: sandeepnRES closed this issue 4 years ago

sandeepnRES commented 4 years ago

Hi, I think this may be an issue: when the `done` variable is true, `env.reset()` is not called in main.py. If I'm not wrong, this can cause problems unless the problem is infinite-horizon and every state can be visited with non-zero probability.

kyonofx commented 4 years ago

#162 Does this answer your question?
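For context, one common reason a training loop never calls `reset()` explicitly is that the environments sit behind a vectorized wrapper that resets each sub-environment itself as soon as it reports `done`. The thread does not spell this out, so treat the following as a minimal sketch under that assumption. `CountdownEnv`, `AutoResetVecEnv` and `rollout` are hypothetical names made up for illustration, not code from this repository.

```python
class CountdownEnv:
    """Hypothetical toy environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is simply the step counter

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}


class AutoResetVecEnv:
    """Hypothetical vectorized wrapper that resets a sub-env when it is done."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            ob, reward, done, info = env.step(action)
            if done:
                # Keep the last observation of the finished episode,
                # then return the first observation of the next one.
                info = dict(info, terminal_observation=ob)
                ob = env.reset()
            obs.append(ob)
            rewards.append(reward)
            dones.append(done)
            infos.append(info)
        return obs, rewards, dones, infos


def rollout(num_steps=7):
    """Step the wrapper without ever calling reset() inside the loop."""
    venv = AutoResetVecEnv([CountdownEnv(3), CountdownEnv(4)])
    obs = venv.reset()
    history = [obs]
    for _ in range(num_steps):
        obs, rewards, dones, infos = venv.step([0, 0])
        history.append(obs)
    return history


if __name__ == "__main__":
    for step_obs in rollout():
        print(step_obs)
```

With a wrapper like this, the `done` flag in the main loop is only needed for bookkeeping (for example masking returns across episode boundaries), since the observation returned alongside `done=True` already belongs to the next episode.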