PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Hi,
I think there is an issue: when the `done` variable is true, `env.reset` is not called. If I'm not wrong, this can cause problems unless the problem is infinite-horizon and every state can be visited with non-zero probability.
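To illustrate the pattern I would expect, here is a minimal sketch of a rollout loop that starts a new episode whenever `done` is true. It is not taken from this repository: `ToyEnv` and `collect_rollout` are hypothetical names made up for the example, and the environment simply ends every episode after a fixed number of steps.

```python
import random


class ToyEnv:
    """Hypothetical episodic environment: each episode ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0
        self.num_resets = 0

    def reset(self):
        self.t = 0
        self.num_resets += 1
        return self.t  # the observation is just the step index within the episode

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        reward = 1.0
        return self.t, reward, done, {}


def collect_rollout(env, num_steps, seed=0):
    """Collect `num_steps` transitions, resetting the env when an episode ends."""
    rng = random.Random(seed)
    obs = env.reset()
    transitions = []
    for _ in range(num_steps):
        action = rng.choice([0, 1])  # placeholder for a policy
        next_obs, reward, done, _ = env.step(action)
        transitions.append((obs, action, reward, done))
        # Start a new episode once the current one has terminated.
        obs = env.reset() if done else next_obs
    return transitions


env = ToyEnv(horizon=5)
rollout = collect_rollout(env, num_steps=12)
```

Without the `env.reset()` branch, the loop would keep stepping an environment that has already terminated. If the environment wrapper in use already resets itself when an episode ends (some vectorized wrappers do), the explicit call would not be needed.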