ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
3.53k stars, 832 forks

observation reset before insert #274

Open seed851218 opened 3 years ago

seed851218 commented 3 years ago

I have a question. I found that the observation is reset immediately when `done` is true. So when `done` is true, the corresponding reward and observation seem to be mismatched. Will this cause a problem when training a convolutional neural network?
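To make the behaviour in question concrete, here is a minimal sketch of an auto-resetting step, assuming the convention where the environment wrapper calls `reset()` as soon as `done` is true and returns the fresh observation in place of the terminal one. `CountingEnv`, `auto_reset_step`, `collect`, and the `terminal_observation` key are hypothetical names used only for illustration; they are not taken from this repository.

```python
class CountingEnv:
    """Hypothetical toy environment: the observation is the step index in the episode."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        reward = 1.0 if done else 0.0  # reward only on the terminal transition
        return self.t, reward, done, {}


def auto_reset_step(env, action):
    """Step the env; when done, return the reset observation instead of the terminal one."""
    obs, reward, done, info = env.step(action)
    if done:
        # Keep the true last observation around (illustrative key name).
        info = dict(info, terminal_observation=obs)
        obs = env.reset()
    return obs, reward, done, info


def collect(env, num_steps):
    """Record (obs, reward, done, next_obs) tuples as a rollout buffer might see them."""
    obs = env.reset()
    transitions = []
    for _ in range(num_steps):
        next_obs, reward, done, info = auto_reset_step(env, action=0)
        transitions.append((obs, reward, done, next_obs))
        obs = next_obs
    return transitions


rollout = collect(CountingEnv(episode_length=3), num_steps=4)
for row in rollout:
    print(row)
```

In the row where `done` is true, the reward belongs to the terminal transition, while `next_obs` is already the first observation of the next episode. Whether this matters for training depends on whether the update masks out the value estimate of `next_obs` when `done` is true.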