PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
I have a question: I found that the observation is reset instantly when `done` is true. So on the step where `done` is true, the returned observation no longer matches the reward — it already belongs to the new episode rather than the terminal state. Will this cause a problem when training a convolutional neural network?
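To make the behavior I'm describing concrete, here is a minimal sketch (not this repo's actual code — `ToyEnv` is hypothetical) of how vectorized env wrappers commonly auto-reset: on the terminal step, `done` is `True` but the returned observation is already the first observation of the next episode, so the true terminal observation is never returned.

```python
class ToyEnv:
    """Hypothetical env with a fixed episode length of 3 steps.

    The observation is simply the current step index, which makes it
    easy to see when a reset observation is returned in place of the
    terminal one.
    """

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # first observation of the new episode

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        obs = self.t
        reward = 1.0
        if done:
            # Auto-reset: obs now belongs to the NEW episode, not the
            # terminal state that produced this reward.
            obs = self.reset()
        return obs, reward, done


env = ToyEnv()
env.reset()
trajectory = [env.step(0) for _ in range(3)]
# On the last transition, done is True but the observation is 0 (the
# reset observation), not the terminal observation 3.
```

This is exactly the mismatch in the question: the `(observation, reward, done)` tuple on the terminal step pairs the new episode's first observation with the old episode's last reward.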