ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
3.53k stars, 832 forks

Does mask introduce bias in the GAIL implementation? #272

Open HareshKarnan opened 3 years ago

HareshKarnan commented 3 years ago

In the current implementation, the rollout storage can contain bad transitions. The feed-forward generator does not filter out these samples when it draws the imitator's experience for GAIL. Would this hurt the discriminator's performance, since it will sometimes receive incorrect state-transition pairs from the imitator?
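To make the question concrete, here is a minimal, framework-free sketch of the kind of filtering I have in mind. It is not the repository's code: `filtered_minibatches`, its arguments, and the flag convention (1.0 = usable sample, 0.0 = bad transition) are all assumptions for illustration, and plain Python lists stand in for the stored tensors.

```python
import random


def filtered_minibatches(states, actions, valid, batch_size, seed=0):
    """Yield (states, actions) mini-batches that skip flagged samples.

    Hypothetical helper. `valid[i]` is assumed to be 1.0 for a usable
    sample and 0.0 for a bad transition.
    """
    # Keep only the indices whose flag marks the sample as usable.
    keep = [i for i, flag in enumerate(valid) if flag > 0.5]
    # Shuffle the surviving indices so batches are drawn in random order.
    random.Random(seed).shuffle(keep)
    for start in range(0, len(keep), batch_size):
        idx = keep[start:start + batch_size]
        yield [states[i] for i in idx], [actions[i] for i in idx]


# Toy rollout: six samples, two of them flagged as bad.
states = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5]]
actions = [0, 1, 0, 1, 1, 0]
valid = [1.0, 1.0, 0.0, 1.0, 0.0, 1.0]

batches = list(filtered_minibatches(states, actions, valid, batch_size=2))
```

With this kind of filter the discriminator would only ever see the four unflagged samples. Whether dropping them actually changes GAIL's results is the open question here.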