PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
Does the mask introduce bias in the GAIL implementation? #272
In the current implementation, the rollout storage can contain bad transitions. The feedforward generator does not filter out these bad samples when it samples imitator experience for GAIL. Would this hurt the discriminator's performance, since it will sometimes receive incorrect state-transition pairs from the imitator?
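For illustration, here is a minimal sketch of the filtering the question has in mind. It uses plain Python rather than PyTorch so it runs standalone. The function name `feed_forward_indices` and the convention that a `bad_masks` entry of `0.0` marks a bad transition are assumptions, not the repository's actual generator API. The sketch drops flagged indices from the flattened rollout (`num_steps * num_processes` entries) before shuffling them into minibatches. It then discards the last incomplete minibatch, mirroring `torch.utils.data.BatchSampler` with `drop_last=True`.

```python
import random
from typing import List, Sequence


def feed_forward_indices(bad_masks: Sequence[float], mini_batch_size: int,
                         filter_bad: bool = True, seed: int = 0) -> List[List[int]]:
    """Return shuffled minibatches of flat rollout indices.

    Hypothetical helper: bad_masks[i] == 0.0 is assumed to flag a bad
    transition. With filter_bad=True those indices are removed before
    sampling, so the discriminator never sees them.
    """
    # Keep every index when not filtering; otherwise keep only good ones.
    indices = [i for i, m in enumerate(bad_masks) if not filter_bad or m != 0.0]
    rng = random.Random(seed)
    rng.shuffle(indices)
    # Drop the trailing incomplete minibatch (like drop_last=True).
    n_full = len(indices) // mini_batch_size
    return [indices[k * mini_batch_size:(k + 1) * mini_batch_size]
            for k in range(n_full)]


if __name__ == "__main__":
    masks = [1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]
    print(feed_forward_indices(masks, mini_batch_size=2))
```

Filtering before shuffling shrinks the sample pool, so fewer minibatches may be produced per update. An alternative is to keep every sample and weight each term of the discriminator loss by its mask.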