PyTorch implementations of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Actor Critic using Kronecker-Factored Trust Region (ACKTR, the scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation), and Generative Adversarial Imitation Learning (GAIL).
Set drop_last=True only if the dataset is larger than the GAIL batch size. If the dataset is smaller than one batch, its only batch is incomplete and gets dropped, so the dataloader ends up empty. See example below:
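The batch-count arithmetic behind this rule can be sketched in plain Python. This is only an illustration, not the repository's code: `num_batches` and `choose_drop_last` are hypothetical helper names, and the batch size of 128 is an assumed value. The sketch mirrors how a batched loader such as `torch.utils.data.DataLoader` counts batches: floor division when the last incomplete batch is dropped, ceiling division when it is kept.

```python
import math


def num_batches(dataset_size: int, batch_size: int, drop_last: bool) -> int:
    """Number of batches a batched loader would yield (hypothetical helper)."""
    if drop_last:
        # The incomplete final batch is discarded.
        return dataset_size // batch_size
    # The incomplete final batch is kept.
    return math.ceil(dataset_size / batch_size)


def choose_drop_last(dataset_size: int, batch_size: int) -> bool:
    """Drop the last batch only when the dataset is larger than one batch."""
    return dataset_size > batch_size


gail_batch_size = 128  # assumed value, for illustration only
small, large = 100, 300

# Forcing drop_last=True on a small dataset leaves nothing to iterate over.
print(num_batches(small, gail_batch_size, drop_last=True))

# Choosing drop_last from the dataset size keeps the single partial batch.
print(num_batches(small, gail_batch_size, choose_drop_last(small, gail_batch_size)))

# A large dataset drops only its trailing partial batch.
print(num_batches(large, gail_batch_size, choose_drop_last(large, gail_batch_size)))
```

With these assumed sizes, the first call reports zero batches, which is the empty-dataloader case the rule is meant to avoid.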