PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
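In VecPyTorch::step_async the input is checked for a torch.LongTensor (for discrete actions). That check only matches CPU tensors, so a LongTensor that lives on the GPU is not recognized. Please include CUDA tensors as well,
e.g.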
if isinstance(actions, (torch.LongTensor, torch.cuda.LongTensor)):
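
A possible alternative to listing both tensor classes is to compare the dtype, which does not depend on the device. The sketch below is only an illustration: the helper name `prepare_actions` is hypothetical, and it assumes that the branch squeezes the second dimension of discrete actions before converting them to a NumPy array. Adjust it to whatever the branch in step_async actually does.

```python
import torch


def prepare_actions(actions: torch.Tensor):
    """Hypothetical helper: convert a batch of actions to a NumPy array.

    Assumption: discrete actions arrive as an int64 tensor of shape
    (num_envs, 1) and should be squeezed to shape (num_envs,).
    """
    # torch.long is int64; the dtype comparison gives the same result
    # for CPU and CUDA tensors, so no per-device class list is needed.
    if actions.dtype == torch.long:
        actions = actions.squeeze(1)
    # .cpu() is a no-op for CPU tensors and copies CUDA tensors to host memory.
    return actions.cpu().numpy()


discrete = torch.zeros(4, 1, dtype=torch.long)
continuous = torch.zeros(4, 2, dtype=torch.float32)
discrete_out = prepare_actions(discrete)
continuous_out = prepare_actions(continuous)
```

With this check, the same code path handles actions on either device, since `actions.dtype` is `torch.int64` for both `torch.LongTensor` and `torch.cuda.LongTensor`.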