ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License

A PyTorch Bug in envs.py #226

Closed (Shengyu-Feng closed this issue 4 years ago)

Shengyu-Feng commented 4 years ago

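The two lines at https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/62e8d5896db155839056deb0fe60e0c05db0bf16/a2c_ppo_acktr/envs.py#L234-L235

should be self.stacked_obs[:, :-self.shape_dim0] = self.stacked_obs[:, self.shape_dim0:].clone() because there's a PyTorch bug here: the source and destination slices are overlapping views of the same tensor, and assigning one to the other in place may give incorrect results unless the source is cloned first.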
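A minimal sketch of the proposed fix, outside the repository's wrapper class, might look like the following. The function name shift_stacked_obs and the toy tensor shape are assumptions made for illustration; only the slice assignment with .clone() comes from the issue itself.

```python
import torch


def shift_stacked_obs(stacked_obs: torch.Tensor, shape_dim0: int) -> torch.Tensor:
    """Shift stacked frames left along dim 1, dropping the oldest frame.

    Hypothetical helper for illustration; not part of the repository.
    """
    # .clone() copies the source slice into fresh memory first, so the
    # source and destination of the assignment no longer overlap.
    stacked_obs[:, :-shape_dim0] = stacked_obs[:, shape_dim0:].clone()
    return stacked_obs


# Toy data (assumed shape): 1 environment, 4 stacked single-channel 1x1 frames
# holding the values 0, 1, 2, 3 from oldest to newest.
stacked = torch.arange(4.0).view(1, 4, 1, 1)
shift_stacked_obs(stacked, shape_dim0=1)
after_shift = stacked.flatten().tolist()  # [1.0, 2.0, 3.0, 3.0]

# The last slot still holds the stale newest frame until a new observation
# is written into it.
new_obs = torch.full((1, 1, 1, 1), 4.0)
stacked[:, -1:] = new_obs
after_write = stacked.flatten().tolist()  # [1.0, 2.0, 3.0, 4.0]
```

The extra copy costs one temporary tensor per step, which is small compared with the environment step itself.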

ikostrikov commented 4 years ago

Fixed in https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/commit/84a7582477fb0d5c82ad6d850fe476829dddd2e1