ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License

bug when set num_processes=1 #231

Open KK666-AI opened 4 years ago

KK666-AI commented 4 years ago

Dear Author,

When I set num_processes=1, an error occurs; it could be a bug.

File "/home/ken/project/pytorch-a2c-ppo-acktr-gail/a2c_ppo_acktr/envs.py", line 232, in step_wait
    self.stacked_obs[:, self.shape_dim0:]
RuntimeError: unsupported operation: some elements of the input tensor and the written-to tensor refer to a single memory location. Please clone() the tensor before performing the operation.
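For readers who hit the same message: PyTorch raises this error when an in-place copy reads from and writes to overlapping views of one tensor. The sketch below is an illustration only, not the code from envs.py. It assumes the line in the traceback is the right-hand side of a slice assignment that shifts a frame stack, and the helper name `shift_frames` and the tiny tensor shape are hypothetical. It follows the error message's own advice and clones the source slice first.

```python
import torch


def shift_frames(stacked_obs: torch.Tensor, shape_dim0: int) -> torch.Tensor:
    """Shift a frame stack left by `shape_dim0` channels, in place.

    Hypothetical helper: the source slice is cloned first, so the copy no
    longer reads from memory that it is also writing to.
    """
    stacked_obs[:, :-shape_dim0] = stacked_obs[:, shape_dim0:].clone()
    return stacked_obs


# One environment (batch size 1) with four stacked one-value "frames".
demo = torch.arange(4.0).reshape(1, 4)

try:
    # Source and destination are overlapping views of the same buffer.
    # Depending on the PyTorch version and tensor layout this may raise.
    demo[:, :-1] = demo[:, 1:]
    overlap_error = None
except RuntimeError as err:
    overlap_error = str(err)

# The cloned version works regardless of whether the line above raised.
shifted = shift_frames(torch.arange(4.0).reshape(1, 4), 1)
```

After the shift, the oldest frame has been dropped and the last slot still holds the previous newest frame, ready to be overwritten with the next observation.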

ikostrikov commented 4 years ago

Fixed in https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/commit/84a7582477fb0d5c82ad6d850fe476829dddd2e1

KK666-AI commented 4 years ago

Thanks. It works.

But it seems the wrapped environment with num_processes=1 is much slower than the original gym environment.

ikostrikov commented 4 years ago

@lihuiknight can you provide parameters that you use and specs of your hardware?

KK666-AI commented 4 years ago

I tested this program on Ubuntu 18.04 with an Intel i5 (8 CPUs). I compared the running time on CartPole-v1 between the standard gym environment and the wrapped one. The wrapped environment is much slower than the standard one.
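A comparison like this is easier to discuss with steps-per-second numbers for both setups. The sketch below is a standard-library timing harness, not the repository's code and not a diagnosis of this report: `ToyEnv`, `BatchedWrapper`, and `steps_per_second` are hypothetical stand-ins for a cheap environment, a one-worker vectorized wrapper, and the measurement loop. It only shows one way to measure both under the same conditions. When a single step is very cheap, any fixed per-step work in a wrapper may account for a large share of the total time.

```python
import time


class ToyEnv:
    """Hypothetical stand-in for a cheap environment such as CartPole-v1."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        self.t += 1
        obs = [float(self.t), 0.0, 0.0, 0.0]
        done = self.t >= 200  # fixed episode length for the toy example
        return obs, 1.0, done, {}


class BatchedWrapper:
    """Hypothetical wrapper that adds a batch dimension and auto-resets."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return [self.env.reset()]

    def step(self, actions):
        obs, reward, done, info = self.env.step(actions[0])
        if done:
            obs = self.env.reset()
        return [obs], [reward], [done], [info]


def steps_per_second(env, action, num_steps, auto_reset):
    """Time `num_steps` calls to env.step and return the throughput."""
    env.reset()
    start = time.perf_counter()
    for _ in range(num_steps):
        _, _, done, _ = env.step(action)
        # The bare environment must be reset by the caller.
        if not auto_reset and done:
            env.reset()
    elapsed = max(time.perf_counter() - start, 1e-9)
    return num_steps / elapsed


if __name__ == "__main__":
    bare = steps_per_second(ToyEnv(), 0, 100_000, auto_reset=False)
    wrapped = steps_per_second(BatchedWrapper(ToyEnv()), [0], 100_000, auto_reset=True)
    print(f"bare: {bare:.0f} steps/s, wrapped: {wrapped:.0f} steps/s")
```

Reporting both numbers along with the command-line parameters used would make the slowdown easier to reproduce.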