ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License

Why save the entire model, rather than its state_dict()? #169

Closed Akella17 closed 5 years ago

Akella17 commented 5 years ago

Is there any reason why we are saving the entire model rather than its state_dict()? Also, why do we create a CPU copy of the CUDA actor-critic network before saving it (line 152 of main.py: copy.deepcopy(actor_critic).cpu())?

Also, how do I call the PyBullet (Racecar, Minitaur, and Kuka), gym classic-control (Pendulum-v0), and Roboschool envs?

ikostrikov commented 5 years ago

There were no particular reasons.

Models are moved to CPU before saving because checkpoints saved from a GPU could not be loaded on CPU-only machines.
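For illustration, here is a minimal sketch of both points, using a small stand-in network (the names actor_critic, actor_critic.pt, and actor_critic_cpu.pt are hypothetical, not taken from the repo):

```python
import copy
import torch
import torch.nn as nn

# Hypothetical stand-in for the repo's actor-critic network.
actor_critic = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))

# torch.save(actor_critic, ...) pickles the module by class reference,
# so loading later needs the original source tree. Saving only the
# state_dict stores just the tensors and is the more portable option.
torch.save(actor_critic.state_dict(), "actor_critic.pt")

# Deep-copying to CPU before saving (as main.py does) leaves the live
# model on its device while producing a checkpoint that CPU-only
# machines can load.
save_copy = copy.deepcopy(actor_critic).cpu()
torch.save(save_copy.state_dict(), "actor_critic_cpu.pt")

# Loading: rebuild the same architecture, then restore the weights.
restored = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
restored.load_state_dict(
    torch.load("actor_critic_cpu.pt", map_location="cpu")
)
```

Passing map_location="cpu" to torch.load is the complementary trick on the loading side: it remaps GPU tensors to CPU even for checkpoints that were saved without the deep copy.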

Please see how the environments were registered for gym in the corresponding repositories: https://github.com/bulletphysics/bullet3/blob/554208c98de4c87b4f5b97e912fcc52699006c05/examples/pybullet/gym/pybullet_envs/__init__.py#L11