ikostrikov / pytorch-a3c

PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
MIT License
1.23k stars 280 forks source link

is ensure_shared_grads still required? #49

Closed edbeeching closed 6 years ago

edbeeching commented 6 years ago

It seems the PyTorch Mnist hogwild example has been updated now, as gradients are now allocated lazily. I think this means that this part of you code is no longer required?

ikostrikov commented 6 years ago

Do you mean this commit?

https://github.com/pytorch/examples/commit/5c41070628f51497d15241c859eec80f4b05d90b#diff-054e0153cf9e86773f4d272224dd1d13

edbeeching commented 6 years ago

Yes, but I see you call the ensure_shared_grads function after backprop but before the optimizer step, is this still required? To be honest I am still a bit surprised / impressed that asynchronous updates are so straight forward in PyTorch. Essentially I am implementing a A3C RL agent and want to be sure that all processes are updating the global model, I was following the Hogwild example and your code but I noticed this difference.

Cheers

ikostrikov commented 6 years ago

When I remove these lines it stops working for me. Probably, it requires to change something else.

edbeeching commented 6 years ago

I think these lines in train are no longer required: 24 model = ActorCritic(env.observation_space.shape[0], env.action_space) 38 model.load_state_dict(shared_model.state_dict())

and also change all instances of model to shared_model

At least that is how it appears to work for the Hogwild example. It is tricky because this would mean that during an episode the model can potentially change its parameters due to updates from the other processes.

Perhaps the original approach is better as there are advantages and disadvantages to the changes (assuming they would work).

Sorry for the hassle!

ikostrikov commented 6 years ago

These lines are important because we need to keep parameters during roll outs.

Probably, I will have to spend more time trying to figure out what they changed.

In meanwhile I would highly recommend to use A2C/PPO/ACKTR. In my experience, it works just as well (except some cases).

edbeeching commented 6 years ago

Ok thank you, I will defer to your expertise and try A2C for the moment. I have a more general question if you have the time, why do you think it is better to keep the parameters the same during the rollout? Because of the GAE?