Closed edbeeching closed 6 years ago
Yes, but I see you call the ensure_shared_grads function after backprop but before the optimizer step; is this still required? To be honest, I am still a bit surprised / impressed that asynchronous updates are so straightforward in PyTorch. Essentially, I am implementing an A3C RL agent and want to be sure that all processes are updating the global model. I was following the Hogwild example and your code, and I noticed this difference.
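For context, the pattern in question looks roughly like this, a minimal sketch in the pytorch-a3c style (the `ensure_shared_grads` body follows the repo's approach of assigning `_grad`, but the exact code may differ):

```python
import torch
import torch.nn as nn

def ensure_shared_grads(model, shared_model):
    # Hand the worker's freshly computed gradients to the shared model,
    # so the per-worker optimizer built over the shared parameters can
    # apply them. If the shared grads already exist, do nothing.
    for param, shared_param in zip(model.parameters(),
                                   shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad

# Toy demonstration: backprop on a local copy, then transfer grads.
local, shared = nn.Linear(2, 1), nn.Linear(2, 1)
local.load_state_dict(shared.state_dict())
loss = local(torch.ones(1, 2)).sum()
loss.backward()
ensure_shared_grads(local, shared)
print(shared.weight.grad is not None)  # True
```

In this scheme the optimizer is constructed over `shared_model.parameters()`, which is why the gradient hand-off has to happen between `backward()` and `optimizer.step()`.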
Cheers
When I remove these lines it stops working for me. Probably it requires changing something else as well.
I think these lines in train are no longer required:

line 24: model = ActorCritic(env.observation_space.shape[0], env.action_space)
line 38: model.load_state_dict(shared_model.state_dict())

and also that all instances of model should be changed to shared_model.
At least that is how it appears to work for the Hogwild example. It is tricky because this would mean that during an episode the model can potentially change its parameters due to updates from the other processes.
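The difference is easy to see in a toy example: with the original approach, snapshotting the shared weights into a local copy at episode start keeps the rollout's parameters fixed even while other workers update the shared model (a sketch; `model` / `shared_model` follow the naming used in this thread):

```python
import torch
import torch.nn as nn

shared_model = nn.Linear(2, 1)
shared_model.share_memory()  # visible to all worker processes

# Original approach: snapshot the shared weights at episode start.
model = nn.Linear(2, 1)
model.load_state_dict(shared_model.state_dict())
snapshot = model.weight.detach().clone()

# Another worker updates the shared model mid-episode...
with torch.no_grad():
    shared_model.weight.add_(1.0)

# ...but the local copy used for the rollout is unaffected.
print(torch.equal(model.weight, snapshot))             # True
print(torch.equal(model.weight, shared_model.weight))  # False
```

Acting directly on `shared_model` instead would remove the snapshot, so the policy could change under the worker mid-episode, which is the trade-off being discussed here.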
Perhaps the original approach is better as there are advantages and disadvantages to the changes (assuming they would work).
Sorry for the hassle!
These lines are important because we need to keep the parameters fixed during rollouts.
Probably I will have to spend more time figuring out what they changed.
In the meantime I would highly recommend using A2C/PPO/ACKTR. In my experience they work just as well (except in some cases).
Ok, thank you, I will defer to your expertise and try A2C for the moment. I have a more general question if you have the time: why do you think it is better to keep the parameters the same during the rollout? Because of the GAE?
It seems the PyTorch MNIST Hogwild example has been updated now, and gradients are allocated lazily. I think this means that this part of your code is no longer required?
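If gradients are indeed allocated lazily, the relevant behaviour is that `.grad` is simply `None` on a freshly shared model, and each process gets its own gradient tensors on its first `backward()` call, so no manual hand-off would be needed. A sketch of that behaviour (not the MNIST example itself):

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 1)
model.share_memory()  # parameters live in shared memory

# Gradients start out unallocated ("lazy"); they are created
# per-process on the first backward() rather than being shared.
assert all(p.grad is None for p in model.parameters())

loss = model(torch.ones(1, 2)).sum()
loss.backward()  # grads allocated here, local to this process
print(model.weight.grad)
```

Under that scheme each worker would call `backward()` and step an optimizer over the shared parameters directly, without an `ensure_shared_grads`-style transfer.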