Closed Anya28 closed 5 years ago
Have you tried A2C? It usually works better than A3C and doesn't have these sort of thread safety issues.
We should probably fix this though, I think it's either a thread safety bug in plasma, or we forgot to grab a lock when accessing some PyTorch state.
Have you tried A2C? It usually works better than A3C and doesn't have these sort of thread safety issues. We should probably fix this though, I think it's either a thread safety bug in plasma, or we forgot to grab a lock when accessing some PyTorch state.
I have just tried it now, it is stable comparing to pytorch A3C. However it is a synchronous implementation, so depending on the variance in environment, it can be very slow. Does it mean that for pytorch it is not possible to use asynchronous update atm?
As to the locks, I also have a question on that. In the torch_policy_graph, the lock from python threading module is put in every function. The reasoning for that, that I saw in the PR comments, is that "it makes the (specifically) Pytorch code more stable etc". However, I would like to understand why lock is especially important for Pytorch, and how would that work with the idea of asynchronous update for the algorithm? Once one thread grabs a lock, the others will be blocked till the lock is released, and so as the number of threads is increasing, having a lock in each computational function would make the implementation progressively slower?
Thank you for your replies!
I have just tried it now, it is stable comparing to pytorch A3C. However it is a synchronous implementation, so depending on the variance in environment, it can be very slow.
That's true, though if you have a GPU A2C can be much faster anyways. For example, in the benchmarks here A2C gets twice the Qbert reward in 1 hour, with less workers: https://github.com/ray-project/rl-experiments#impala-and-a2c
Does it mean that for pytorch it is not possible to use asynchronous update atm?
By the way, I noticed you're using a quite old version of Ray, it's likely upgrading will help. There have been a number of plasma bug fixes since then.
Once one thread grabs a lock, the others will be blocked till the lock is released, and so as the number of threads is increasing, having a lock in each computational function would make the implementation progressively slower?
The lock will add overhead within each worker, however RLlib scales by adding multiple worker processes. So the lock overhead does not get worse with increased parallelism.
@ericl any progress with AsyncSampler and PyTorch compatibility? Can you elaborate on what's the issue?
System information
Describe the problem
I have tried running the pytorch version of A3C algorithm, keeping all the configs as they are and just changing the flag "use_pytorch" to True. Varying the number of workers and environments (CartPole, Qbert, SpaceInvaders), in most of the cases pytorch A3C would start running and after a while would crash before the end of the experiment, with the error that the "worker died or was killed". Sometimes, however, it would run till the end of the experiment (reach 500000 steps in my case). Has anyone else come across this issue? Thank you!