ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Pytorch A3C - unstable runs #3994

Closed: Anya28 closed this issue 5 years ago

Anya28 commented 5 years ago

System information

Describe the problem

I have tried running the PyTorch version of the A3C algorithm, keeping all the configs as they are and just changing the flag "use_pytorch" to True. Varying the number of workers and environments (CartPole, Qbert, SpaceInvaders), in most cases PyTorch A3C would start running and then crash after a while, before the end of the experiment, with the error that the "worker died or was killed". Sometimes, however, it would run to the end of the experiment (reach 500000 steps in my case). Has anyone else come across this issue? Thank you!
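For reference, a minimal sketch of the kind of run being described, assuming a Ray/RLlib version from around this era where `tune.run` and the "use_pytorch" flag are available (exact API names and config keys may differ across versions):

```python
# Hedged sketch only: maps the report above onto an RLlib experiment.
import ray
from ray import tune

ray.init()
tune.run(
    "A3C",
    stop={"timesteps_total": 500000},   # the 500000-step budget mentioned above
    config={
        "env": "CartPole-v0",           # also tried: Qbert, SpaceInvaders
        "num_workers": 4,               # varied in the report above
        "use_pytorch": True,            # switch from the TF policy to PyTorch
    },
)
```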

ericl commented 5 years ago

Have you tried A2C? It usually works better than A3C and doesn't have these sorts of thread safety issues.
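Continuing the hedged sketch above, switching to A2C would just change the trainer name (assuming the same flags are supported there, which may vary by version):

```python
# Same assumptions as the earlier sketch, with the synchronous A2C trainer.
tune.run(
    "A2C",
    stop={"timesteps_total": 500000},
    config={
        "env": "CartPole-v0",
        "num_workers": 4,
        "use_pytorch": True,
    },
)
```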

ericl commented 5 years ago

We should probably fix this though, I think it's either a thread safety bug in plasma, or we forgot to grab a lock when accessing some PyTorch state.

Anya28 commented 5 years ago

> Have you tried A2C? It usually works better than A3C and doesn't have these sorts of thread safety issues. We should probably fix this though, I think it's either a thread safety bug in plasma, or we forgot to grab a lock when accessing some PyTorch state.

I have just tried it now; it is stable compared to PyTorch A3C. However, it is a synchronous implementation, so depending on the variance in the environment it can be very slow. Does that mean it is not currently possible to use asynchronous updates with PyTorch?

As for the locks, I also have a question. In torch_policy_graph, a lock from the Python threading module is taken in every function. The reasoning I saw in the PR comments is that it makes the (specifically) PyTorch code more stable. However, I would like to understand why a lock is especially important for PyTorch, and how it fits with the idea of asynchronous updates for the algorithm. Once one thread grabs the lock, the others are blocked until it is released, so as the number of threads increases, wouldn't having a lock in each computational function make the implementation progressively slower? A sketch of the pattern I mean is below.
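A rough illustration of the per-method locking pattern being described (an illustration only, not the actual torch_policy_graph source; the class and method names here are made up):

```python
# Hypothetical sketch: one threading.Lock shared by all of a policy's methods,
# so only one thread of a given worker touches PyTorch state at a time.
import threading
import torch

class LockedTorchPolicySketch:
    def __init__(self, model):
        self.model = model
        self.lock = threading.Lock()      # single lock guarding all torch state

    def compute_actions(self, obs_batch):
        with self.lock:                   # blocks the worker's other threads
            with torch.no_grad():
                return self.model(torch.as_tensor(obs_batch))

    def learn_on_batch(self, loss_fn, batch):
        with self.lock:                   # the gradient update is also serialized
            loss = loss_fn(self.model, batch)
            loss.backward()
```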

Thank you for your replies!

ericl commented 5 years ago

> I have just tried it now; it is stable compared to PyTorch A3C. However, it is a synchronous implementation, so depending on the variance in the environment it can be very slow.

That's true, though if you have a GPU, A2C can be much faster anyway. For example, in the benchmarks here A2C gets twice the Qbert reward in 1 hour, with fewer workers: https://github.com/ray-project/rl-experiments#impala-and-a2c

> Does that mean it is not currently possible to use asynchronous updates with PyTorch?

By the way, I noticed you're using quite an old version of Ray; upgrading will likely help. There have been a number of plasma bug fixes since then.

> Once one thread grabs the lock, the others are blocked until it is released, so as the number of threads increases, wouldn't having a lock in each computational function make the implementation progressively slower?

The lock adds overhead within each worker; however, RLlib scales by adding multiple worker processes, each with its own policy and its own lock. So the lock overhead does not get worse with increased parallelism.
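A small illustration of that point, using hypothetical config values (the keys reflect RLlib configs of roughly that era and may differ in other versions):

```python
# Each rollout worker is a separate process with its own policy object and its
# own lock, so contention on any single lock does not grow with num_workers.
config = {
    "num_workers": 8,       # 8 independent worker processes => 8 independent locks
    "sample_async": True,   # async sampling threads only contend within one worker
                            # (key assumed from A3C-style configs of this era)
}
```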

roireshef commented 5 years ago

@ericl any progress with AsyncSampler and PyTorch compatibility? Can you elaborate on what the issue is?