Hey OP, did you ever figure out a mitigation for this? This is a huge issue for us: we're seeing 50x slowdowns using Ray despite the improvements in parallelism.
Search before asking
Description
Ray remote becomes extremely slow when I have just-in-time (JIT) functions (compiled by Numba) inside. The time costs of my testing code (attached later) are:

The number of CPUs is set with `ray.init(num_cpus=4)`. The four configurations are: Default is the raw Python implementation; Numba uses only the JIT function (compiled on a pseudo input before timing starts); Ray parallelizes the plain function; and Numba + Ray parallelizes the JIT function.

We can see that Numba works fine (extremely fast after compilation), and Ray also works fine (the time obviously scales down to about 1/4, with some overhead). However, Numba + Ray does not behave as expected: it is extremely slow, even though it should be faster than Numba alone. I think the reason is that Numba compiles the JIT function anew in every Ray process. I wonder if it is possible to compile the JIT function only once, before the Ray processes start. (I have tried putting the JIT function as a ref object, but it had no influence.) I am aware of ramba (a library combining Ray and Numba), but it seems to be only a NumPy replacement with parallelism.

Here is the testing code I use. It is an implementation of multiple sampling from a given HMM model.
Use case
It can further improve the efficiency per Ray process. Also, JIT compilation has recently been gaining popularity as a toolkit for improving ML efficiency, so it would be nice for Ray to have better support for JIT.
Related issues
No response
Are you willing to submit a PR?