Closed: tmontana closed this issue 2 years ago
Hi. In my experiments on a 48-core machine, EBM only uses 16 cores regardless of the data size. n_jobs is set to -2.
Thanks

Same result if n_jobs is set to 48 (still stuck at 16 cores).
Hi @tmontana --
If you have that many cores, you can get better models at no extra cost by setting outer_bags to something in the range of 48, which would also give you full utilization. We currently parallelize across outer bags, so core utilization is capped at the number of outer bags. Finer-grained scaling will require some re-architecting on our part. It's in our backlog, but since most people don't have enough cores to benefit, and given the complexity of implementation, it is still a ways out.
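For example, something along these lines should keep all 48 cores busy (this uses the glassbox EBM classifier; the toy data is only there to make the snippet self-contained):

```python
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

# Toy data just so the snippet runs end to end.
X = np.random.rand(10_000, 20)
y = np.random.randint(0, 2, size=10_000)

# One outer bag per core keeps a 48-core machine fully utilized;
# n_jobs=-2 leaves one core free, as in your setup.
ebm = ExplainableBoostingClassifier(outer_bags=48, n_jobs=-2)
ebm.fit(X, y)
```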
-InterpretML team
Understood. Thank you,
Hi. Just FYI, I am running some experiments on a 96-core machine with 185 GB of RAM. My dataset is 1.2 million rows by 325 columns, and I use 40% of it for validation. If I set outer_bags to anything above 35, the kernel crashes with an out-of-memory message. Am I correct in assuming that the data is copied to each core?
If that's the case, I just wanted to point out that I've had good luck using Ray for parallelism. Ray has a shared object store, so it does not run into memory issues as you increase the number of cores. Do you think this could be feasible for EBM, based on the link below?
https://docs.ray.io/en/latest/joblib.html
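For reference, the joblib backend pattern from that page looks roughly like this. I have not verified that EBM's internal parallelism actually picks up the backend, so treat it as a sketch:

```python
import joblib
import numpy as np
from ray.util.joblib import register_ray
from interpret.glassbox import ExplainableBoostingClassifier

register_ray()  # registers "ray" as a joblib backend

# Toy data so the snippet runs end to end.
X = np.random.rand(10_000, 20)
y = np.random.randint(0, 2, size=10_000)

ebm = ExplainableBoostingClassifier(outer_bags=48, n_jobs=-1)
with joblib.parallel_backend("ray"):
    # Work dispatched through joblib inside this block goes to Ray workers,
    # which can read shared inputs from Ray's object store.
    ebm.fit(X, y)
```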
Thanks
Hi @tmontana -- Yes! You are correct: the data is currently copied to each core. Resolving this is a priority for us, but it has been delayed a bit by work in other areas. It's next on the list, though, as several people have hit this issue. The plan is to use a RawArray to share this memory across processes. Since much of the package is in C++, we would allocate the RawArray in Python and then fill it from C++ with a compressed representation. At the end we should have a single shared-memory object that we can pass around to different processes in Python, and that is even smaller than what a single process holds today.
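Roughly the pattern we have in mind, sketched in pure Python (in our case the buffer would be filled from C++ with the compressed representation):

```python
from multiprocessing import Pool, RawArray
import numpy as np

def _init_worker(shared_buf, shape):
    # Re-wrap the shared buffer in each worker; no copy is made.
    global X_shared
    X_shared = np.frombuffer(shared_buf, dtype=np.float64).reshape(shape)

def _train_one_bag(bag_index):
    # Placeholder for per-bag work; every bag reads the same shared block.
    return bag_index, float(X_shared.sum())

if __name__ == "__main__":
    n_samples, n_features = 100_000, 32
    buf = RawArray("d", n_samples * n_features)  # allocated once, in Python
    X = np.frombuffer(buf, dtype=np.float64).reshape(n_samples, n_features)
    X[:] = np.random.rand(n_samples, n_features)  # stand-in for the C++ fill step
    with Pool(processes=8, initializer=_init_worker,
              initargs=(buf, (n_samples, n_features))) as pool:
        results = pool.map(_train_one_bag, range(48))
```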
-InterpretML
It seems that this problem still has not been solved. Am I right? I have a machine with 40 cores and 500 GB of memory. When running on a dataset with 10 million samples and 600 features, EBM is much, much slower than LightGBM and consumes far more memory.
Hi @tmontana and @ZhangTP1996 -- The latest code in the develop branch finally includes substantial reductions in memory usage: it now uses approximately 10x less memory than the previous release. Our next PyPI release will include these changes.
-InterpretML team