Closed: tmontana closed this issue 2 years ago
Hi. In my experiments on a 48-core machine, EBM only uses 16 cores regardless of the data size. n_jobs is set to -2.
Thanks

Same result if n_jobs is set to 48 (still stuck at 16 cores).
Hi @tmontana --
If you have that many cores, you can get better models at no extra cost by setting outer_bags to something in the range of 48, which would also give you full utilization. We currently parallelize across outer bags, so core utilization is capped at the number of outer bags. Finer-grained scaling will require some re-architecting on our part. It's in our backlog, but since most people don't have enough cores to benefit, and given the complexity of implementation, it is still a ways out.
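For example, something along these lines should keep all 48 cores busy (this uses the glassbox EBM classifier; the toy data is only there to make the snippet self-contained):

```python
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

# Toy data just so the snippet runs end to end.
X = np.random.rand(10_000, 20)
y = np.random.randint(0, 2, size=10_000)

# One outer bag per core keeps a 48-core machine fully utilized;
# n_jobs=-2 leaves one core free, as in your setup.
ebm = ExplainableBoostingClassifier(outer_bags=48, n_jobs=-2)
ebm.fit(X, y)
```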
-InterpretML team
Understood. Thank you,
Hi. Just FYI, I am running some experiments on a 96-core machine with 185 GB of RAM. My dataset is 1.2 million rows by 325 columns, and I use 40% of it for validation. If I set outer_bags to anything above 35, the kernel crashes with an out-of-memory message. Am I correct in assuming that the data is copied to each core?
If that's the case, I just wanted to point out that I've had good luck using Ray for parallelism. Ray has a shared object store, so it does not run into memory issues as you increase the number of cores. Do you think this could be feasible for EBM, based on the link below?
https://docs.ray.io/en/latest/joblib.html
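For reference, the joblib backend pattern from that page looks roughly like this. I have not verified that EBM's internal parallelism actually picks up the backend, so treat it as a sketch:

```python
import joblib
import numpy as np
from ray.util.joblib import register_ray
from interpret.glassbox import ExplainableBoostingClassifier

register_ray()  # registers "ray" as a joblib backend

# Toy data so the snippet runs end to end.
X = np.random.rand(10_000, 20)
y = np.random.randint(0, 2, size=10_000)

ebm = ExplainableBoostingClassifier(outer_bags=48, n_jobs=-1)
with joblib.parallel_backend("ray"):
    # Work dispatched through joblib inside this block goes to Ray workers,
    # which can read shared inputs from Ray's object store.
    ebm.fit(X, y)
```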
Thanks
Hi @tmontana -- Yes! You are correct: the data is currently copied to each core. Resolving this is a priority for us, but it has been delayed a bit by work in other areas. It's next on the list, though, as several people have hit this issue. The plan is to use a RawArray to share this memory across processes. Since much of the package is in C++, we would allocate the RawArray in Python and then fill it from C++ with a compressed representation. At the end we should have a single shared-memory object that we can pass around to different processes in Python, and that is even smaller than what a single process holds today.
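Roughly the pattern we have in mind, sketched in pure Python (in our case the buffer would be filled from C++ with the compressed representation):

```python
from multiprocessing import Pool, RawArray
import numpy as np

def _init_worker(shared_buf, shape):
    # Re-wrap the shared buffer in each worker; no copy is made.
    global X_shared
    X_shared = np.frombuffer(shared_buf, dtype=np.float64).reshape(shape)

def _train_one_bag(bag_index):
    # Placeholder for per-bag work; every bag reads the same shared block.
    return bag_index, float(X_shared.sum())

if __name__ == "__main__":
    n_samples, n_features = 100_000, 32
    buf = RawArray("d", n_samples * n_features)  # allocated once, in Python
    X = np.frombuffer(buf, dtype=np.float64).reshape(n_samples, n_features)
    X[:] = np.random.rand(n_samples, n_features)  # stand-in for the C++ fill step
    with Pool(processes=8, initializer=_init_worker,
              initargs=(buf, (n_samples, n_features))) as pool:
        results = pool.map(_train_one_bag, range(48))
```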
-InterpretML
It seems that this problem still has not been solved. Am I right? I have a machine with 40 cores and 500 GB of memory. When running on a dataset with 10 million samples and 600 features, EBM is much, much slower than LightGBM and consumes far more memory.
Hi @tmontana and @ZhangTP1996 -- The latest code in the develop branch finally includes substantial reductions in memory usage: it now uses approximately 10x less memory than the previous release. Our next PyPI release will include these changes.
-InterpretML team