microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

[Question] Using shared memory across boosters #4110

Closed memeplex closed 3 years ago

memeplex commented 3 years ago

I need to run predictions from multiple processes (this is the architecture I have to deal with, not one I designed), and I don't want to load the same boosters (there are hundreds of them) into the memory of each process. One option is to create a prediction service but, if possible, I would prefer to share the memory the boosters use to store their tree ensembles, though not necessarily other structures that might not be accessible in a thread-safe manner. Is there any option to specify the memory pool where boosters are stored? Thanks!

memeplex commented 3 years ago

A clarification: this is only for prediction (the model evaluation part), not for training at all.

alexisdrakopoulos commented 3 years ago

Along these lines, I wanted to ask (I don't believe it warrants its own ticket): can you share the memory of a Dataset across different boosters that are trained on the same inputs but with different labels?

shiyu1994 commented 3 years ago

@memeplex With the C++ interface of LightGBM, we can convert LightGBM models into C++ code made of if-else statements. After this conversion, I think the if-else code should be smaller than the original booster ensemble objects in memory. This may not meet your request for different processes to share the memory that stores the boosters, since each process still has to load the code independently. However, when the memory available for storing the boosters is limited, converting to C++ code may be a good way to lower the memory cost.

shiyu1994 commented 3 years ago

@alexisdrakopoulos Sure. In the Python API, you can pass the same data to the constructors of multiple Datasets (lgb.Dataset(data=data, label=...)) and change only the labels passed to each constructor. That way, the memory of the data part will be shared.

no-response[bot] commented 3 years ago

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.