microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

[Question] Is Microsoft still supporting this project? #6128

Closed: julioasotodv closed this 10 months ago

julioasotodv commented 11 months ago

Hi all!

I know for a fact that @jameslamb is doing an amazing job maintaining this project.

With that said, I am starting to get a bit concerned about how Microsoft is supporting this project in terms of (wo)man hours and overall effort. Given that this project lives under the microsoft GH organization, I would assume that Microsoft as a company allocates resources in order to improve and maintain LightGBM. But let me tell you that as an outsider, it definitely does not look like this is the case.

With XGBoost 2.0, a great number of features that used to be LGBM's "special sauce" are now also available in that library (for me personally, the only feature still missing from XGBoost is linear_tree; a quick sketch of it follows).
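For reference, a minimal sketch of the `linear_tree` option on synthetic data. With this flag, each leaf fits a linear model instead of predicting a constant, which helps on targets with locally linear structure:

```python
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 5))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=1_000)  # roughly linear target

train_set = lgb.Dataset(X, label=y)
params = {
    "objective": "regression",
    "linear_tree": True,  # fit a linear model per leaf; off by default, CPU only
    "verbose": -1,
}
booster = lgb.train(params, train_set, num_boost_round=50)
print(booster.predict(X[:5]))
```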

This may look like a rant, but I would really love to hear a Microsoft representative's take on this topic, and perhaps get a bit of transparency from their side on the plans to keep LGBM at the same quality standards, and with the same interesting innovations, that it used to have.

Thank you all!

jameslamb commented 11 months ago

Thanks for your interest in LightGBM.

@shiyu1994 is our representative from Microsoft, and I'll let him speak to how Microsoft is supporting the project and the level of support it plans to offer in the future.

mayer79 commented 11 months ago

Not speaking for Microsoft, but: To me, the main advantage of LightGBM remains its incredible speed. XGBoost has certainly caught up lately, but LGB still appears to be way faster. In the future, I'd love to work a bit on LGB's R interface.

julioasotodv commented 11 months ago

@mayer79 Is it? Maybe on CPU it is still slightly faster than XGB if compiled aggressively, but the GPU implementation of XGB (or CatBoost, for that matter) is objectively better.

shiyu1994 commented 11 months ago

@julioasotodv Thanks for your interest in LightGBM. The short answer to your question: yes, we are definitely supporting LightGBM. Here are some of the updates from our team within Microsoft in the past year:

These may not seem like a lot of new features, but we are making the algorithm lighter and faster with non-trivial technical innovations, and we are working to make these features more stable and usable out of the box. For example, we plan to include CUDA builds in the released packages so that no manual compilation is needed to run LightGBM on GPUs, and our multi-GPU support is on its way into this main repo.

For the latest benchmark of training efficiency on CPUs and GPUs, you may refer to the tables in our paper https://arxiv.org/abs/2207.09682. With our new GPU version together with quantized training, we see an overall training speedup of up to 3x compared with the faster of XGBoost and CatBoost.
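As a minimal sketch of how these two features are enabled together, assuming LightGBM >= 4.0 built with CUDA support (`cmake -DUSE_CUDA=1 ...`); parameter names as in the current docs, and actual speedups depend heavily on data and hardware:

```python
import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100_000, n_features=100, random_state=0)
train_set = lgb.Dataset(X, label=y)

params = {
    "objective": "regression",
    "device_type": "cuda",       # requires a CUDA-enabled build
    "use_quantized_grad": True,  # quantized training from the paper above
    "num_grad_quant_bins": 4,    # low-bitwidth gradient histogram bins
    "verbose": -1,
}
booster = lgb.train(params, train_set, num_boost_round=100)
```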

LightGBM is a precious open-source project, and we will keep up our effort on it to maintain its excellence. If you have any suggestions, please feel free to post them here or contact us. Thanks again!

adfea9c0 commented 11 months ago

> @mayer79 Is it? Maybe on CPU it is still slightly faster than XGB if compiled aggressively, but the GPU implementation of XGB (or CatBoost, for that matter) is objectively better.

@julioasotodv When did you test this? When I compared XGBoost and LightGBM on GPU recently, I found LightGBM noticeably faster.
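For what it's worth, a rough timing harness along these lines. This is only a sketch: it assumes CUDA-enabled builds of both libraries (XGBoost >= 2.0 for the `device` parameter), and results swing heavily with data shape, library versions, and hardware, which may explain the opposite conclusions above:

```python
import time
import lightgbm as lgb
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500_000, n_features=100, random_state=0)

start = time.perf_counter()
lgb.train(
    {"objective": "binary", "device_type": "cuda", "verbose": -1},
    lgb.Dataset(X, label=y),
    num_boost_round=100,
)
print(f"LightGBM (CUDA): {time.perf_counter() - start:.1f}s")

start = time.perf_counter()
xgb.train(
    {"objective": "binary:logistic", "tree_method": "hist", "device": "cuda"},
    xgb.DMatrix(X, label=y),
    num_boost_round=100,
)
print(f"XGBoost (CUDA):  {time.perf_counter() - start:.1f}s")
```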

onacrame commented 11 months ago

I’ve traditionally used LightGBM models in a corporate setting, but I have to say some of the features in other GBDT libraries do appeal. I think the following features would be great to have in future versions of LightGBM (they are already implemented in other GBDT libraries):

- Multi-output regression: different targets, but also multiple outputs from the same tree structure (for example, a multiclass classification model), as in SketchBoost and in XGBoost, where this feature is in beta.

- Uncertainty estimation (absolutely critical in finance and medical fields). Yes, there is quantile regression, but quantile regression is not particularly efficient (see the sketch after this list). See, for example, what CatBoost has done here; even better would be an approach based on conformal prediction, which provides validity guarantees.

- Out-of-core training (see the recent XGBoost release).

- Faster inference: see what packages like treelite and lleaves have done to increase inference speed by orders of magnitude (a short lleaves sketch closes this comment).
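To illustrate the quantile-regression point: a minimal sketch of today's uncertainty workflow, which also shows the inefficiency, since it trains one booster per quantile and the resulting interval carries no formal coverage guarantee:

```python
import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10_000, n_features=10, noise=10.0, random_state=0)
train_set = lgb.Dataset(X, label=y)

boosters = {
    alpha: lgb.train(
        {"objective": "quantile", "alpha": alpha, "verbose": -1},
        train_set,
        num_boost_round=100,
    )
    for alpha in (0.05, 0.95)  # a separate model per quantile
}
lower = boosters[0.05].predict(X[:5])
upper = boosters[0.95].predict(X[:5])
print(list(zip(lower, upper)))  # crude 90% interval, no validity guarantee
```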

Overall, I love the package, but I would like to see more resources put behind it. LLMs are all the rage these days, but the overwhelming majority of the world's data is still tabular, and strong support is needed for state-of-the-art tabular models.
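And on the faster-inference point: a minimal sketch of lleaves, which compiles a saved LightGBM model to native code via LLVM. This assumes a model previously written with `booster.save_model("model.txt")` and `pip install lleaves`:

```python
import lleaves
import numpy as np

llvm_model = lleaves.Model(model_file="model.txt")
llvm_model.compile()  # one-time LLVM compilation of the tree ensemble

# column count must match the number of features the model was trained on
X = np.random.default_rng(0).normal(size=(1_000, 10))
preds = llvm_model.predict(X)
```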

jameslamb commented 10 months ago

It's been a month with no new posts here, so I'm going to close this.

For discussion of which feature requests you'd like to see prioritized, please comment on #2302.

If you want to request a new feature that you don't yet see in #2302, add a new feature request at https://github.com/microsoft/LightGBM/issues/new/choose.

> would like to see more resources put behind it

This is an open source project and we welcome contributions from anyone. Please do consider contributing code and documentation to the project. If you're unsure where to start, browse the open issues at https://github.com/microsoft/LightGBM/issues.