microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

[RFC] [python-package] remove h2o `datatable` support? #6662

Open jameslamb opened 2 weeks ago

jameslamb commented 2 weeks ago

Summary

Support for h2o's datatable library was added to LightGBM 5.5+ years ago, in #1970.

Proposing here that lightgbm:

Motivation

That project seems to be abandoned:

In those 5.5 years since #1970, the only bug reports / feature requests received about datatable support have been from one person working for h2o... and the last of those was 4 years ago:

And in all that time, I don't think we have ever tested against datatable in CI.

Description

Doing this would simplify the Python package, making it easier for others to contribute.

It'd also make it more manageable to add support for newer, more popular input formats like polars (#6204).

See @trivialfis's summary of the current state of supporting data frame libraries at https://github.com/dmlc/xgboost/issues/10554#issuecomment-2211824457 ... I agree with it.

References

I am not proposing here that lightgbm should support H2OFrame... Dask doesn't, XGBoost doesn't, scikit-learn doesn't... and I think our limited time and attention here would be better spent on more widely-used input formats, like polars.

jameslamb commented 2 weeks ago

@guolinke @shiyu1994 @StrikerRUS @jmoralez @borchero @btrotta please let me know what you think whenever you have time

StrikerRUS commented 2 weeks ago

I'm +1 for dropping support of datatable. Especially given that the so-called "support" is a simple .to_numpy() method call 🙃

trivialfis commented 2 weeks ago

Thank you for the ping. Sounds good to me, considering that the project is no longer receiving new commits.

jmoralez commented 2 weeks ago

I'm +1 as well

guolinke commented 2 weeks ago

I am +1

borchero commented 2 weeks ago

I'm in favor of removing as well ✅

jameslamb commented 2 weeks ago

Thank you all for the quick responses! I'll put up a PR adding a deprecation warning.
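For readers following along, here is a rough sketch of what a deprecation warning around a `.to_numpy()`-style conversion might look like. It is not LightGBM's actual code: `FakeFrame` is a hypothetical stand-in for `datatable.Frame` (so the snippet runs without datatable installed), `coerce_to_numpy` is an invented helper name, and the choice of `FutureWarning` and the message wording are assumptions.

```python
import warnings

import numpy as np


class FakeFrame:
    """Hypothetical stand-in for datatable.Frame; only to_numpy() is modeled."""

    def __init__(self, rows):
        self._rows = rows

    def to_numpy(self):
        return np.asarray(self._rows, dtype=np.float64)


def coerce_to_numpy(data):
    """Hypothetical helper: return a NumPy array, warning on the deprecated type."""
    if isinstance(data, np.ndarray):
        # Already a supported input type, pass through untouched.
        return data
    if isinstance(data, FakeFrame):
        # Assumed warning category and wording, for illustration only.
        warnings.warn(
            "Support for datatable input is deprecated and may be removed "
            "in a future release. Convert with .to_numpy() before passing data.",
            FutureWarning,
            stacklevel=2,
        )
        return data.to_numpy()
    raise TypeError(f"Unsupported data type: {type(data).__name__}")


# User-side migration: convert explicitly, so no warning is triggered.
frame = FakeFrame([[1.0, 2.0], [3.0, 4.0]])
X = coerce_to_numpy(frame.to_numpy())
```

Since the conversion is a single method call, users who still rely on datatable could keep working after removal by calling `.to_numpy()` themselves before handing data to the library.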