microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Minimal Variance Sampling in Stochastic Gradient Boosting #2644

Closed: StrikerRUS closed this issue 3 months ago

StrikerRUS commented 4 years ago

Summary

MVS can be considered an improved version of Gradient-based One-Side Sampling (GOSS, see details in the paper), which is already implemented in LightGBM: GOSS keeps a given number of top examples by the value of |g_i| with probability 1 and samples the remaining examples with the same fixed probability. Thanks to its theoretical basis, MVS provides a lower variance of the gradient estimate than GOSS.
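
Below is a minimal NumPy sketch of the two sampling schemes, just to illustrate the difference. It follows the usual description of GOSS and MVS rather than LightGBM's actual implementation, and the function names and parameters (top_rate, other_rate, sample_rate, lam) are illustrative assumptions.

```python
import numpy as np

def goss_sample(grad, top_rate=0.2, other_rate=0.1, rng=None):
    """Sketch of GOSS: keep the top_rate fraction of examples by |g_i|
    with probability 1, uniformly sample an other_rate fraction of the
    rest, and up-weight the sampled ones by (1 - top_rate) / other_rate."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(grad)
    order = np.argsort(-np.abs(grad))            # descending by |g_i|
    n_top, n_other = int(top_rate * n), int(other_rate * n)
    top_idx = order[:n_top]
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)
    idx = np.concatenate([top_idx, other_idx])
    weights = np.ones(len(idx))
    weights[n_top:] = (1.0 - top_rate) / other_rate
    return idx, weights

def mvs_sample(grad, hess, sample_rate=0.3, lam=0.1, rng=None):
    """Sketch of MVS: sample example i with probability
    p_i = min(1, s_i / mu), where s_i = sqrt(g_i^2 + lam * h_i^2) is the
    regularized absolute gradient and the threshold mu is chosen so that
    the expected sample size matches the budget; kept examples are
    re-weighted by 1 / p_i to keep the gradient estimate unbiased."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(grad)
    s = np.sqrt(grad ** 2 + lam * hess ** 2)
    budget = sample_rate * n
    lo, hi = 0.0, s.max() * n / budget + 1e-12   # at mu = hi, sum(p) <= budget
    for _ in range(100):                         # binary search for mu
        mu = 0.5 * (lo + hi)
        if np.minimum(1.0, s / mu).sum() > budget:
            lo = mu                              # mu too small: too many kept
        else:
            hi = mu
    p = np.minimum(1.0, s / mu)
    keep = rng.random(n) < p
    idx = np.flatnonzero(keep)
    return idx, 1.0 / p[idx]
```

For example, given grad and hess arrays from one boosting iteration, mvs_sample(grad, hess, sample_rate=0.3) returns the indices of the retained examples together with the weights to apply to their gradients/hessians when building the next tree.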

References

- Ibragimov, B. and Gusev, G. "Minimal Variance Sampling in Stochastic Gradient Boosting." NeurIPS 2019.

Docs:

StrikerRUS commented 4 years ago

Closed in favor of tracking this in #2302. We decided to keep all feature requests in one place.

You are welcome to contribute this feature! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.

guolinke commented 4 years ago

Refer to https://github.com/ibr11/LightGBM

StrikerRUS commented 4 years ago

@guolinke Exciting! However, the author has disabled issues in that repo, and I can't find any contact info in their GitHub profile to ask them to create a PR.

What do you think about borrowing that code with credit to the author? I guess that if the author had wished to contribute to the upstream repo, there has been enough time since the latest commits to do it...

StrikerRUS commented 4 years ago

Uhhh, I just noticed that this is one of the authors of MVS!

StrikerRUS commented 3 months ago

Closing according to https://github.com/microsoft/LightGBM/pull/5091#issuecomment-2157106007.

You are welcome to contribute this feature! Feel free to fork the https://github.com/microsoft/LightGBM/tree/mvs_dev branch.