microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Efficiency is much worse when data_sample_strategy is set to GOSS #5830

Closed: AllenSun1024 closed this issue 1 year ago

AllenSun1024 commented 1 year ago

Description

With all parameters identical except data_sample_strategy, training took 194s with data_sample_strategy = bagging but 1754s with data_sample_strategy = goss, roughly 9x slower.
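
For reference, a minimal way to reproduce this kind of timing comparison might look like the sketch below. The synthetic dataset is a stand-in (the original data and full parameter set aren't shown here), and it assumes LightGBM >= 4.0, where data_sample_strategy was introduced:

```python
import time

import lightgbm as lgb
import numpy as np

# Synthetic stand-in data; replace with the actual dataset.
rng = np.random.default_rng(42)
X = rng.standard_normal((100_000, 50))
y = (X[:, 0] + rng.standard_normal(100_000) > 0).astype(int)

for strategy in ("bagging", "goss"):
    train_set = lgb.Dataset(X, label=y)
    params = {
        "objective": "binary",
        "data_sample_strategy": strategy,
        "verbosity": -1,
    }
    start = time.perf_counter()
    lgb.train(params, train_set, num_boost_round=100)
    print(f"{strategy}: {time.perf_counter() - start:.1f}s")
```

Note that with the default bagging_fraction and bagging_freq, the "bagging" run performs no subsampling at all, so this effectively measures GOSS against plain full-data training.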

As stated in the LightGBM paper (Ke et al., NeurIPS 2017), GOSS can obtain a fairly accurate estimate of the information gain with a much smaller data size. However, it doesn't perform well in practice.
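
For context, GOSS as described in that paper keeps the instances with the largest absolute gradients, uniformly samples from the rest, and reweights the sampled part by (1 - a) / b so the estimated information gain stays approximately unbiased. A minimal illustrative sketch of the selection step (not LightGBM's internal implementation; goss_sample, a, and b are names chosen here):

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Sketch of GOSS selection: keep the top a*n instances by
    |gradient|, uniformly sample b*n of the rest, and return the
    weight (1 - a) / b applied to the sampled small-gradient part."""
    rng = rng or np.random.default_rng(0)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))  # sort by |gradient|, descending
    top_idx = order[: int(a * n)]           # always keep large gradients
    rest = order[int(a * n):]
    sampled_idx = rng.choice(rest, size=int(b * n), replace=False)
    return np.concatenate([top_idx, sampled_idx]), (1.0 - a) / b
```

The expected savings come from growing each tree on only about (a + b) of the data, so the per-iteration overhead (sorting gradients and re-sampling for every tree) has to be amortized against that reduction.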

Why?

guolinke commented 1 year ago

There are several possible reasons.

  1. The speed-up is data-dependent; it is not possible to get the same speed-up on all kinds of datasets.
  2. Parameters such as early stopping may result in very different speeds.
  3. Versions. There have been significant changes since the paper, and some of them may affect speed. You can try an early version of LightGBM (see the version sketch after this list).
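
For point 3, one quick way to check which API applies and to try an older release: data_sample_strategy only exists in LightGBM >= 4.0, while earlier releases selected GOSS through the boosting parameter instead. A hedged sketch:

```python
import lightgbm as lgb

print(lgb.__version__)

# LightGBM >= 4.0: GOSS is selected via data_sample_strategy.
params_v4 = {"objective": "binary", "data_sample_strategy": "goss"}

# LightGBM < 4.0 (e.g. `pip install "lightgbm<4"`): GOSS was a
# boosting mode rather than a sampling strategy.
params_v3 = {"objective": "binary", "boosting": "goss"}
```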
AllenSun1024 commented 1 year ago

> There are several possible reasons.
>
>   1. The speed-up is data-dependent; it is not possible to get the same speed-up on all kinds of datasets.
>   2. Parameters such as early stopping may result in very different speeds.
>   3. Versions. There have been significant changes since the paper, and some of them may affect speed. You can try an early version of LightGBM.

Got it, thank you anyway.

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.