microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

svmlight format leads to different models, because it ignores 0-value terms? #17

Closed wxchan closed 8 years ago

wxchan commented 8 years ago

It seems that data in svmlight format leads to a different model from the same data in csv/tsv format, because the svmlight format omits the terms whose value is 0.

Take the data in LightGBM/examples/regression as an example. I wrote a Python notebook to compare the different data formats under the same configuration.

The outputs for svmlight format data (steps 2 and 3) are identical to each other, but they differ from the output for tsv format data (step 1). If I keep the 0-value terms in the svmlight format data (step 4), the result matches that of the tsv format data (step 1).
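To make the difference between the two svmlight variants concrete, here is a minimal Python sketch, not taken from the notebook. The helper to_svmlight, its keep_zeros flag, the sample row, and the 0-based feature indices are all assumptions made for illustration. It only shows how the same row is written with and without its 0-value terms.

```python
def to_svmlight(label, features, keep_zeros=False):
    """Format one row as an svmlight line.

    Hypothetical helper: feature indices are 0-based here, which is an
    assumption for illustration, not something stated in the issue.
    """
    pairs = [
        f"{idx}:{val:g}"
        for idx, val in enumerate(features)
        if keep_zeros or val != 0  # the usual svmlight output drops zeros
    ]
    return " ".join([f"{label:g}"] + pairs)


row = [0.5, 0.0, 2.0, 0.0]  # made-up sample row
sparse_line = to_svmlight(1, row)                   # zeros omitted
dense_line = to_svmlight(1, row, keep_zeros=True)   # zeros written out

print(sparse_line)  # 1 0:0.5 2:2
print(dense_line)   # 1 0:0.5 1:0 2:2 3:0
```

Both lines describe the same row, so a parser that treats a missing entry differently from an explicit 0 would see different inputs depending on which line it reads.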

I also tried adding if (fabs(val)>1e-10) before out_features->emplace_back(idx, val); in the CSVParser and TSVParser classes (LightGBM/src/io/parser.hpp, lines 23 and 52). With that change, the tsv format data gives performance similar to the svmlight format data.
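The effect of that guard can be sketched in Python. This is an analogue for illustration only, not the LightGBM C++ code: parse_tsv_line, skip_zeros, and eps are hypothetical names, and the sketch treats every column as a feature (no label handling).

```python
def parse_tsv_line(line, skip_zeros=False, eps=1e-10):
    """Parse a tab-separated line of numbers into (index, value) pairs.

    Hypothetical helper mirroring the experiment: with skip_zeros=True,
    a value is kept only if abs(val) > eps, like the fabs(val)>1e-10 guard.
    """
    features = []
    for idx, token in enumerate(line.rstrip("\n").split("\t")):
        val = float(token)
        if skip_zeros and abs(val) <= eps:
            continue  # drop near-zero values, as a sparse format would
        features.append((idx, val))
    return features


line = "0.5\t0\t2.0\t0\n"  # made-up sample line
with_zeros = parse_tsv_line(line)
without_zeros = parse_tsv_line(line, skip_zeros=True)

print(with_zeros)     # [(0, 0.5), (1, 0.0), (2, 2.0), (3, 0.0)]
print(without_zeros)  # [(0, 0.5), (2, 2.0)]
```

With the guard enabled, the dense parser hands on the same (index, value) pairs that a sparse svmlight line would yield, which is consistent with the two formats then behaving alike.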

guolinke commented 8 years ago

Thanks, this is a bug. I will fix it soon.
