Closed: wxchan closed this issue 8 years ago.
Thanks, this is a bug. I will fix it soon.
It seems that data in svmlight format leads to a different model than the same data in csv/tsv format, because the zero-valued terms are left out.
Using the data in LightGBM/examples/regression as an example, I wrote a Python notebook to compare the different data formats under the same configuration.
The outputs for svmlight format data (steps 2 and 3) are identical to each other, and they differ from the output for tsv format data (step 1). If I keep the 0-terms in the svmlight format data (step 4), the result is the same as for tsv format data (step 1).
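For reference, the two svmlight variants compared above (zeros dropped versus zeros kept) can be produced from a dense TSV row roughly as follows. This is a minimal sketch, not the notebook's code: the function name `tsv_line_to_svmlight` and the `keep_zeros` flag are hypothetical, and it assumes the label is in the first column and that feature indices start at 0.

```python
def tsv_line_to_svmlight(line, keep_zeros=False):
    """Convert one dense TSV row (label first) into an svmlight-style row."""
    fields = line.rstrip("\n").split("\t")
    label, values = fields[0], fields[1:]
    terms = []
    for idx, raw in enumerate(values):
        # Assumption: 0-based feature indices.
        # Zero-valued terms are dropped unless keep_zeros is set.
        if keep_zeros or float(raw) != 0.0:
            terms.append(f"{idx}:{raw}")
    return " ".join([label] + terms)


row = "1\t0.5\t0\t2.0"
sparse_row = tsv_line_to_svmlight(row)                   # zeros dropped
dense_row = tsv_line_to_svmlight(row, keep_zeros=True)   # zeros kept
print(sparse_row)
print(dense_row)
```

Training once on the file written with `keep_zeros=False` and once with `keep_zeros=True` should reproduce the kind of comparison described in the steps above.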
Also, I tried adding
if (fabs(val) > 1e-10)
before
out_features->emplace_back(idx, val);
in the CSVParser and TSVParser classes (LightGBM/src/io/parser.hpp, lines 23 and 52). With that change, csv/tsv data performs similarly to svmlight format data.
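The effect of that guard on a single parsed row can be illustrated outside LightGBM. The sketch below is a Python stand-in, not the library's parser: `parse_tsv_features` and its `skip_zeros` and `eps` parameters are hypothetical names, and it assumes the line holds feature columns only.

```python
def parse_tsv_features(line, skip_zeros=False, eps=1e-10):
    """Parse one tab-separated row into (index, value) pairs."""
    values = [float(tok) for tok in line.rstrip("\n").split("\t")]
    features = []
    for idx, val in enumerate(values):
        # Mirrors the guard if (fabs(val) > 1e-10): near-zero values are not stored.
        if skip_zeros and abs(val) <= eps:
            continue
        features.append((idx, val))
    return features


line = "0.5\t0\t2.0"
kept = parse_tsv_features(line)                       # zeros stored as explicit pairs
filtered = parse_tsv_features(line, skip_zeros=True)  # zeros omitted, as in sparse svmlight input
print(kept)
print(filtered)
```

With `skip_zeros=True` the row yields the same pairs a sparse svmlight line would, which is consistent with the two formats then behaving alike.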