microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Potential problem with thresholds for bins around 0 #6589

Open jaguerrerod opened 1 month ago

jaguerrerod commented 1 month ago

I have a dataset whose features take the values -4, -3, -2, -1, 0, 1, 2, 3, 4. When I inspect the fitted trees, I see bin cut points at -3.5, -2.5, -1.5, 1.5, 2.5 and 3.5, which all look fine. Around 0, however, I see 1.0000000180025095e-35 and -1.0000000180025095e-35. Why not 0.5 and -0.5?

My concern is that the machine could treat both of these extremely small values as 0. In that case feature <= -1.0000000180025095e-35 could include 0, and -1.0000000180025095e-35 and 1.0000000180025095e-35 would effectively be the same cut point. Why were these cut points chosen for the bins around 0?
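The concern about the two thresholds collapsing into one can be checked directly in double precision. The sketch below is only a plain-Python check, not LightGBM code: it assumes splits are evaluated as `value <= threshold`, and the names `T`, `below`, `zero_bin` and `above` are made up for illustration. It also tests the guess, not confirmed anywhere in this thread, that the reported magnitude is 1e-35 rounded to 32-bit float precision.

```python
import math
import struct

# Threshold magnitude copied from the tree dump quoted above.
T = 1.0000000180025095e-35

# Assumption (not confirmed in this thread): the value is 1e-35 rounded to
# float32 and then widened back to a 64-bit double.
as_float32 = struct.unpack("f", struct.pack("f", 1e-35))[0]

values = [-4, -3, -2, -1, 0, 1, 2, 3, 4]

# Assumed split rule: a value goes left when value <= threshold.
below = [v for v in values if v <= -T]         # strictly negative values
zero_bin = [v for v in values if -T < v <= T]  # values between the two cuts
above = [v for v in values if v > T]           # strictly positive values

print("0 <= -T:", 0.0 <= -T)
print("0 <= +T:", 0.0 <= T)
print("below:", below, "zero bin:", zero_bin, "above:", above)
print("close to float32(1e-35):", math.isclose(as_float32, T, rel_tol=1e-6))
```

For this integer-valued data, the check shows the two cut points stay distinct in 64-bit floats: exactly 0 falls between them, and every other value lands on the same side as it would with cuts at -0.5 and 0.5. This does not explain why the library picks these values; that question is still open for the maintainers.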

jameslamb commented 1 month ago

@shiyu1994 can you please answer this?