microsoft / SynapseML

Simple and Distributed Machine Learning
http://aka.ms/spark
MIT License

raw2probabilityInPlace #440

Closed tanjiaxin closed 5 years ago

tanjiaxin commented 5 years ago

https://github.com/Azure/mmlspark/blob/9ce617b5fd11654ae124018a905f5df1cf50077c/src/lightgbm/src/main/scala/LightGBMClassifier.scala#L91-L101 I found that the probability is calculated as 1 / (1 + e^(-2x)), but normally it should be 1 / (1 + e^(-x)). Could you please explain why x is multiplied by 2? Thanks in advance.
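For reference, a quick numeric sketch of the two formulas being compared (this is an illustration in Python, not the library code):

```python
import math

def prob_with_factor_2(x):
    # The variant in question: 1 / (1 + e^(-2x))
    return 1.0 / (1.0 + math.exp(-2.0 * x))

def prob_standard(x):
    # The standard logistic sigmoid: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

# The two agree at x = 0 (both give 0.5) but diverge elsewhere.
for x in [-1.0, 0.0, 1.0]:
    print(x, prob_with_factor_2(x), prob_standard(x))
```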

imatiach-msft commented 5 years ago

@tanjiaxin please see page 8 of Friedman (1999), "Greedy Function Approximation: A Gradient Boosting Machine". They use -2 there in the loss function, and from there the -2 carries into the probability estimates. This is also what Spark ML uses for its GBTClassifier; see the reference PR here: https://github.com/apache/spark/pull/16441 I will try to find the loss function formula in LightGBM and validate whether the 2 is needed here. It makes a slight difference in the distribution of the probability values, but it shouldn't make a difference in the predicted label, right? As long as x = 0 corresponds to the hyperplane, with positive values on one side and negative on the other, the predicted labels should be the same.

imatiach-msft commented 5 years ago

I think you are right. It looks like LightGBM's GetGradients, which is used in training, doesn't have the -2: https://github.com/Microsoft/LightGBM/blob/c920e6345bcb41fc1ec6ac338f5437034b9f0d38/src/objective/binary_objective.hpp#L102 And the output conversion used in prediction doesn't seem to have it either: https://github.com/Microsoft/LightGBM/blob/c920e6345bcb41fc1ec6ac338f5437034b9f0d38/src/objective/binary_objective.hpp#L157 I'll update the code to remove the -2.
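Assuming the fix simply drops the factor of 2, the raw2probabilityInPlace mapping would reduce to the plain logistic sigmoid, consistent with LightGBM's binary-objective output conversion. A sketch of the per-element transform (in Python for illustration; the actual code is Scala):

```python
import math

def raw2probability(raw_prediction):
    # Corrected transform: plain logistic sigmoid, no factor of 2.
    # Returns [P(class 0), P(class 1)] for a raw margin score.
    p1 = 1.0 / (1.0 + math.exp(-raw_prediction))
    return [1.0 - p1, p1]
```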

tanjiaxin commented 5 years ago

Thanks for your explanation.