tidymodels / tidypredict

Run predictions inside the database
https://tidypredict.tidymodels.org

Consider that xgboost converts data to 32-bit floats internally #45

Open ras44 opened 5 years ago

ras44 commented 5 years ago

Hi @edgararuiz, I'm the author of the article mentioned by @JiaxiangBU in this issue. I wanted to mention the following in case it helps with development:

What I discovered after writing the article was that xgboost converts data internally to 32-bit floats, and the resulting coefficients (split thresholds and leaf values) in the xgb.dump JSON output reflect this treatment. Applying them to 64-bit data can therefore produce predictions that differ from xgboost's own, particularly with logistic regression objective functions. See the discussion at: https://github.com/dmlc/xgboost/issues/4097
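To see how the 32-bit conversion can change a prediction, here is a minimal sketch that uses only Python's standard library. It assumes a split threshold that is exactly the float32 rounding of 0.1, which is a hypothetical value chosen for illustration and is not taken from a real model dump. It also assumes the xgboost split convention that a row goes to the "yes" branch when `feature < threshold`. The helper `to_float32` is hypothetical; it uses `struct` to round a 64-bit float to the nearest 32-bit float.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float (returned as a Python float)."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical split threshold: the float32 rounding of 0.1, i.e. 0.10000000149011612.
threshold = to_float32(0.1)

# The same feature value stored as a 64-bit DOUBLE, as it might be in a database.
x = 0.1

# Split rule "go left/yes when feature < threshold":
goes_left_float64 = x < threshold              # compared as doubles (SQL-like)
goes_left_float32 = to_float32(x) < threshold  # compared after a float32 cast (xgboost-like)

print(goes_left_float64, goes_left_float32)
```

The two comparisons disagree. As a double, 0.1 is slightly smaller than the float32 threshold. After rounding to float32, it equals the threshold and no longer satisfies `<`. The same row therefore lands in different leaves.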

This means that applying the coefficients as-is implicitly assumes the data in the database is represented as 32-bit floats. A good test would be to run xgboost predictions using the model binary (loaded via xgb.load) and compare them with the tidypredict SQL results computed on the same data stored in a database, particularly for logistic regression objective functions.
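The comparison suggested above can be sketched with a toy model. This is a hypothetical single-split tree whose threshold, leaf values, and base margin are invented for illustration, and it does not come from a real `xgb.dump`. It scores each row twice: once after casting the input to float32, approximating xgboost's native behaviour, and once on the raw 64-bit value, approximating SQL run against DOUBLE columns. The margin is passed through the logistic function as a `binary:logistic`-style objective would do, and the sketch reports rows where the two probabilities disagree beyond a tolerance. A base margin of 0 corresponds to a `base_score` of 0.5; that is also an assumption of this sketch.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical one-tree model (values invented for illustration).
THRESHOLD = to_float32(0.1)
LEFT_LEAF, RIGHT_LEAF = -0.4, 0.6
BASE_MARGIN = 0.0  # logit(0.5), assuming base_score = 0.5


def predict_proba(x: float, cast_to_float32: bool) -> float:
    """Score one row with the toy tree, then apply the logistic (sigmoid) link."""
    value = to_float32(x) if cast_to_float32 else x
    margin = BASE_MARGIN + (LEFT_LEAF if value < THRESHOLD else RIGHT_LEAF)
    return 1.0 / (1.0 + math.exp(-margin))


def compare(rows, tol=1e-6):
    """Return (x, native, sql_like) for rows whose two scores differ by more than tol."""
    mismatches = []
    for x in rows:
        native = predict_proba(x, cast_to_float32=True)     # xgboost-like
        sql_like = predict_proba(x, cast_to_float32=False)  # DOUBLE-column-like
        if abs(native - sql_like) > tol:
            mismatches.append((x, native, sql_like))
    return mismatches


rows = [0.05, 0.1, 0.2]
for x, native, sql_like in compare(rows):
    print(f"x={x}: native={native:.4f} sql_like={sql_like:.4f}")
```

Only the row sitting exactly on the rounded threshold (0.1) is flagged. A single flipped split moves the margin by a full leaf value, and after the logistic transform that shows up as a large change in the predicted probability. This is why a real test against `xgb.load` predictions should include values near the dumped split thresholds.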

I hope this is helpful!

edgararuiz-zz commented 5 years ago

OK, thank you for the heads-up. I'll look into that.