Hi @edgararuiz, I'm the author of the article mentioned by @JiaxiangBU in this issue. I wanted to mention the following in case it helps with development:
Something I discovered after writing the article is that xgboost converts data internally to 32-bit floats, and the resulting coefficients in the xgb.dump JSON correspond to this treatment. This can lead to errors, particularly with logistic regression objective functions. See the discussion at: https://github.com/dmlc/xgboost/issues/4097
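As a rough illustration of the precision gap, here is a small sketch using the CRAN float package (not part of xgboost) to mimic the float32 round trip; the value itself is made up:

```r
# Rough illustration only: the 'float' package is used here to mimic
# xgboost's internal float32 representation; the value is arbitrary.
library(float)

x   <- 0.123456789012345   # value as a database would typically store it (double)
x32 <- dbl(fl(x))          # same value after a round trip through 32-bit floats
x - x32                    # small but non-zero difference
```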
In other words, applying the coefficients as-is would assume the data in the database is represented as 32-bit floats. A good test would be to run xgboost predictions using the model binary (loaded via xgb.load) and then compare them with the tidypredict SQL results produced on the same data stored in a database, particularly for logistic regression objective functions.
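A minimal sketch of that comparison in R is below. It assumes tidypredict exposes the xgboost support being developed here through tidypredict_to_column(); the model file name, the test_df data frame, and the SQLite connection are placeholders:

```r
# Minimal sketch of the suggested check; tidypredict_to_column() support
# for xgb.Booster objects is assumed (it is what this issue is about),
# and the model file, test_df, and SQLite connection are placeholders.
library(xgboost)
library(tidypredict)
library(DBI)
library(dplyr)

bst <- xgb.load("xgb_model.bin")                 # model binary saved earlier

# Predictions from xgboost itself (features are coerced to float32 internally)
local_preds <- predict(bst, as.matrix(test_df))

# Same data pushed into a database, predictions computed via the generated SQL
con    <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
db_tbl <- dplyr::copy_to(con, test_df, "test_data")

db_preds <- db_tbl %>%
  tidypredict_to_column(bst) %>%
  collect() %>%
  pull(fit)

# Differences beyond roughly float32 precision point to the 32-bit issue,
# especially for binary:logistic models
summary(abs(local_preds - db_preds))

DBI::dbDisconnect(con)
```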
I hope this is helpful!