jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

r2pmml for xgboost: continuous response #13

Closed: netsche closed this issue 7 years ago

netsche commented 7 years ago

Hi,

I am trying to generate a PMML file from an XGBoost model with a continuous response (binary:logistic) and therefore a lot of response levels. How can I address this in the response_levels argument? I tried leaving the argument out, and I also tried giving it a list of all the predicted values from my test set. In both cases I get this warning message:

Error in .convert(tempfile, file, ...) : 127

I used an XGBoost model that was trained on a regular data matrix and not on an XGBoost DMatrix. Might this be the reason that r2pmml does not work?

Best regards Netsche

vruusmann commented 7 years ago

I am trying to generate a PMML file from an XGBoost model with a continuous response (binary:logistic) and therefore a lot of response levels.

The binary:logistic objective function is suitable for two-class classification problems. For example, where the response is something like "no-event" vs. "event".

If you're looking to estimate the probability of the "event" category, then you need to specify the reg:logistic objective function.
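To make the distinction concrete, here is a minimal R sketch on hypothetical toy data that trains one booster with each objective. It uses xgb.train() with an xgb.DMatrix rather than the xgboost() convenience function, because recent xgboost releases have reworked xgboost() and calls written against the 0.4/0.6-era interface may need adjusting. In this sketch both objectives produce predictions on the [0, 1] scale; the objective name is presumably what tells the converter whether it is dealing with a two-class classifier (binary:logistic) or a regression (reg:logistic).

```r
library(xgboost)

# Hypothetical toy data: two numeric features and a 0/1 label
set.seed(42)
X <- matrix(rnorm(400), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
y <- as.integer(X[, "x1"] + X[, "x2"] > 0)
dtrain <- xgb.DMatrix(X, label = y)

# Two-class classification: predictions are probabilities of the "1" class
clf <- xgb.train(params = list(objective = "binary:logistic"), data = dtrain, nrounds = 10)

# Logistic regression objective: predictions are also on the [0, 1] scale
reg <- xgb.train(params = list(objective = "reg:logistic"), data = dtrain, nrounds = 10)

p_clf <- predict(clf, X)
p_reg <- predict(reg, X)
```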

I get this warning message: Error in .convert(tempfile, file, ...) : 127

That's only the first line of the warning message. If the problem happens on the Java side of the conversion workflow, then there should be at least 10 more lines printed, plus the full Java exception stacktrace.

I used an XGBoost model that was trained on a regular data matrix and not on an XGBoost DMatrix. Might this be the reason that r2pmml does not work?

The representation of the dataset (i.e. R matrix vs. XGBoost DMatrix) doesn't affect the xgb.Booster data structure. So, it's very likely not the source of the problem.

Did you get the XGBoost model trained successfully? Can you save it into an RDS file (using the saveRDS function)? If you share this RDS file with me (either attach it to this issue, or send it to my e-mail), then I might try troubleshooting it locally.

One more thing. The xgboost package was updated less than a week ago. What is your xgboost package version?

netsche commented 7 years ago

Hi,

thank you for the quick response!

The binary:logistic objective function is suitable for two-class classification problems. For example, where the response is something like "no-event" vs. "event".

This is what I want to do. The response variable is an integer with two distinct values (0 and 1). The predicted values range from 0 to 1 (e.g. 0.9876543 or 0.1234567), which is what I want, since I want to sort the cases into quantiles, for example. Sorry, "continuous" was probably the wrong description.

That's only the first line of the warning message. If the problem happens on the Java side of the conversion workflow, then there should be at least 10 more lines printed, plus the full Java exception stacktrace.

There are more lines:

Error in .convert(tempfile, file, ...) : 127 In addition: Warning message: running command '"java" -cp "C:/Users/xyz/Documents/R/win-library/3.3/r2pmml/java/guava-19.0.jar;C:/Users/xyz/Documents/R/win-library/3.3/r2pmml/java/istack-commons-runtime-2.21.jar;C:/Users/xyz/Documents/R/win-library/3.3/r2pmml/java/jaxb-core-2.2.11.jar;C:/Users/xyz/Documents/R/win-library/3.3/r2pmml/java/jaxb-runtime-2.2.11.jar;C:/Users/xyz/Documents/R/win-library/3.3/r2pmml/java/jcommander-1.48.jar;C:/Users/xyz/Documents/R/win-library/3.3/r2pmml/java/jpmml-converter-1.2.0.jar;C:/Users/xyz/Documents/R/win-library/3.3/r2pmml/java/jpmml-r-1.2.11.jar;C:/Users/xyzDocuments/R/win-library/3.3/r2pmml/java/jpmml-xgboost-1.1.3.jar;C:/Users/xyz/Documents/R/win-library/3.3/r2pmml/java/pmml-agent-1.3.4.jar;C:/Users/xyz/Documents/R/win-library/3.3/r2pmml/java/pmml-model-1.3.4.jar;C:/Users/xyz/Documents/R/win-library/3.3/r2pmml/java/pmml-model-metro-1.3.4.jar;C:/Users/xyz/Documents/R/win-library/3.3/r2pmml/java/pmml-schema-1.3.4.jar;C:/...

The representation of the dataset (i.e. R matrix vs. XGBoost DMatrix) doesn't affect the xgb.Booster data structure. So, it's very likely not the source of the problem.

OK, thank you!

Did you get the XGBoost model trained successfully? Can you save it into an RDS file (using the saveRDS function)? If you share this RDS file with me (either attach it to this issue, or send it to my e-mail), then I might try troubleshooting it locally.

The model works fine in R. I can save it into an RDS file. I tried to convert it in Java using this approach:

https://github.com/jpmml/jpmml-r

Unfortunately, the Java process asks for an fmap, but there is no way of providing it (I have produced the fmap, but I don't know how to supply it to the process).

One more thing. The xgboost package was updated less than a week ago. What is your xgboost package version?

I am using 0.4-4

vruusmann commented 7 years ago

Error in .convert(tempfile, file, ...) : 127 In addition: Warning message: running command '"java" -cp "..."

You're still showing me only the first line (it's a very long line, and your R terminal wraps it over 10 lines), which does not contain any actionable information.

You should paste the full R command output here. At minimum, it should contain a line that says something like java.io.IOException or java.lang.IllegalArgumentException, plus the following ten lines.

Note to self: looks like I do need to update my GitHub issue template with appropriate instructions.

Unfortunately, the Java process asks for an fmap, but there is no way of providing it.

You need to "decorate" the xgb.Booster object with the fmap attribute before saving it into the RDS file:

my.xgb = xgboost(...)
my.xgb$fmap = "/path/to/mydata.fmap" # The path to the feature map file
# my.xgb$fmap = r2pmml::genFMap(my_df_X) # Alternatively, the data.frame representation of the feature map itself (generated using the r2pmml::genFMap(X) function)
saveRDS(my.xgb, "myxgb.rds")

Please refer to the definition of the r2pmml.xgb.Booster function to see how the r2pmml package does it behind the scenes (and also how to set the name of the target field and the target category levels).
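For the two-class case in this thread, the whole conversion can be sketched in R as follows. This assumes an r2pmml release from this era (around 0.12.0) in which genFMap() exists and r2pmml() dispatches to r2pmml.xgb.Booster with fmap, response_name and response_levels arguments; newer r2pmml releases may have renamed the feature map helper, so check the r2pmml.xgb.Booster definition before relying on these names. The data, the target name "y" and the levels c("0", "1") are hypothetical. A Java runtime is required, since the actual conversion runs on the Java side.

```r
library(xgboost)
library(r2pmml)

# Hypothetical training data: numeric features in a data.frame, 0/1 label
set.seed(42)
df_X <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
y <- as.integer(df_X$x1 + df_X$x2 > 0)

# Column order of the training matrix matches the column order of df_X
my.xgb <- xgb.train(params = list(objective = "binary:logistic"),
                    data = xgb.DMatrix(as.matrix(df_X), label = y),
                    nrounds = 10)

# Feature map as a data.frame (alternative to an fmap file path)
my.fmap <- genFMap(df_X)

# Convert, naming the target field and listing its two category levels
# (assumed argument names; verify against r2pmml.xgb.Booster)
pmml_file <- file.path(tempdir(), "my_xgb.pmml")
r2pmml(my.xgb, pmml_file, fmap = my.fmap,
       response_name = "y", response_levels = c("0", "1"))
```

With a 0/1 target, response_levels only needs the two class labels, not the list of predicted probabilities.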

I am using 0.4-4

I would recommend upgrading to the latest xgboost and r2pmml package versions (0.6-4 and 0.12.0, respectively). This upgrade will likely not solve this problem, but it makes everything much easier to troubleshoot by eliminating all sorts of known side effects and issues.