jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

Passing missing values for continuous predictors #27

Closed mcharles closed 6 years ago

mcharles commented 6 years ago

Apologies for not having any reproducible examples on hand, but I am wondering if there is a quick answer or a pointer in the right direction for this particular issue:

I have constructed a GBDT model in R using the dismo package, and then used the latest r2pmml package to generate a PMML file. When we try to run the PMML (v 4.3) file in Syncfusion, we are getting errors when it encounters a missing/null value for one of our continuous predictors. We've tried several different ways to pass this missing value to no avail.

Our working assumption is that this is a PMML / Syncfusion issue that we need to solve, given that the GBDT algorithm handles missing values in continuous variables just fine. Does anyone know if we are off track here?

vruusmann commented 6 years ago

> Apologies for not having any reproducible examples on hand

It is very difficult to answer your question without seeing the R code. I assume that your input data.frame contains missing values, and that they are passed as-is to the gbm::gbm function (via some sort of dismo wrapper).

The PMML document that is generated for the gbm::gbm model type contains three-way splits. The first branch of each split is guarded by a SimplePredicate element that checks whether the value is missing, using the isMissing operator. Hence, the PMML file is ready to handle missing values.
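To illustrate what a three-way split means for scoring, here is a minimal Python sketch. It is not the JPMML or gbm implementation; the `Leaf`, `Split`, and `score` names, the field name `x1`, and all thresholds and scores are hypothetical and chosen only for illustration. The point is that a missing value is routed to its own branch before any numeric comparison happens, so a conforming scoring engine should never need to compare a null against the threshold.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    score: float


@dataclass
class Split:
    field: str                    # name of the continuous predictor
    threshold: float              # split point for non-missing values
    if_missing: "Node"            # branch taken when the value is missing
    if_less: "Node"               # branch taken when value < threshold
    if_greater_or_equal: "Node"   # branch taken when value >= threshold


Node = Union[Leaf, Split]


def is_missing(value) -> bool:
    # Treat None and NaN as missing (an assumption made for this sketch).
    return value is None or (isinstance(value, float) and math.isnan(value))


def score(node: Node, row: dict) -> float:
    # Walk down the tree; the missing-value check always comes first.
    while isinstance(node, Split):
        value = row.get(node.field)
        if is_missing(value):
            node = node.if_missing
        elif value < node.threshold:
            node = node.if_less
        else:
            node = node.if_greater_or_equal
    return node.score


# Hypothetical single-split tree on predictor "x1".
tree = Split("x1", 2.5, Leaf(0.0), Leaf(-1.0), Leaf(1.0))

print(score(tree, {"x1": 1.0}))   # non-missing, below the threshold
print(score(tree, {"x1": None}))  # missing value takes its own branch
```

If the consuming engine raises an error on a null input instead of taking the missing branch, that points at the engine (or at how the null is encoded in the input) rather than at the model file.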

> When we try to run the PMML (v 4.3) file in Syncfusion, we are getting errors when it encounters a missing/null value for one of our continuous predictors.

Can you score the PMML model with sample data using the org.jpmml.evaluator.EvaluationExample command-line application from the JPMML-Evaluator project? See https://github.com/jpmml/jpmml-evaluator#example-applications

I have reason to think that JPMML-Evaluator will score your PMML file just fine, and that your problem is related to Syncfusion. However, if JPMML-Evaluator also fails (or gives bad predictions), then please let me know about it.