Specifying missingValueStrategy for python xgboost

nedwebster commented 3 years ago

Hi,

I am trying to convert my Python XGBoost model to PMML, but the software that consumes the PMML file does not accept 'lastPrediction' as the missingValueStrategy. Is there a way to specify the missingValueStrategy (and the noTrueChildStrategy) when building the PMML pipeline?

Many thanks.

vruusmann commented 3 years ago

The JPMML-XGBoost library converts XGBoost models to PMML models in a way that fully preserves the original decision logic.

One of the key features of XGBoost is the ability to deal with missing values natively. So, all JPMML-XGBoost generated PMML files will also be "missing-value aware", which means that the scoring will go on until a leaf node is reached; it's not permitted to stop at some arbitrary point, and bail out with an interim value.
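To illustrate what "missing-value aware" scoring means, here is a minimal sketch of a tree walk where every split carries a default branch for missing inputs. The toy tree, its values, and the `score` helper are made up for illustration; the field names are only loosely modelled on XGBoost's JSON dump, and this is not how JPMML-XGBoost or any PMML engine is actually implemented.

```python
import math

# Hypothetical toy tree (all values made up). Each split node names the
# child to take when the test is true ("yes"), false ("no"), or when the
# input value is missing ("missing").
TOY_TREE = {
    0: {"split": "x", "split_condition": 5.0, "yes": 1, "no": 2, "missing": 2},
    1: {"leaf": -0.4},
    2: {"leaf": 0.7},
}


def score(tree, row):
    """Walk from the root to a leaf; never stop at an interior node."""
    node = tree[0]
    while "leaf" not in node:
        value = row.get(node["split"])
        if value is None or math.isnan(value):
            # Missing input: follow the split's default branch
            next_id = node["missing"]
        elif value < node["split_condition"]:
            next_id = node["yes"]
        else:
            next_id = node["no"]
        node = tree[next_id]
    return node["leaf"]
```

In this sketch a row without "x" (or with NaN) still ends up at a leaf, because the split at node 0 sends missing values to node 2.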

XGBoost models can be represented in two ways - original/non-compacted and compacted. Use the org.jpmml.xgboost.HasXGBoostOptions#OPTION_COMPACT conversion option to choose between the two.

In the SkLearn2PMML package you can do so using the sklearn2pmml.pipeline.PMMLPipeline.configure(**pmml_options) method:

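from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from xgboost import XGBClassifier

# X and y are your training features and labels
pipeline = PMMLPipeline([
  ("classifier", XGBClassifier(...))
])
pipeline.fit(X, y)

# Compacted
pipeline.configure(compact = True)
sklearn2pmml(pipeline, "xgboost-compact.pmml")

# Non-compacted
pipeline.configure(compact = False)
sklearn2pmml(pipeline, "xgboost-non_compact.pmml")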
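To see which strategy attributes a given PMML file actually declares, something like the following sketch could be used. It relies only on the Python standard library; the `tree_strategies` helper and the `SAMPLE_PMML` fragment are hypothetical, and the fragment is hand-written for illustration rather than real converter output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment (not actual converter output)
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <MiningModel functionName="regression">
    <Segmentation multipleModelMethod="sum">
      <Segment id="1">
        <True/>
        <TreeModel functionName="regression"
                   missingValueStrategy="defaultChild"
                   noTrueChildStrategy="returnLastPrediction"/>
      </Segment>
    </Segmentation>
  </MiningModel>
</PMML>"""


def tree_strategies(pmml_text):
    """Return (missingValueStrategy, noTrueChildStrategy) for each TreeModel."""
    root = ET.fromstring(pmml_text)
    strategies = []
    for elem in root.iter():
        # Drop the "{namespace}" prefix so any PMML schema version matches
        if elem.tag.rsplit("}", 1)[-1] == "TreeModel":
            strategies.append(
                (elem.get("missingValueStrategy"), elem.get("noTrueChildStrategy"))
            )
    return strategies
```

For a file on disk, read it first, e.g. `tree_strategies(open("xgboost-compact.pmml").read())`. A `None` entry means the attribute is absent from that TreeModel element, so the PMML default applies.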

nedwebster commented 3 years ago

Hi vruusmann,

Thank you for your speedy reply; your comment was extremely useful.

jpmml / jpmml-xgboost

Specifying missingValueStrategy for python xgboost #56