PyPMML is hiding secondary result fields

jtzhang17 commented 2 years ago

I have a PySpark XGBoost PipelineModel, which I saved as PMML in the following way:

from pyspark.ml import Pipeline
from pyspark2pmml import PMMLBuilder

pipelineModel = Pipeline(stages=pipeline_stages).fit(df)
pmml_builder = PMMLBuilder(sc, df, pipelineModel)
pmml_builder.buildFile("trained_xgb_model.pmml")

The saved PMML model was loaded using the pypmml-spark package, and a test data set was scored with the loaded model. However, the output always contains only a prediction column; the probability and rawPrediction columns are never included.

from pypmml_spark import ScoreModel

model = ScoreModel.fromFile(model_name)
df_pred = model.transform(df_test)
df_pred.show(5)

Can someone share an example in which a model saved with pyspark2pmml produces the probability column in the model evaluation results?

vruusmann commented 2 years ago

The saved PMML model was loaded using the pypmml-spark package

The PyPMML library suppresses secondary result fields by default. I have zero control over this behaviour.

Can someone share an example in which a model saved with pyspark2pmml produces the probability column in the model evaluation results?

Please take your issue to the PyPMML project. It does not belong here.
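
For context on the term used above: in PMML, a model can declare additional results as OutputField elements inside an Output element, and a per-class probability is typically declared with feature="probability". The JPMML tooling refers to target fields as primary results and to output fields as secondary results. Before debugging the scoring side, it may help to confirm which output fields the exported file actually declares. The following is a minimal sketch using only the Python standard library; the helper name list_output_fields and the trimmed sample document are hypothetical and are not taken from the model in this issue.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed PMML fragment used only for illustration.
# A real export contains many more elements (DataDictionary, MiningSchema, ...).
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <MiningModel functionName="classification">
    <Output>
      <OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0"/>
      <OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1"/>
    </Output>
  </MiningModel>
</PMML>"""


def list_output_fields(pmml_text):
    """Return (name, feature) pairs for every OutputField in a PMML document."""
    root = ET.fromstring(pmml_text)
    fields = []
    for elem in root.iter():
        # Tags look like "{namespace}OutputField"; compare the local name only
        # so the helper works regardless of the PMML schema version.
        if elem.tag.rsplit("}", 1)[-1] == "OutputField":
            fields.append((elem.get("name"), elem.get("feature")))
    return fields


for name, feature in list_output_fields(SAMPLE_PMML):
    print(name, feature)
```

To check a real export, read the file contents (for example, trained_xgb_model.pmml) and pass the text to the helper. If probability fields are listed there but missing from the scored DataFrame, the suppression happens on the scoring side rather than in the exported model.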

jtzhang17 commented 2 years ago

Could you please explain a bit more about how the PyPMML library suppresses secondary result fields by default? Do you mean that the probability column is a secondary result? What does "secondary result" mean? Thanks!


jpmml / pyspark2pmml

PyPMML is hiding secondary result fields #37