jpmml / jpmml-sparkml

Java library and command-line application for converting Apache Spark ML pipelines to PMML
GNU Affero General Public License v3.0

spark ml decision tree model convert to pmml #93

Closed: XScarlett closed this issue 4 years ago

XScarlett commented 4 years ago

When converting a Spark ML PipelineModel to PMML, I want to set missingValueStrategy to lastPrediction, and to set ScoreDistribution and score on every node (not just leaf nodes). How can I do this in Java?

My code and part of the resulting pmml.xml were attached as two screenshots.

vruusmann commented 4 years ago

Right now you're invoking PMMLBuilder#buildFile(...) which saves the PMML class model object into a file in the local filesystem.

If you invoke PMMLBuilder#build(), then you'll obtain a live org.dmg.pmml.PMML object instance that you can modify as you see fit. I'd recommend using the Visitor API of the JPMML-Model library for implementing all the necessary transformations and rearrangements.

For example, changing the TreeModel@missingValueStrategy attribute value:

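import org.dmg.pmml.PMML;
import org.dmg.pmml.Visitor;
import org.dmg.pmml.VisitorAction;
import org.dmg.pmml.tree.TreeModel;
import org.jpmml.model.visitors.AbstractVisitor;
import org.jpmml.sparkml.PMMLBuilder;

PMMLBuilder pmmlBuilder = ...
PMML pmml = pmmlBuilder.build();

// Set missingValueStrategy="lastPrediction" on every TreeModel element
Visitor mvsCustomizer = new AbstractVisitor(){
    @Override
    public VisitorAction visit(TreeModel treeModel){
        treeModel.setMissingValueStrategy(TreeModel.MissingValueStrategy.LAST_PREDICTION);
        return super.visit(treeModel);
    }
};
mvsCustomizer.applyTo(pmml);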

vruusmann commented 4 years ago

It's possible to compute record counts for "parent" tree levels by summing the record counts of their "child" tree levels.

There's a Visitor API example available in another demo project: https://github.com/vruusmann/rf_feature_impact/blob/master/src/main/java/feature_impact/visitors/ScoreDistributionGenerator.java
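
The bottom-up summation described above can be sketched without any PMML dependency. The example below is only an illustration under stated assumptions: `RecordCountRollup`, `SimpleNode`, and `rollUp` are hypothetical names, not part of JPMML-Model or JPMML-SparkML, and choosing the majority class as the score of a non-leaf node is an assumption of this sketch rather than something the converter is known to do. In a real pipeline the same recursion would be applied to the tree nodes of the live PMML object, for instance from inside a Visitor like the one in the linked demo project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordCountRollup {

    // Hypothetical stand-in for a decision tree node; NOT org.dmg.pmml.tree.Node.
    static class SimpleNode {
        final List<SimpleNode> children = new ArrayList<>();
        // Per-class record counts, playing the role of ScoreDistribution elements
        final Map<String, Double> scoreDistribution = new LinkedHashMap<>();
        double recordCount;
        String score;

        SimpleNode with(String label, double count) {
            scoreDistribution.put(label, count);
            return this;
        }

        SimpleNode addChild(SimpleNode child) {
            children.add(child);
            return this;
        }
    }

    // Fills in distribution, record count and score for every node, children first.
    static void rollUp(SimpleNode node) {
        if (!node.children.isEmpty()) {
            // A parent's per-class counts are the sums of its children's counts
            node.scoreDistribution.clear();
            for (SimpleNode child : node.children) {
                rollUp(child);
                child.scoreDistribution.forEach(
                    (label, count) -> node.scoreDistribution.merge(label, count, Double::sum));
            }
        }
        double total = 0d;
        String best = null;
        double bestCount = -1d;
        for (Map.Entry<String, Double> entry : node.scoreDistribution.entrySet()) {
            total += entry.getValue();
            // Assumption: the score is the majority class; ties keep the first label seen
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                best = entry.getKey();
            }
        }
        node.recordCount = total;
        node.score = best;
    }

    public static void main(String[] args) {
        SimpleNode left = new SimpleNode().with("0", 30d).with("1", 10d);
        SimpleNode right = new SimpleNode().with("0", 5d).with("1", 15d);
        SimpleNode root = new SimpleNode().addChild(left).addChild(right);

        rollUp(root);

        System.out.println("root distribution = " + root.scoreDistribution);
        System.out.println("root recordCount  = " + root.recordCount);
        System.out.println("root score        = " + root.score);
    }
}
```

The recursion visits children before their parent, so every parent sees fully populated child distributions. This only works if the leaf nodes already carry per-class record counts.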

vruusmann commented 4 years ago

Leaving this issue open-ish - a reminder that perhaps there's a way to generalize and implement all this functionality in the form of JPMML-SparkML conversion options.

XScarlett commented 4 years ago

Thank you so much!!!