Closed yeikel closed 5 years ago
Apache Spark ML is built around the pipeline concept. The JPMML-SparkML library follows this idea and uses org.apache.spark.ml.Pipeline(Model) as the conversion unit.
Could you please clarify whether this library can be used to convert a GBTClassificationModel to PMML?
Create a single-step PipelineModel based on your model object.
@vruusmann Could you please share an example of your suggestion, if you have one?
In Java pseudocode:
Model model = GBTClassificationModel.load(...);
PipelineModel pipelineModel = new PipelineModel(new PipelineStage[]{model});
During conversion, the pipeline object is also queried for label and feature information. If this single-step pipeline raises further conversion errors, you might need to insert a (fake) StringIndexerModel (for the label spec) and a VectorAssembler (for the feature spec) into it.
@vruusmann Thank you for your help. I am close, but I am missing something. Any help would be appreciated.
Exception in thread "main" java.util.NoSuchElementException: Failed to find a default value for inputCol
val ml = GBTClassificationModel.load("....")
val trainingData = spark.read.parquet("...")
val fields = Array(".....")
val assembler = new VectorAssembler().setInputCols(fields).setOutputCol("features")
val sampleSchema = trainingData.select(fields.map(col): _*)
val str = new StringIndexer().setOutputCol("label")
val pipelineEstimator = new Pipeline().setStages(Array(str,assembler, ml)).fit(trainingData)
val pmml = new PMMLBuilder(sampleSchema.schema, pipelineEstimator).build
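A likely cause of the exception above is that the StringIndexer has an output column but no input column: inputCol has no default value, so fitting the pipeline fails when the parameter is read. The sketch below is one way the snippet might be adjusted, not a confirmed fix. It assumes the training data still contains the raw label column; the name "rawLabel" is hypothetical, and the elided paths and field names are left as they were. It also passes the full training schema to PMMLBuilder so that the label column is visible during conversion, which is an assumption about what the converter needs here.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}
import org.apache.spark.ml.classification.GBTClassificationModel
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession
import org.jpmml.sparkml.PMMLBuilder

object GbtToPmml {
  def main(args: Array[String]): Unit = {
    // Assumes the job is launched via spark-submit, which supplies the master URL.
    val spark = SparkSession.builder().appName("gbt-to-pmml").getOrCreate()

    val model = GBTClassificationModel.load("....")
    val trainingData = spark.read.parquet("...")
    val fields = Array(".....")

    // Hypothetical name of the raw label column in trainingData.
    val rawLabelCol = "rawLabel"

    // The label indexer needs both an input and an output column.
    val labelIndexer = new StringIndexer()
      .setInputCol(rawLabelCol)
      .setOutputCol("label")

    // Feature spec: assemble the raw fields into the vector column the model reads.
    val assembler = new VectorAssembler()
      .setInputCols(fields)
      .setOutputCol("features")

    // Only the indexer is actually fitted; the assembler and the loaded
    // model are transformers and are carried into the PipelineModel as-is.
    val pipelineModel: PipelineModel = new Pipeline()
      .setStages(Array[PipelineStage](labelIndexer, assembler, model))
      .fit(trainingData)

    // Schema of the pipeline input, including the raw label column.
    val pmml = new PMMLBuilder(trainingData.schema, pipelineModel).build()

    spark.stop()
  }
}
```

One caveat: a StringIndexer fitted this way orders labels by frequency in the supplied data, which may not match the encoding the model was trained with, so the class labels in the resulting PMML should be checked.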
Hi,
I am trying to use this library, but I get the following exception:
I am using the library like this:
Could you please clarify if this library could be used to transform the GBTClassificationModel to a PMML?