jpmml / jpmml-sparkml

Java library and command-line application for converting Apache Spark ML pipelines to PMML
GNU Affero General Public License v3.0

How to create PMML of Model of custom Spark ML Algorithm? #98

Closed injulkarnilesh closed 4 years ago

injulkarnilesh commented 4 years ago

We need to implement some clustering algorithms ourselves, because the Spark ML library does not provide them. We want the models produced by these algorithms to be converted to PMML.

Peeking into the code of this jpmml-sparkml library, it looks like there is a hard mapping between each Spark ML model and its corresponding converter in the sparkml2pmml.properties file.

Can we use this library to convert our Algorithm's Model into PMML? Can we extend the functionality of this library to include custom Algorithm's model? If it is possible, then how?

vruusmann commented 4 years ago

> it looks like there is hard mapping of Spark ML Model with its corresponding converter in sparkml2pmml.properties file.

The JPMML-SparkML library provides two things:

  1. Medium-level API for analyzing Apache Spark ML pipeline steps, and mapping them to PMML data structures (mostly with the help of the JPMML-Converter library).
  2. High-level API for converting standard Apache Spark ML pipelines to PMML in a few lines of code (i.e. via PMMLBuilder#build()).

If your pipeline contains custom pipeline steps, then you can use the medium-level API to build all the necessary converters (both for transformer and model types) yourself.
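To make the "mapping" idea from the question concrete, here is a minimal sketch of how a properties-style file could associate a model class name with a converter class name. It uses only the Java standard library. The class `ConverterMappingSketch` and the `com.example.*` names are hypothetical, and this is not the library's actual lookup code, which is not shown in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustration only: a hypothetical registry, not the JPMML-SparkML lookup code.
final class ConverterMappingSketch {

    // Parses "model class = converter class" lines written in java.util.Properties syntax.
    static Properties parse(String text) {
        Properties mapping = new Properties();
        try {
            mapping.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return mapping;
    }

    // Returns the converter class name registered for the given model class name.
    static String converterFor(Properties mapping, String modelClassName) {
        String converter = mapping.getProperty(modelClassName);
        if (converter == null) {
            throw new IllegalArgumentException("No converter registered for " + modelClassName);
        }
        return converter;
    }

    public static void main(String[] args) {
        String text =
            // Standard K-means model and the converter linked later in this thread.
            "org.apache.spark.ml.clustering.KMeansModel = org.jpmml.sparkml.model.KMeansModelConverter\n"
            // Hypothetical custom model and converter (assumed names).
            + "com.example.ml.MyClusteringModel = com.example.pmml.MyClusteringModelConverter\n";
        Properties mapping = parse(text);
        System.out.println(converterFor(mapping, "com.example.ml.MyClusteringModel"));
    }
}
```

Under this assumption, supporting a custom model amounts to writing a converter class for it and adding one more entry to the mapping.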

> Can we use this library to convert our Algorithm's Model into PMML?

Yes, assuming that the final "state" of your algorithm is representable using PMML data structures.

See http://dmg.org/pmml/v4-3/ClusteringModel.html for available vocabulary.
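As a rough way to check whether a custom algorithm's final state fits that vocabulary, the sketch below renders a center-based ClusteringModel fragment from a list of field names and cluster centers. It uses plain string building from the Java standard library rather than any JPMML class. The class `ClusteringModelSketch` is hypothetical, the squared Euclidean distance is an assumption about the algorithm, and a complete PMML document would also need the enclosing PMML, Header and DataDictionary elements.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of JPMML-SparkML or any JPMML library.
final class ClusteringModelSketch {

    // Renders a center-based ClusteringModel fragment.
    // Assumes field names need no XML escaping and that distance is squared Euclidean.
    static String toClusteringModelXml(List<String> fields, double[][] centers) {
        StringBuilder sb = new StringBuilder();
        sb.append("<ClusteringModel functionName=\"clustering\" modelClass=\"centerBased\"")
          .append(" numberOfClusters=\"").append(centers.length).append("\">\n");
        sb.append("  <MiningSchema>\n");
        for (String field : fields) {
            sb.append("    <MiningField name=\"").append(field).append("\"/>\n");
        }
        sb.append("  </MiningSchema>\n");
        sb.append("  <ComparisonMeasure kind=\"distance\">\n");
        sb.append("    <squaredEuclidean/>\n");
        sb.append("  </ComparisonMeasure>\n");
        for (String field : fields) {
            sb.append("  <ClusteringField field=\"").append(field)
              .append("\" compareFunction=\"absDiff\"/>\n");
        }
        for (int i = 0; i < centers.length; i++) {
            if (centers[i].length != fields.size()) {
                throw new IllegalArgumentException("Center " + i + " does not match the field count");
            }
            // One Cluster element per center; coordinates go into a space-separated real Array.
            sb.append("  <Cluster id=\"").append(i).append("\">\n");
            sb.append("    <Array n=\"").append(centers[i].length).append("\" type=\"real\">");
            for (int j = 0; j < centers[i].length; j++) {
                if (j > 0) {
                    sb.append(' ');
                }
                sb.append(Double.toString(centers[i][j]));
            }
            sb.append("</Array>\n");
            sb.append("  </Cluster>\n");
        }
        sb.append("</ClusteringModel>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] centers = {{1.0, 2.0}, {3.5, 4.0}};
        System.out.print(toClusteringModelXml(Arrays.asList("x", "y"), centers));
    }
}
```

If the trained model cannot be reduced to something like a set of centers plus a comparison measure, that is a sign it may not be representable as a ClusteringModel.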

> If it is possible, then how?

See the standard K-means converter: https://github.com/jpmml/jpmml-sparkml/blob/master/src/main/java/org/jpmml/sparkml/model/KMeansModelConverter.java