jpmml / jpmml-sparkml

Java library and command-line application for converting Apache Spark ML pipelines to PMML
GNU Affero General Public License v3.0

How to convert spark rdd based gbdt model to pmml model? #38

Closed by xixi2715 6 years ago

xixi2715 commented 6 years ago

JPMML-SparkML can only convert Spark ML pipelines to PMML. But I trained an RDD-based GBDT model with Spark MLlib. How can I convert this MLlib model to a PMML model?

The "PMML model export - RDD-based API" documentation shows that only KMeansModel, LinearRegressionModel, RidgeRegressionModel, LassoModel, SVMModel and binary LogisticRegressionModel can be exported to PMML. What about the GBDT model? Is there no way to convert it to PMML?

Can anyone help me?

vruusmann commented 6 years ago

As you correctly point out, JPMML-SparkML only supports the conversion of Apache Spark ML models and pipelines.

There are no plans to start supporting Apache Spark MLlib. It is my understanding that the Apache Spark team has decided to gradually phase out the RDD-based MLlib API, so it would be pointless to spend any resources on it.

I would personally recommend re-training your GBDT model using Apache Spark ML. Alternatively, you might want to open a discussion with the Apache Spark team and see if they are willing to apply the PMMLExportable trait to the GBDT model type. This trait doesn't seem to cover decision tree model types at the moment, so the prospects are not good.
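For illustration, the re-training route might look roughly like the PySpark sketch below. It is not taken from this thread. It assumes numeric feature columns, a binary 0/1 label column, the JPMML-SparkML JAR on the Spark classpath, and the `pyspark2pmml` wrapper package. The function names `feature_columns` and `retrain_and_export_gbt` are hypothetical, and the `PMMLBuilder` constructor arguments have differed between `pyspark2pmml` releases, so check the README of the installed version.

```python
def feature_columns(columns, label_col):
    """Return every column name except the label (hypothetical helper)."""
    return [c for c in columns if c != label_col]


def retrain_and_export_gbt(sc, train_df, label_col, pmml_path):
    """Fit a DataFrame-based (Spark ML) GBT pipeline and write it as PMML.

    Assumptions: `sc` is the active SparkContext, `train_df` is a Spark
    DataFrame with numeric features and a binary 0/1 label column.
    """
    # Imports are deferred so the module still loads where Spark is absent.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import GBTClassifier
    from pyspark.ml.feature import VectorAssembler
    from pyspark2pmml import PMMLBuilder  # Python wrapper for JPMML-SparkML

    # Spark ML estimators expect all features in a single vector column.
    assembler = VectorAssembler(
        inputCols=feature_columns(train_df.columns, label_col),
        outputCol="features",
    )
    gbt = GBTClassifier(labelCol=label_col, featuresCol="features", maxIter=20)

    # Fitting the whole pipeline gives the converter one PipelineModel.
    pipeline_model = Pipeline(stages=[assembler, gbt]).fit(train_df)

    # Assumed (sc, df, model) argument order; verify for your version.
    PMMLBuilder(sc, train_df, pipeline_model).buildFile(pmml_path)
    return pipeline_model
```

A call would then look like `retrain_and_export_gbt(spark.sparkContext, df, "label", "gbt.pmml")`, where `spark` and `df` are your own session and training data. For a regression target, `GBTRegressor` from `pyspark.ml.regression` would take the place of `GBTClassifier`.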