jpmml / jpmml-sparkml

Java library and command-line application for converting Apache Spark ML pipelines to PMML
GNU Affero General Public License v3.0

Upgrade to Apache Spark 3.0(.X) #80

Closed: wzymmzs closed this issue 4 years ago

wzymmzs commented 5 years ago

I'm trying to use ConverterUtil.toPMML in the Spark ML example JavaDecisionTreeClassificationExample, but I get a java.lang.IllegalArgumentException.

I found that org.apache.spark.ml.feature.OneHotEncoder is listed in sparkml2pmml.properties. However, when the code reaches this check:

if(clazz == null || !(Transformer.class).isAssignableFrom(clazz)){
    throw new IllegalArgumentException("Expected " + Transformer.class.getName() + " subclass, got " + (clazz != null ? clazz.getName() : null));
}

the following error is thrown:

Caused by: java.lang.IllegalArgumentException: Expected org.apache.spark.ml.Transformer subclass, got org.apache.spark.ml.feature.OneHotEncoder
    at org.jpmml.sparkml.ConverterUtil.putConverterClazz(ConverterUtil.java:187)
    at org.jpmml.sparkml.ConverterUtil.init(ConverterUtil.java:325)
    at org.jpmml.sparkml.ConverterUtil.init(ConverterUtil.java:285)
    at org.jpmml.sparkml.ConverterUtil.<clinit>(ConverterUtil.java:334)
    ... 2 more

I think ConverterUtil only accepts pipeline stages that extend Transformer, but OneHotEncoder, which extends Estimator, is still listed in the properties file. Is this the cause of the error?

wzymmzs commented 5 years ago

The Spark version is 3.0.0-SNAPSHOT, and the jpmml-sparkml Maven version is 1.5.4.

vruusmann commented 5 years ago

I'm trying to use ConverterUtil.toPMML in spark-ml example

Class org.jpmml.sparkml.ConverterUtil has been thoroughly deprecated, and you shouldn't be using any of its utility methods in your application code. Please switch to the org.jpmml.sparkml.PMMLBuilder component instead.

Spark version is 3.0.0-SNAPSHOT. jpmml-sparkml maven version is 1.5.4.

The JPMML-SparkML library hasn't been ported to the Apache Spark 3.0 version yet.

I'm waiting for the official 3.0.0 release to happen.

I think ConverterUtil allows PipelineStages extends Transformer, but OneHotEncoder extends Estimator still in the properties. Is this the cause of the error?

Correct.
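The rejection can be reproduced without Spark on the classpath. The following is a minimal sketch built on simplified, hypothetical stand-in classes: ConverterRegistrySketch, Registry and the converter labels are illustrative only, not JPMML-SparkML or Spark code, and only the isAssignableFrom check is copied from the error site quoted in the issue. It assumes the Spark 3.0 layout, in which the old transformer-style OneHotEncoder was removed and OneHotEncoderEstimator was renamed to OneHotEncoder, so the name now refers to an Estimator. Because the <clinit> frame in the stack trace shows the registration running during static initialization, the sketch also shows why one stale entry breaks every conversion, even for pipelines that contain no OneHotEncoder stage.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConverterRegistrySketch {

    // Simplified, hypothetical stand-ins for the Spark ML class hierarchy (not the real Spark types).
    static abstract class PipelineStage {}
    static abstract class Transformer extends PipelineStage {}
    static abstract class Estimator extends PipelineStage {}
    static abstract class Model extends Transformer {}

    // Spark 3.0 layout: OneHotEncoder is an Estimator; its fitted OneHotEncoderModel is a Transformer.
    static class OneHotEncoder extends Estimator {}
    static class OneHotEncoderModel extends Model {}
    static class DecisionTreeClassificationModel extends Model {}

    // Mirrors the check quoted in the issue: only Transformer subclasses may be registered.
    static void requireTransformer(Class<?> clazz) {
        if (clazz == null || !Transformer.class.isAssignableFrom(clazz)) {
            throw new IllegalArgumentException("Expected " + Transformer.class.getName()
                + " subclass, got " + (clazz != null ? clazz.getName() : null));
        }
    }

    // Hypothetical registry that registers every entry while the class is being initialized.
    static class Registry {
        static final Map<Class<?>, String> CONVERTERS = new LinkedHashMap<>();

        static {
            register(DecisionTreeClassificationModel.class, "placeholder-tree-converter");
            // An entry written for the Spark 2.4 layout, where OneHotEncoder was a Transformer.
            register(OneHotEncoder.class, "placeholder-one-hot-converter");
        }

        static void register(Class<?> clazz, String converterLabel) {
            requireTransformer(clazz);
            CONVERTERS.put(clazz, converterLabel);
        }
    }

    public static void main(String[] args) {
        // Passes: a fitted model is a Transformer.
        requireTransformer(OneHotEncoderModel.class);
        try {
            // The first access to Registry runs its static initializer.
            System.out.println(Registry.CONVERTERS.size());
        } catch (ExceptionInInitializerError e) {
            // The IllegalArgumentException for OneHotEncoder aborts initialization of the whole
            // registry, so the decision tree entry is unusable as well.
            System.out.println("Caused by: " + e.getCause());
        }
    }
}
```

The design point is fail-fast registration: validating every mapping up front surfaces the mismatch immediately, but it also means a single outdated entry makes the whole converter table unavailable.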

wzymmzs commented 5 years ago

But I still don't understand: why can't an Estimator be converted to a PMML model?

vruusmann commented 5 years ago

Reopening, because this issue is not solved yet!

But I still don't understand why Estimator cannot be parsed to a PMML model?

The JPMML-SparkML library expects the Apache Spark class hierarchy to have a specific structure/layout when mapping transformer/model classes to converter classes. It looks like the class hierarchy has been significantly reorganized between Apache Spark 2.4 and 3.0.
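This thread does not show how JPMML-SparkML performs its lookup internally, so the following is only a hypothetical illustration of why a class-to-converter mapping can depend on the hierarchy layout. It assumes that converters are keyed on base classes and that lookup walks up the superclass chain; HierarchyLookupSketch, its stand-in classes and the converter label are all invented for this example and are not the library's API.

```java
import java.util.HashMap;
import java.util.Map;

public class HierarchyLookupSketch {

    // Simplified, hypothetical stand-ins; not the real Spark ML classes.
    static abstract class Transformer {}
    static abstract class Model extends Transformer {}
    static abstract class ClassificationModel extends Model {}
    static class DecisionTreeClassificationModel extends ClassificationModel {}
    // The same kind of model, but placed directly under Model, as if the hierarchy had been reorganized.
    static class ReorganizedDecisionTreeClassificationModel extends Model {}

    private final Map<Class<?>, String> converters = new HashMap<>();

    void register(Class<? extends Transformer> clazz, String converterLabel) {
        converters.put(clazz, converterLabel);
    }

    // Walks up the superclass chain and returns the first registered converter, or null if none matches.
    String lookup(Class<?> clazz) {
        for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
            String converter = converters.get(c);
            if (converter != null) {
                return converter;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        HierarchyLookupSketch registry = new HierarchyLookupSketch();
        // The converter is keyed on an intermediate base class, not on the concrete model class.
        registry.register(ClassificationModel.class, "ClassificationModelConverter");

        // Resolved through the ClassificationModel superclass.
        System.out.println(registry.lookup(DecisionTreeClassificationModel.class));
        // No registered ancestor any more, so the lookup finds nothing.
        System.out.println(registry.lookup(ReorganizedDecisionTreeClassificationModel.class));
    }
}
```

Under these assumptions, nothing about the model itself changes; moving a class to a different place in the hierarchy is enough for the mapping to stop resolving, which is why such a library has to be ported to each major Spark layout.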