jpmml / jpmml-sparkml

Java library and command-line application for converting Apache Spark ML pipelines to PMML
GNU Affero General Public License v3.0

Transformer class org.apache.spark.ml.feature.VectorAssembler is not supported #87

Closed Mantj closed 4 years ago

Mantj commented 4 years ago

Just like #45, my program ran successfully in spark-local mode. But when I ran the same code in the online spark-yarn environment, the following error occurred:

Exception in thread "main" java.lang.IllegalArgumentException: Transformer class org.apache.spark.ml.feature.VectorAssembler is not supported
    at org.jpmml.sparkml.ConverterFactory.newConverter(ConverterFactory.java:58)
    at org.jpmml.sparkml.PMMLBuilder.build(PMMLBuilder.java:105)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The error occurred at this line:

    val pmml = new PMMLBuilder(dataTrain.data.schema, pipelineModel).build()

The pipelineModel contains a VectorAssembler and a GBT model.

But unlike #45, I am using Spark 2.3.4 and Scala 2.11.12, and my dependencies are:

  1. jpmml-sparkml-1.4.11
  2. jpmml-converter-1.3.10

Please help~~~

vruusmann commented 4 years ago

What is the functional difference between your spark-local and spark-yarn environments? The JPMML-SparkML library does not care about its environment. Therefore, the error is caused entirely by your Spark application.

The resolution is exactly the same as for #45: fix your build. Specifically, carefully review your shading configuration, and make sure that the META-INF/sparkml2pmml.properties configuration file is correctly included.
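
One way to check whether the configuration file actually made it into the deployed application is to list every copy of META-INF/sparkml2pmml.properties that the class loader can see, and to test whether each copy mentions the transformer from the error message. The following is a minimal diagnostic sketch, not part of JPMML-SparkML. The class name ConverterRegistryCheck and its methods are hypothetical, and it assumes that the properties file uses transformer class names as keys and that the thread context class loader is the relevant one under spark-submit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical diagnostic helper; not part of the JPMML-SparkML API.
class ConverterRegistryCheck {

    // Resource path named in the reply above.
    static final String RESOURCE = "META-INF/sparkml2pmml.properties";

    // Returns every copy of the resource that the given class loader can see.
    public static List<URL> findCopies(ClassLoader loader, String resource) throws IOException {
        return Collections.list(loader.getResources(resource));
    }

    // Reads one properties file from the given URL.
    public static Properties load(URL url) throws IOException {
        Properties props = new Properties();
        try (InputStream in = url.openStream()) {
            props.load(in);
        }
        return props;
    }

    // Assumption: transformer class names appear as property keys.
    public static boolean mentions(Properties props, String className) {
        return props.containsKey(className);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the context class loader is the one that matters here.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        String transformer = "org.apache.spark.ml.feature.VectorAssembler";

        List<URL> copies = findCopies(loader, RESOURCE);
        if (copies.isEmpty()) {
            System.out.println("No " + RESOURCE + " found on the classpath");
        }
        for (URL url : copies) {
            String verdict = mentions(load(url), transformer) ? "lists " : "does not list ";
            System.out.println(url + " " + verdict + transformer);
        }
    }
}
```

Running the same check in both the spark-local and spark-yarn environments and comparing the output should show whether the file is missing from the shaded JAR, or present but without the expected entry.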