Closed akari0725 closed 6 years ago
This looks like a versioning/shading problem. The method ModelEvaluatorFactory#createModelEvaluator(...) tries to identify the model type using instanceof checks, and your model object does not match any of them. Specifically, decision tree models should match the check if(model instanceof org.dmg.pmml.tree.TreeModel), but the Java class of your model object is something different:
https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/ModelEvaluatorFactory.java#L126-L128
If you're mixing different JPMML-Model/JPMML-Evaluator versions, then your model object may actually be of type org.dmg.pmml.TreeModel (note the missing tree package component). However, if you've shaded class names incorrectly, then it may be of type org.shaded.pmml.tree.TreeModel (note the added shaded package component).
To figure out what's going on, simply print the class name of your model object:
System.out.println(model.getClass().getName());
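Printing where the class was loaded from, in addition to its name, can show which JAR file supplied it. The following is a minimal sketch that uses only the JDK; the class name ModelClassInspector and the describe helper are hypothetical, and the stand-in object in main should be replaced with your actual model object:

```java
import java.security.CodeSource;

public class ModelClassInspector {

    /** Returns the runtime class name of an object and the JAR/directory it was loaded from. */
    public static String describe(Object object) {
        Class<?> clazz = object.getClass();
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap class loader have no code source
        String location = (source != null && source.getLocation() != null)
            ? source.getLocation().toString()
            : "(bootstrap class path)";
        return clazz.getName() + " loaded from " + location;
    }

    public static void main(String[] args) {
        // Stand-in object (assumption); pass your model object here instead
        Object model = new ModelClassInspector();
        System.out.println(describe(model));
    }
}
```

Running the same check on the driver and inside a task on a worker node should reveal whether the two environments resolve the class from different JAR files.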
Additionally, when using the org.jpmml:jpmml-evaluator-spark dependency in your project, you don't need to manually declare the org.jpmml:pmml-model and/or org.jpmml:pmml-evaluator dependencies in your pom.xml file. This "parent dependency" will automatically bring in all required "child dependencies".
I tried not declaring org.jpmml:pmml-model and org.jpmml:pmml-evaluator, but then it cannot run in Spark local mode.
For the org.jpmml:jpmml-evaluator-spark 1.1 dependency, I downloaded the source code and ran mvn clean install to put it into my local .m2 repository.
I copied your code into a local Java class, and the getName() result is org.dmg.pmml.tree.TreeModel (in Spark local mode).
I don't know which Maven dependencies I need to declare; pmml-model 1.3.8 and pmml-evaluator are the latest versions on maven.apache.org.
I also don't know why it runs successfully in local mode but does not work on the cluster. I put my PMML file into the resources folder, load it with a ClassLoader, and send the Java object (Transformer) to the worker nodes using the sparkSession.sparkContext.broadcast method.
Thank you!
Another question:
I tried to run another PMML model that was exported from sklearn. It worked OK in local mode, but on the cluster:
Exception in thread "main" java.lang.IllegalArgumentException: http://www.dmg.org/PMML-4_3
at org.jpmml.schema.Version.forNamespaceURI(Version.java:61)
at org.jpmml.model.PMMLFilter.updateSource(PMMLFilter.java:121)
at org.jpmml.model.PMMLFilter.startPrefixMapping(PMMLFilter.java:43)
at org.apache.xerces.parsers.AbstractSAXParser.startNamespaceMapping(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:357)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:243)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:214)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:140)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:123)
at org.jpmml.model.JAXBUtil.unmarshal(JAXBUtil.java:78)
at org.jpmml.model.JAXBUtil.unmarshalPMML(JAXBUtil.java:64)
at org.jpmml.evaluator.spark.EvaluatorUtil.createEvaluator(EvaluatorUtil.java:55)
at javaCode.mlModel.LRClassifier_s3.getClassifier(LRClassifier_s3.java:21)
I wonder if this is the same problem in cluster mode.
Could you show me a demo that imports a KNIME/sklearn PMML file and can run on a Spark cluster (Scala)?
I copied your code into a local Java class, and the getName() result is org.dmg.pmml.tree.TreeModel (in Spark local mode).
Your application is using JPMML-Model 1.3.8 in local mode (in which case the TreeModel element is mapped to the org.dmg.pmml.tree.TreeModel class), but Apache Spark ML's built-in JPMML-Model 1.2.15 in cluster mode (in which case it is mapped to the org.dmg.pmml.TreeModel class).
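One way to check which of the two class layouts a given environment can see is to probe for both class names reflectively, without a compile-time dependency on either JPMML-Model version. This is a sketch only; the TreeModelProbe class and its isLoadable helper are hypothetical names, and the two candidate class names are the ones mentioned above:

```java
public class TreeModelProbe {

    // Class names taken from the discussion above
    static final String[] CANDIDATES = {
        "org.dmg.pmml.tree.TreeModel", // layout reported for JPMML-Model 1.3.8
        "org.dmg.pmml.TreeModel"       // layout reported for JPMML-Model 1.2.15
    };

    /** Returns true if the named class is visible to the current class loader. */
    public static boolean isLoadable(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = TreeModelProbe.class.getClassLoader();
        }
        try {
            // initialize=false: only test visibility, do not run static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String candidate : CANDIDATES) {
            System.out.println(candidate + " -> " + isLoadable(candidate));
        }
    }
}
```

If both names are reported as loadable in the same environment, two JPMML-Model versions are probably present on the classpath at once.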
I tried not declaring org.jpmml:pmml-model and org.jpmml:pmml-evaluator, but then it cannot run in Spark local mode.
What's the exception/problem then?
You can solve classpath conflicts by "shading" (i.e. renaming and/or relocating) your application classes. See the example here: https://github.com/jpmml/jpmml-sparkml#run-time-conflict-resolution
Personally, I would suggest removing those legacy JPMML-Model library JAR files from your local and cluster environments altogether: https://github.com/jpmml/jpmml-sparkml#modifying-apache-spark-installation
Exception in thread "main" java.lang.IllegalArgumentException: http://www.dmg.org/PMML-4_3 at org.jpmml.schema.Version.forNamespaceURI(Version.java:61)
This is further proof that your cluster "sees" the legacy JPMML-Model version, which supports PMML schema versions 3.0 through 4.2, but not 4.3.
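To confirm which schema version a PMML file declares before handing it to the library, the namespace URI of its root element can be read with the JDK's StAX parser. This is a sketch under assumptions: the PmmlNamespaceCheck class and rootNamespace helper are hypothetical, and the inline document in main is a stand-in for a real PMML file:

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PmmlNamespaceCheck {

    /** Returns the namespace URI of the document's root element, or null if there is none. */
    public static String rootNamespace(Reader reader) throws XMLStreamException {
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(reader);
        try {
            while (xml.hasNext()) {
                // The first START_ELEMENT event is the root element
                if (xml.next() == XMLStreamConstants.START_ELEMENT) {
                    return xml.getNamespaceURI();
                }
            }
            return null;
        } finally {
            xml.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Stand-in document (assumption); read your .pmml file instead
        String doc = "<PMML xmlns=\"http://www.dmg.org/PMML-4_3\" version=\"4.3\"/>";
        System.out.println(rootNamespace(new StringReader(doc)));
    }
}
```

A namespace ending in PMML-4_3 combined with the IllegalArgumentException above would be consistent with a file that is newer than the library version that parsed it.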
I modified my pom.xml file:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>org.jpmml</groupId>
            <artifactId>pmml-model</artifactId>
        </exclusion>
    </exclusions>
</dependency>
and
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/spring.handlers</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>com.fxc.rpc.impl.member.MemberProvider</mainClass>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/spring.schemas</resource>
                    </transformer>
                </transformers>
                <relocations>
                    <relocation>
                        <pattern>org.dmg.pmml</pattern>
                        <shadedPattern>org.shaded.dmg.pmml</shadedPattern>
                    </relocation>
                    <relocation>
                        <pattern>org.jpmml</pattern>
                        <shadedPattern>org.shaded.jpmml</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>
and deleted:
<!-- https://mvnrepository.com/artifact/org.jpmml/pmml-model -->
<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>pmml-model</artifactId>
    <version>1.3.8</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.jpmml/pmml-evaluator -->
<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>pmml-evaluator</artifactId>
    <version>1.3.10</version>
</dependency>
This time it ran successfully in cluster mode!
However, I did not delete these two files on the cluster:
$SPARK_HOME/jars/pmml-model-1.2.15.jar
$SPARK_HOME/jars/pmml-schema-1.2.15.jar
But whether I add these two dependencies or not, it cannot run in local mode:
<!-- https://mvnrepository.com/artifact/org.jpmml/pmml-model -->
<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>pmml-model</artifactId>
    <version>1.3.8</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.jpmml/pmml-evaluator -->
<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>pmml-evaluator</artifactId>
    <version>1.3.10</version>
</dependency>
Error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ml/Transformer
at javaCode.mlModel.BaixiaoeModel.getClassifier
(maybe this is proof that I don't need to add the two dependencies)
I tried removing the two .jar files from my local $SPARK_HOME, and it does not work either.
Could you tell me how I can run this program successfully in both local and cluster mode?
Thanks for your answer, and for your patience with my bad English!
Hello!
My colleague gave me a PMML file like:
I built a transformer:
and ran it in Spark local mode without any problem; the job adds the prediction tag to the DataFrame.
But when I run it on an AWS EMR cluster that has 1 master node and 2 worker nodes, the Java code cannot transform the PMML file:
I don't know why.
My environment is: Java 7, Scala 2.11.8, Spark 2.0.2 (AWS EMR 5.2.1).
Last time, in order to run the PMML (4.3) exported from sklearn, I used jpmml-evaluator-spark 1.1. This time, the PMML version is 4.2, but when I use jpmml-evaluator-spark 1.0.0, it has the same problem.
Please forgive my poor English...
Thank you!