jpmml / jpmml-sparkml

Java library and command-line application for converting Apache Spark ML pipelines to PMML
GNU Affero General Public License v3.0

Building master for Spark 3.0 has failing tests #104

Closed: PowerToThePeople111 closed this issue 3 years ago

PowerToThePeople111 commented 3 years ago

Hello,

I tried to get a working build of your awesome project for Spark 3.0. As far as I understood, I need to compile the master branch. Unfortunately, when running mvn clean install, some tests fail, and they all seem to have the same cause.

Can I safely skip the tests and still get a working build?

[ERROR] Errors:
[ERROR]   ClassificationTest.evaluateDecisionTreeAudit:62->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateDecisionTreeIris:102->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateDecisionTreeSentiment:132->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateGBTAudit:67->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateGLMAudit:72->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateGLMSentiment:137->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateLinearSVCSentiment:142->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateLogisticRegressionAudit:77->IntegrationTest.evaluate:50->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateLogisticRegressionIris:107->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateModelChainAudit:82->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateModelChainIris:112->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateNaiveBayesAudit:87->IntegrationTest.evaluate:50->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateNaiveBayesIris:117->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateNeuralNetworkAudit:92->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateNeuralNetworkIris:122->IntegrationTest.evaluate:50->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateRandomForestAudit:97->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateRandomForestIris:127->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClassificationTest.evaluateRandomForestSentiment:147->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   ClusteringTest.evaluateKMeansIris:31->IntegrationTest.evaluate:46->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   RegressionTest.evaluateDecisionTreeAuto:63->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   RegressionTest.evaluateDecisionTreeHousing:95->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   RegressionTest.evaluateGBTAuto:68->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   RegressionTest.evaluateGLMAuto:73->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   RegressionTest.evaluateGLMFormulaVisit:115->IntegrationTest.evaluate:50->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   RegressionTest.evaluateGLMHousing:100->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   RegressionTest.evaluateLinearRegressionAuto:80->IntegrationTest.evaluate:46->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   RegressionTest.evaluateLinearRegressionHousing:105->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   RegressionTest.evaluateModelChainAuto:85->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   RegressionTest.evaluateRandomForestAuto:90->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[ERROR]   RegressionTest.evaluateRandomForestHousing:110->IntegrationTest.evaluate:42->IntegrationTest.evaluate:64->BatchTest.evaluate:29 » InsecureRecursiveDelete
[INFO]
[ERROR] Tests run: 41, Failures: 0, Errors: 30, Skipped: 0

When using the latest version available in the Maven repository, jpmml-sparkml-1.6.2.jar, I get the following stack trace when executing this code:

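import java.io.File
import org.jpmml.sparkml._
import org.apache.spark.ml.PipelineModel

// Run in spark-shell, where the SparkSession `spark` is predefined
val data = spark.read.parquet(...)
// Input schema for the converter: the data without the class_weight column
val schema = data.drop("class_weight").schema
val pipelinemodel = PipelineModel.load(...)

// Convert the fitted pipeline model to PMML and write it to a local file
val pmml = new PMMLBuilder(schema, pipelinemodel).buildFile(new File("/home/hadoop/age_models/depth20mininst30"))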

Stacktrace:

java.lang.NoClassDefFoundError: org/dmg/pmml/Model
  ... 45 elided
Caused by: java.lang.ClassNotFoundException: org.dmg.pmml.Model
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  ... 45 more

Am I getting this stack trace because I'm using the wrong version, or do I have to add an additional dependency?

vruusmann commented 3 years ago

> When using the last version available on the maven repo jpmml-sparkml-1.6.2.jar I get the following stacktrace: java.lang.NoClassDefFoundError: org/dmg/pmml/Model

Fix your classpath! The JPMML-SparkML library JAR alone is not enough; it depends on a number of other JPMML libraries, and those must be on the classpath too.

For example, the org.dmg.pmml.Model class is defined in the JPMML-Model library.
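One way to confirm a classpath problem like this is to probe for the classes involved directly. Below is a minimal Scala sketch; the ClasspathCheck object and its isOnClasspath helper are hypothetical names, not part of JPMML-SparkML. It relies only on the standard Class.forName API and reports which of the listed classes the current class loader can load.

```scala
// Hypothetical helper (not part of JPMML-SparkML): checks whether
// classes needed for PMML conversion can be loaded.
object ClasspathCheck {

  // Returns true if the named class can be loaded (without running its
  // static initializers) by the class loader that loaded this object.
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError           => false
    }

  def main(args: Array[String]): Unit = {
    // PMMLBuilder comes from JPMML-SparkML; org.dmg.pmml.Model comes
    // from JPMML-Model, one of its JPMML library dependencies.
    val required = Seq("org.jpmml.sparkml.PMMLBuilder", "org.dmg.pmml.Model")
    for (name <- required) {
      val status = if (isOnClasspath(name)) "found" else "MISSING"
      println(s"$name: $status")
    }
  }
}
```

In spark-shell, the object can be pasted (for example with :paste) and run with ClasspathCheck.main(Array()). Any line reporting MISSING points to a library that still has to be added to the classpath.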

> As far as I understood, I need to compile the master branch.

You don't need to compile anything, because you can use the pre-built, all-inclusive JPMML-SparkML binary release JAR file, as explained in the README file.
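When switching to an all-inclusive JAR, it can help to verify which file each class is actually loaded from, for example to rule out a stale library JAR shadowing it. The following is a minimal Scala sketch built on the standard ProtectionDomain and CodeSource APIs; the ClassOrigin object and its locate helper are hypothetical names chosen for illustration.

```scala
// Hypothetical helper (not part of JPMML-SparkML): reports the JAR or
// directory a class was loaded from.
object ClassOrigin {

  // Returns the code source location of the named class, or None if the
  // class cannot be loaded or has no code source (e.g. JDK bootstrap classes).
  def locate(className: String): Option[String] = {
    val cls =
      try Some(Class.forName(className, false, getClass.getClassLoader))
      catch {
        case _: ClassNotFoundException => None
        case _: LinkageError           => None
      }
    cls
      .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
      .flatMap(src => Option(src.getLocation))
      .map(_.toString)
  }

  def main(args: Array[String]): Unit = {
    // With a single all-inclusive JAR, both classes would be expected
    // to resolve to that same file.
    for (name <- Seq("org.jpmml.sparkml.PMMLBuilder", "org.dmg.pmml.Model")) {
      println(s"$name -> ${locate(name).getOrElse("not found")}")
    }
  }
}
```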

> Sadly tho, when running mvn clean install, some test fail which all seem to have one reason

The master branch builds without any errors against Apache Spark 3.0.X.

This project has full GitHub Actions CI coverage, see here: https://github.com/jpmml/jpmml-sparkml/actions?query=workflow%3Amaven