linkedin / isolation-forest

A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm with support for exporting in ONNX format.

InvalidClassException #5

Closed ikoiko closed 4 years ago

ikoiko commented 4 years ago

Hello, I used your build configuration and successfully built the jar file (isolation-forest_2.11-0.3.1) using gradlew. However, when I use the newly generated jar in my project, it throws an error while fitting data. Error detail: "Caused by: java.io.InvalidClassException: com.linkedin.relevance.isolationforest.IsolationForest; local class incompatible: stream classdesc serialVersionUID = 5883725353499012901, local class serialVersionUID = 6413710209040362293"
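For context, this exception means the JVM that deserialized the `IsolationForest` object computed a different `serialVersionUID` for the class than the JVM that serialized it, which typically happens when the two sides load different builds of the class. Below is a minimal Java sketch, using only the JDK's `ObjectStreamClass`, for printing the UID that a given classpath actually sees. `SerialUidCheck`, `Demo`, and `Pinned` are hypothetical names for illustration; on a real cluster one would pass `com.linkedin.relevance.isolationforest.IsolationForest` as the argument, with the jar on the classpath.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialUidCheck {

    // Hypothetical stand-in: declares no serialVersionUID,
    // so the JVM derives one from the class structure.
    static class Demo implements Serializable {
        int value;
    }

    // Hypothetical stand-in with an explicitly pinned UID.
    static class Pinned implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    // Returns the serialVersionUID the current classpath associates with cls.
    static long uidOf(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        if (desc == null) {
            // lookup returns null for classes that are not Serializable
            throw new IllegalArgumentException(cls.getName() + " is not Serializable");
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name to inspect a class from a jar on the classpath.
        Class<?> target = args.length > 0 ? Class.forName(args[0]) : Demo.class;
        System.out.println(target.getName() + " serialVersionUID = " + uidOf(target));
    }
}
```

Running this once with the driver's classpath and once with an executor's, then comparing both numbers with the two values in the stack trace, should show which side holds the unexpected build.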

I built source code on my

My build.gradle file is as follows:

plugins {
    // Apply the scala plugin to add support for Scala
    id 'scala'
}

dependencies {
    compile("com.chuusai:shapeless_2.11:2.3.2")
    // compile("com.databricks:spark-avro_2.11:4.0.0")
    compile("org.apache.spark:spark-avro_2.11:2.4.0")
    compile("org.apache.spark:spark-core_2.11:2.4.0")
    compile("org.apache.spark:spark-mllib_2.11:2.4.0")
    compile("org.apache.spark:spark-sql_2.11:2.4.0")
    compile("org.scalatest:scalatest_2.11:2.2.6")
    compile("org.testng:testng:6.8.8")
}

test {
    useTestNG()
}

archivesBaseName = "${project.name}_2.11"

Can you please help me solve this issue?

P.S. I followed exactly the same steps to build release v0.2.2 (isolation-forest_2.11-0.2.3) and it works perfectly. Only v0.3.0 (isolation-forest_2.11-0.3.1) has the above problem.

Thanks in advance

jverbus commented 4 years ago

Thanks, @ikoiko! I'll take a look.

jverbus commented 4 years ago

It looks like Spark 2.4.4 may not support Scala 2.11.

"For the Scala API, Spark 2.4.4 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x)."

https://spark.apache.org/docs/latest/

Are you able to use an earlier version of Spark? The library has been most extensively tested with Scala 2.11.8 and Spark 2.3.0.

jverbus commented 4 years ago

Alternatively, you should try bumping all of the dependency versions to 2.4.4 for compatibility with the Spark version you're using on your cluster.

plugins {
    // Apply the scala plugin to add support for Scala
    id 'scala'
}

dependencies {
    compile("com.chuusai:shapeless_2.11:2.3.2")
// compile("com.databricks:spark-avro_2.11:4.0.0")
    compile("org.apache.spark:spark-avro_2.11:2.4.4")
    compile("org.apache.spark:spark-core_2.11:2.4.4")
    compile("org.apache.spark:spark-mllib_2.11:2.4.4")
    compile("org.apache.spark:spark-sql_2.11:2.4.4")
    compile("org.scalatest:scalatest_2.11:2.2.6")
    compile("org.testng:testng:6.8.8")
}

test {
    useTestNG()
}

archivesBaseName = "${project.name}_2.11"

ikoiko commented 4 years ago

Hi @jverbus, thanks for the reply. As far as I remember, I already bumped the dependencies as you describe above and it didn't change anything, but I am not 100% sure. I will definitely try it again tomorrow. On the other hand, I have no way to change the Spark/Scala versions running in production (we are still on Spark 2.4.0 and Scala 2.11.11). Because our production servers are isolated from the internet, I use a virtual Linux machine as a test and build environment. I will also try switching that machine to Spark 2.4.0 to build the library cleanly.

thanks

jverbus commented 4 years ago

@ikoiko: Cool, please let me know if it works.

jverbus commented 4 years ago

@ikoiko: Any success?

ikoiko commented 4 years ago

Hi @jverbus, sorry for the late reply. We have just entered a very busy period, so I couldn't get back to you. I configured my virtual environment to use Spark 2.4.0 and Scala 2.11.11 (same as my production environment) and set the build.gradle file to use the Spark 2.4.0 dependencies, but it failed with the same error. I am planning to build again with Spark 2.4.4 and Scala 2.12.x. I will let you know whether it works.

ikoiko commented 4 years ago

Here is my version matrix and the results so far:

| Spark | Scala | build.gradle dependencies | Result |
|---|---|---|---|
| 2.4.4 | 2.11.11 | 2.4.4 | FAIL |
| 2.4.4 | 2.11.11 | 2.4.0 | FAIL |
| 2.4.0 | 2.11.11 | 2.4.0 | FAIL |
| 2.4.4 | 2.12.x | 2.4.4 | FAIL |
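One thing this matrix does not vary is which copy of the jar each JVM actually loads. As a hedged diagnostic, the Java sketch below reports where a class was loaded from, using only standard JDK calls. `JarLocator` and `locationOf` are illustrative names, not part of the library.

```java
import java.net.URL;
import java.security.CodeSource;

public class JarLocator {

    // Returns the jar or directory a class was loaded from,
    // or "<unknown>" when the JVM does not expose it (e.g. bootstrap classes).
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null) {
            return "<unknown>";
        }
        URL location = source.getLocation();
        return location == null ? "<unknown>" : location.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name to inspect a class on the classpath.
        Class<?> target = args.length > 0 ? Class.forName(args[0]) : JarLocator.class;
        System.out.println(target.getName() + " loaded from " + locationOf(target));
    }
}
```

If the driver and an executor report different jar paths or file versions for `com.linkedin.relevance.isolationforest.IsolationForest`, an older copy of the jar left on one side would be one possible explanation for the mismatched UIDs.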

ikoiko commented 4 years ago

Hi @jverbus

I have tried with scala 2.12.0 but no luck.

jverbus commented 4 years ago

Hi @ikoiko ,

I spun up an Azure Spark cluster, but wasn't able to reproduce your reported issue.

I tried with Spark 2.4.0 and Scala 2.11.12 on Ubuntu 16.04. I set the build.gradle dependencies to 2.4.0.

I was able to build the jar and use it on the cluster to fit an isolation forest to both the shuttle.csv and mammography.csv datasets that are included in the git repo.

Are you able to try on a different cluster?

ikoiko commented 4 years ago

Hi @jverbus

Unfortunately, I can't. We don't have any other cluster.

jverbus commented 4 years ago

I'm going to close this as I'm not able to reproduce the issue.