titicaca / spark-iforest

Isolation Forest on Spark
Apache License 2.0
The library doesn't work with Spark 3.0.0 #31

Closed: e-compagno closed this issue 3 years ago

e-compagno commented 4 years ago

Apparently the package doesn't work with Spark 3.0.0, as it depends on an older version of Hadoop and Spark, as pointed out in https://survival8.blogspot.com/p/isolation-forest-implementation-using.html.

After compiling the jar file and installing the Python package via pip, I receive the error "An error occurred while calling None.org.apache.spark.ml.iforest.IForest." It can be reproduced by running:

from pyspark_iforest.ml.iforest import *
IForest(contamination=0.3, maxDepth=2)

Is there any plan to update the library so that it also works with Spark 3?

titicaca commented 4 years ago

Yes, the code does not currently support Spark 3.0 because some APIs have changed. I will work on the update when I find some time in the near future.

e-compagno commented 4 years ago

In any case, I have tested that the library works with pyspark 2.4.5 and 2.4.7. May I suggest relaxing the pyspark version requirement in the library from pyspark==2.4.0 to pyspark>=2.4.0,<=2.4.7 to avoid forcing a version downgrade?
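
To illustrate what the proposed range would accept, here is a minimal sketch of the check that pyspark>=2.4.0,<=2.4.7 expresses. The helper names `parse_version` and `is_supported` are hypothetical and not part of pyspark or spark-iforest. The comparison is a simplification that only looks at the numeric major.minor.patch part and ignores pre-release or post-release suffixes, which pip handles differently.

```python
import re

# Bounds taken from the range proposed above (an assumption for this sketch,
# not a statement about what the library officially supports).
MIN_SUPPORTED = (2, 4, 0)
MAX_SUPPORTED = (2, 4, 7)


def parse_version(version):
    """Return the leading numeric release of a version string as a 3-tuple.

    Missing components are treated as 0, so "2.4" becomes (2, 4, 0).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def is_supported(version):
    """Check whether a version falls inside the inclusive range 2.4.0 to 2.4.7."""
    return MIN_SUPPORTED <= parse_version(version) <= MAX_SUPPORTED
```

With this sketch, `is_supported("2.4.5")` and `is_supported("2.4.7")` are true, while `is_supported("3.0.0")` is false, which matches the versions reported as working and failing in this thread.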

JJorczik commented 3 years ago

Is there any status update for Spark 3.0 support?

titicaca commented 3 years ago

I'm working on it. I just added a new branch, https://github.com/titicaca/spark-iforest/tree/spark3, which you can try with Spark 3.0.

JJorczik commented 3 years ago

Seems to work. Thank you!