titicaca / spark-iforest

Isolation Forest on Spark
Apache License 2.0

JavaPackage is not callable - pyspark #14

Closed yaakovML closed 5 years ago

yaakovML commented 5 years ago

hi,

I followed your guide and got:

```
iforest = IForest(contamination=0.3, maxDepth=2)
Traceback (most recent call last):
  File "/Users/htayeb/miniconda2/envs/cs/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2878, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    iforest = IForest(contamination=0.3, maxDepth=2)
  File "/Users/htayeb/miniconda2/envs/cs/lib/python2.7/site-packages/pyspark/__init__.py", line 110, in wrapper
    return func(self, **kwargs)
  File "/Users/htayeb/miniconda2/envs/cs/lib/python2.7/site-packages/pyspark_iforest/ml/iforest.py", line 245, in __init__
    self._java_obj = self._new_java_obj("org.apache.spark.ml.iforest.IForest", self.uid)
  File "/Users/htayeb/miniconda2/envs/cs/lib/python2.7/site-packages/pyspark/ml/wrapper.py", line 67, in _new_java_obj
    return java_obj(*java_args)
TypeError: 'JavaPackage' object is not callable
```

any idea why?

I am using Python 2.7 and the Spark context is alive.

titicaca commented 5 years ago

It hasn't been tested on python 2. Try with python 3.6.x.

yaakovML commented 5 years ago

Exactly what I am doing now.

Will let you know.

yaakovML commented 5 years ago

Same problem. It fails identically on both Python 3.7 and 2.7.

Any idea why?

titicaca commented 5 years ago

Not sure. Does it work for you with python 3.6?

I think some of the PySpark APIs are not compatible with Python 2.7. I will look into this problem later.

yaakovML commented 5 years ago

I also tried with Python 3; it didn't work.

titicaca commented 5 years ago

Does your PySpark work? Check that you have installed and set up PySpark correctly.

yaakovML commented 5 years ago

PySpark and Vectors work. Thanks for your help!

titicaca commented 5 years ago

Could you try the KMeans example in PySpark to see if it works?

yaakovML commented 5 years ago

Hi and again, thanks so much for your support.

I ran the following code:

```python
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors

spark = SparkSession \
    .builder.master("local[*]") \
    .appName("IForestExample") \
    .getOrCreate()

data2 = [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([7.0, 9.0]),),
         (Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)]
df = spark.createDataFrame(data2, ["features"])
df.show()

from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator

kmeans = KMeans().setK(2)
model = kmeans.fit(df)
predictions = model.transform(df)
predictions.show()

from pyspark_iforest.ml.iforest import *

iforest = IForest(contamination=0.3, maxDepth=2)

cls = IsolationForest(behaviour='new', max_samples=100, contamination=0.1)
```

The KMeans part works well, and `predictions.show()` outputs:

```
+---------+----------+
| features|prediction|
+---------+----------+
|[0.0,0.0]|         1|
|[7.0,9.0]|         0|
|[9.0,8.0]|         0|
|[8.0,9.0]|         0|
+---------+----------+
```

But again, on the instantiation of IForest:

```python
iforest = IForest(contamination=0.3, maxDepth=2)
```

It fails with:

```
Traceback (most recent call last):
  File "/Users/htayeb/miniconda2/envs/ML3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    iforest = IForest(contamination=0.3, maxDepth=2)
  File "/Users/htayeb/miniconda2/envs/ML3/lib/python3.7/site-packages/pyspark/__init__.py", line 110, in wrapper
    return func(self, **kwargs)
  File "/Users/htayeb/miniconda2/envs/ML3/lib/python3.7/site-packages/pyspark_iforest/ml/iforest.py", line 245, in __init__
    self._java_obj = self._new_java_obj("org.apache.spark.ml.iforest.IForest", self.uid)
  File "/Users/htayeb/miniconda2/envs/ML3/lib/python3.7/site-packages/pyspark/ml/wrapper.py", line 67, in _new_java_obj
    return java_obj(*java_args)
TypeError: 'JavaPackage' object is not callable
```

I am running Python 3.7 and PySpark 2.4.

Thanks a lot again!


titicaca commented 5 years ago

It looks like you haven't deployed the iforest jar package onto your Spark Java classpath. You can follow the guide: https://github.com/titicaca/spark-iforest/blob/master/python/README.md
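As background on what the error message means: Py4J resolves dotted names lazily, and any name it cannot match to a class on the JVM classpath comes back as a `JavaPackage` placeholder instead of a constructor, so "calling the constructor" raises the `TypeError`. A toy sketch of that mechanism in plain Python (no Spark needed; this `JavaPackage` class is a stand-in, not the real Py4J one):

```python
class JavaPackage:
    """Stand-in for Py4J's JavaPackage: any attribute lookup that cannot
    be resolved to a real JVM class just yields another package node."""
    def __getattr__(self, name):
        return JavaPackage()

jvm = JavaPackage()

# Without the iforest jar on the classpath, the dotted path resolves to
# a package placeholder rather than a class, so calling it fails.
try:
    jvm.org.apache.spark.ml.iforest.IForest("uid")
except TypeError as err:
    print(err)  # 'JavaPackage' object is not callable
```

So the exception points at a missing jar, not at the Python version.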

yaakovML commented 5 years ago

Ok.

So I redid the deployment of the jar file into my Spark directory and it didn't work. I did, though, find the solution, and thought it might help someone else, so here is what I did.

I left PyCharm and tried to run it all from the shell, and running it from there did the magic: it worked. Since I do my work in PyCharm, I kept looking for a solution there.

I ended up adding the "config" line to my spark creation:

```python
spark = SparkSession \
    .builder.master("local[*]") \
    .appName("IForestExample") \
    .config('spark.driver.extraClassPath',
            '/usr/local/Cellar/apache-spark/2.4.0/libexec/jars/spark-iforest-2.4.0.jar') \
    .getOrCreate()
```

which made it work also in PyCharm.

Note that I did check that the pip install had worked and that the jar was deployed properly. Even with these two things in place, for some reason it wasn't enough for PyCharm.
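For anyone hitting the same wall outside PyCharm, these are other common ways to get a jar onto the classpath (the paths and the `spark-iforest-2.4.0.jar` name below just mirror the example above; adjust them to your install):

```shell
# Option 1: copy the jar into Spark's own jars directory,
# where it is picked up automatically
cp spark-iforest-2.4.0.jar "$SPARK_HOME/jars/"

# Option 2: pass it explicitly when launching the shell
pyspark --jars /path/to/spark-iforest-2.4.0.jar

# Option 3: set submit args before creating the SparkSession from a script
export PYSPARK_SUBMIT_ARGS="--jars /path/to/spark-iforest-2.4.0.jar pyspark-shell"
```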

Thanks for the support.