Closed yaakovML closed 5 years ago
It hasn't been tested on python 2. Try with python 3.6.x.
Exactly what I am doing now.
Will let you know.
Same problem. It shows the same error for both Python 3.7 and 2.7.
Any idea why?
Not sure. Does it work for you with python 3.6?
I think some pyspark api is not compatible with python 2.7. I will look into this problem later.
I also tried with Python 3; it didn't work.
Does your pyspark work? Check if you have installed and setup your pyspark correctly.
PySpark and vectors work. Thanks for your help!
Could you try the k-means example in pyspark to see if it works?
Hi and again, thanks so much for your support.
I ran the following code:
```python
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors

spark = SparkSession \
    .builder.master("local[*]") \
    .appName("IForestExample") \
    .getOrCreate()

data2 = [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([7.0, 9.0]),),
         (Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)]
df = spark.createDataFrame(data2, ["features"])
df.show()

from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator

kmeans = KMeans().setK(2)
model = kmeans.fit(df)
predictions = model.transform(df)
predictions.show()

from pyspark_iforest.ml.iforest import *

iforest = IForest(contamination=0.3, maxDepth=2)
cls = IsolationForest(behaviour='new', max_samples=100, contamination=0.1, )
```
The k-means part works well; `predictions.show()` outputs:
```
+---------+----------+
| features|prediction|
+---------+----------+
|[0.0,0.0]|         1|
|[7.0,9.0]|         0|
|[9.0,8.0]|         0|
|[8.0,9.0]|         0|
+---------+----------+
```
But again, on the instantiation of IForest:
iforest = IForest(contamination=0.3, maxDepth=2)
It fails with:
```
Traceback (most recent call last):
  File "/Users/htayeb/miniconda2/envs/ML3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "
```
I am running Python 3.7 and PySpark 2.4.
Thanks a lot again!
It looks like you haven't deployed the iforest jar package into your Spark Java classpath. You can follow the guide: https://github.com/titicaca/spark-iforest/blob/master/python/README.md
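As a sketch, two common ways to make a jar visible to Spark (the jar name and paths below are assumptions based on this thread; adjust them to your installation):

```shell
# Option 1: copy the built jar into Spark's jars directory
# (assumes $SPARK_HOME points at your Spark installation)
cp spark-iforest-2.4.0.jar "$SPARK_HOME/jars/"

# Option 2: pass the jar explicitly when launching pyspark or spark-submit
pyspark --jars /path/to/spark-iforest-2.4.0.jar
spark-submit --jars /path/to/spark-iforest-2.4.0.jar your_script.py
```

Either way, the Scala classes (e.g. `org.apache.spark.ml.iforest.IForest`) must be on the JVM classpath before the Python wrapper tries to instantiate them.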
Ok.
So I redid the deployment of the jar file into my Spark directory and it didn't work. I did, however, find the solution and thought it might help someone else, so here is what I did.
I left PyCharm and tried to run it all from the shell; running it from there did the magic, and it worked. Since I do my work in PyCharm, I kept looking for a solution.
I ended up adding a `config` line to my SparkSession creation:
```python
spark = SparkSession \
    .builder.master("local[*]") \
    .appName("IForestExample") \
    .config('spark.driver.extraClassPath',
            '/usr/local/Cellar/apache-spark/2.4.0/libexec/jars/spark-iforest-2.4.0.jar') \
    .getOrCreate()
```
which made it work in PyCharm as well.
Note that I did check that the pip install worked and that the jar was properly deployed; even with these two things, for some reason, it wasn't enough for PyCharm.
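An alternative sketch that may also help in IDEs like PyCharm: pyspark reads the `PYSPARK_SUBMIT_ARGS` environment variable when it launches the JVM gateway, so the jar can be injected before the first SparkSession is created. The jar path below is an assumption taken from this thread; adjust it to your installation.

```python
import os

# Hypothetical path -- use the location of your spark-iforest jar.
jar = "/usr/local/Cellar/apache-spark/2.4.0/libexec/jars/spark-iforest-2.4.0.jar"

# Must be set BEFORE the first SparkSession/SparkContext is created in the
# process; pyspark appends these args when starting the JVM gateway, and the
# string must end with "pyspark-shell".
os.environ["PYSPARK_SUBMIT_ARGS"] = f"--jars {jar} pyspark-shell"
```

Setting it in code (or in the PyCharm run configuration's environment variables) avoids depending on how the IDE resolves `spark.driver.extraClassPath`.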
Thanks for the support.
Hi,
I followed your guide and got:
```
iforest = IForest(contamination=0.3, maxDepth=2)
Traceback (most recent call last):
  File "/Users/htayeb/miniconda2/envs/cs/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2878, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    iforest = IForest(contamination=0.3, maxDepth=2)
  File "/Users/htayeb/miniconda2/envs/cs/lib/python2.7/site-packages/pyspark/__init__.py", line 110, in wrapper
    return func(self, **kwargs)
  File "/Users/htayeb/miniconda2/envs/cs/lib/python2.7/site-packages/pyspark_iforest/ml/iforest.py", line 245, in __init__
    self._java_obj = self._new_java_obj("org.apache.spark.ml.iforest.IForest", self.uid)
  File "/Users/htayeb/miniconda2/envs/cs/lib/python2.7/site-packages/pyspark/ml/wrapper.py", line 67, in _new_java_obj
    return java_obj(*java_args)
TypeError: 'JavaPackage' object is not callable
```
Any idea why?
I am using Python 2.7, and the Spark context is alive.