master / spark-stemming

Spark MLlib wrapper for the Snowball framework
BSD 2-Clause "Simplified" License

How to import spark-stemming via pyspark #1

Closed: GeorgesAlkhouri closed this issue 7 years ago

GeorgesAlkhouri commented 8 years ago

Hello,

I want to try your stemming package for Spark, so I added the package to my spark-submit command:

./spark-submit --packages master:spark-stemming:0.1.1 run.py

But when I try to import the Stemmer from pyspark, it cannot be found.

I tried to import it like this:

from pyspark.mllib.feature import Stemmer

and this

from pyspark.ml.feature import Stemmer

Currently, I am using Spark version 2.0.0.

Thanks

peterhurford commented 8 years ago

Hi @GeorgesAlkhouri - this package is a Scala package, so it cannot be imported directly from Python. Instead, you (or someone else) would need to write a Python wrapper. For example, in Spark, the Tokenizer class is written in Scala, and a Python wrapper is then provided so that it can be imported from pyspark.ml.feature.

Notably, if you do add a wrapper, you'd then have to import your wrapper from where you wrote it, since it would not be in the pyspark.ml.feature module.
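For anyone landing here later, a minimal sketch of what such a wrapper could look like. This is an illustration, not something shipped with the package. It assumes the Scala class is `org.apache.spark.mllib.feature.Stemmer` and that it exposes `setInputCol`, `setOutputCol`, and `setLanguage`; none of that is confirmed in this thread, so check it against the package first. It also relies on private PySpark attributes (`_sc`, `_jdf`) and on `sql_ctx`, as found on Spark 2.x DataFrames. The function name `stem` is made up for the example.

```python
# Assumed fully qualified name of the Scala transformer; verify it against the package.
STEMMER_CLASS = "org.apache.spark.mllib.feature.Stemmer"


def stem(df, input_col, output_col, language="English"):
    """Apply the Scala Stemmer to a PySpark DataFrame through the Py4J gateway."""
    # df._sc and df._jdf are private PySpark attributes (classic, non-Connect API).
    java_class = df._sc._jvm
    # Walk the dotted name on the JVM view: jvm.org.apache.spark...
    for part in STEMMER_CLASS.split("."):
        java_class = getattr(java_class, part)
    # Setter names are assumed to follow the usual Spark ML conventions.
    stemmer = (
        java_class()
        .setInputCol(input_col)
        .setOutputCol(output_col)
        .setLanguage(language)
    )
    # Wrap the returned Java DataFrame in the same Python class as the input.
    return type(df)(stemmer.transform(df._jdf), df.sql_ctx)
```

With the package on the classpath (for example via `--packages`, as in the original command), usage would be something like `stemmed = stem(words_df, "word", "stemmed")`, where `words_df` and the column names are placeholders. A fuller wrapper would subclass `pyspark.ml.wrapper.JavaTransformer`, which is how the Tokenizer wrapper is built, so that it can be used inside a Pipeline.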

GeorgesAlkhouri commented 7 years ago

Got it, thanks.