master / spark-stemming

Spark MLlib wrapper for the Snowball framework
BSD 2-Clause "Simplified" License

Change input/output types to Array(StringType) #5

Closed. kmbnw closed this 6 years ago

kmbnw commented 6 years ago

If you merge this pull request, the Stemmer will work with Seq[String] rather than only String. This makes it easier to use with Tokenizer and other transformers that work with arrays. I also updated to Spark 2.2, though it seemed to compile fine with the original Spark version as well.
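To make the intended usage concrete, here is a minimal sketch of chaining Tokenizer into the Stemmer once it accepts arrays. It assumes the Stemmer class lives at org.apache.spark.mllib.feature.Stemmer and exposes setInputCol, setOutputCol and setLanguage as in the project's README; the object name, column names and sample sentences are made up for illustration.

```scala
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.mllib.feature.Stemmer // assumed package path, per the project's README
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; names and data are illustrative only.
object StemTokensExample {

  def stem(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val docs = Seq(
      (0, "the cats are running"),
      (1, "stemming tokenized text")
    ).toDF("id", "text")

    // Tokenizer emits an array<string> column.
    val tokens = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")
      .transform(docs)

    // Assumption: with this PR the Stemmer takes that array column directly
    // and writes an array<string> column of stems.
    new Stemmer()
      .setInputCol("words")
      .setOutputCol("stems")
      .setLanguage("English")
      .transform(tokens)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("stem-tokens")
      .getOrCreate()

    stem(spark).select("words", "stems").show(false)
    spark.stop()
  }
}
```

Without the change, the same chain would need an extra step to turn the token array back into single strings before stemming.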

This PR fixes an error seen when the Stemmer is given an ArrayType(StringType) column: java.lang.IllegalArgumentException: requirement failed: Input type must be string type but got ArrayType(StringType,true). See e.g. the discussion at https://spark-packages.org/package/master/spark-stemming
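For context, an error of this shape typically comes from a require call in a transformer's validateInputType, since Scala's require throws IllegalArgumentException with the "requirement failed:" prefix. The sketch below is not the project's Stemmer. It is a hypothetical UnaryTransformer (TokenSuffixStripper, with a toy suffix rule standing in for Snowball) that shows what an array-accepting type check might look like.

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

// Hypothetical transformer, not the project's Stemmer: it only illustrates
// a Seq[String] => Seq[String] transformer and its input type check.
class TokenSuffixStripper(override val uid: String)
    extends UnaryTransformer[Seq[String], Seq[String], TokenSuffixStripper] {

  def this() = this(Identifiable.randomUID("tokenSuffixStripper"))

  // Toy placeholder for a real stemmer: drops a trailing "s" from longer words.
  override protected def createTransformFunc: Seq[String] => Seq[String] =
    _.map(w => if (w.length > 3 && w.endsWith("s")) w.dropRight(1) else w)

  // Accept array<string> regardless of whether elements may be null.
  // A failed require throws IllegalArgumentException("requirement failed: ...").
  override protected def validateInputType(inputType: DataType): Unit =
    require(
      inputType == ArrayType(StringType, containsNull = true) ||
        inputType == ArrayType(StringType, containsNull = false),
      s"Input type must be array<string> but got $inputType")

  override protected def outputDataType: DataType =
    ArrayType(StringType, containsNull = true)
}
```

A transformer whose check only admits StringType would reject Tokenizer output in the same way as the message quoted above, which is the situation this PR addresses.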

kmbnw commented 6 years ago

Closing this in favor of a new PR opened from a frozen branch.