master / spark-stemming

Spark MLlib wrapper for the Snowball framework
BSD 2-Clause "Simplified" License

Stemming stage does not implement Writable #11

Closed: syfantid closed this issue 5 years ago

syfantid commented 5 years ago

Hello, I'm using Spark MLlib with Scala and trying to save my model to disk with either of the commands below:

    model.save("output/model-random-forests")
    model.write.overwrite().save("output/model-random-forests")

But I get an exception due to the stemming stage:

Exception in thread "main" java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: stemmer_14444071fe16 of type class org.apache.spark.mllib.feature.Stemmer
    at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:231)
    at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:228)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:228)
    at org.apache.spark.ml.PipelineModel$PipelineModelWriter.<init>(Pipeline.scala:335)
    at org.apache.spark.ml.PipelineModel.write(Pipeline.scala:319)
    at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:157)
    at org.apache.spark.ml.PipelineModel.save(Pipeline.scala:292)
    at spam$.main(spam.scala:190)
    at spam.main(spam.scala)

Is there any chance that the stemmer will implement Writable?
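As the trace shows, the exception is thrown while the PipelineModelWriter is being constructed (validateStages), so nothing is written, and Spark names only the first stage that is not MLWritable. Below is a minimal sketch of a pre-save check that lists every such stage at once. It assumes Spark 2.x or later on the classpath, and PipelineWriteCheck.nonWritableStages is a hypothetical helper, not part of Spark or spark-stemming:

```scala
import org.apache.spark.ml.PipelineStage
import org.apache.spark.ml.util.MLWritable

object PipelineWriteCheck {
  // Hypothetical helper (not a Spark API): describes every stage that the
  // Pipeline/PipelineModel writers would reject because it is not MLWritable.
  // Works with both Pipeline.getStages and PipelineModel.stages.
  def nonWritableStages[S <: PipelineStage](stages: Array[S]): Seq[String] =
    stages.toSeq.collect {
      case stage if !stage.isInstanceOf[MLWritable] =>
        s"${stage.uid} of type ${stage.getClass.getName}"
    }
}
```

Calling PipelineWriteCheck.nonWritableStages(model.stages) before model.write.overwrite().save(...) should report the same Stemmer stage as the error message, plus any other non-writable stages, without attempting the write.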

master commented 5 years ago

Thanks for bringing this up, @syfantid. I'm not actively working on spark-stemming at the moment, but contributions are welcome!

harveyxia commented 5 years ago

This was implemented in https://github.com/master/spark-stemming/pull/10
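The linked pull request is the authoritative change. As a hedged illustration of the common pattern only (not a copy of that PR), a transformer whose state lives entirely in its Params can usually be made persistable by mixing in DefaultParamsWritable and giving its companion object DefaultParamsReadable. This sketch assumes Spark 2.x or later; SuffixStripper and its naive suffix rule are hypothetical stand-ins for the Snowball-backed Stemmer:

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

// Hypothetical stand-in for a stemming stage. Its only state is its Params
// (inputCol/outputCol), so DefaultParamsWritable can persist it as metadata.
class SuffixStripper(override val uid: String)
    extends UnaryTransformer[Seq[String], Seq[String], SuffixStripper]
    with DefaultParamsWritable {

  // The no-arg constructor generates a unique id; the String constructor is
  // what defaultCopy and DefaultParamsReader use to re-create the stage.
  def this() = this(Identifiable.randomUID("suffixStripper"))

  override protected def createTransformFunc: Seq[String] => Seq[String] =
    tokens => tokens.map(SuffixStripper.strip)

  // Reject anything other than array<string> input columns.
  override protected def validateInputType(inputType: DataType): Unit = inputType match {
    case ArrayType(StringType, _) => ()
    case other =>
      throw new IllegalArgumentException(s"Input type must be array<string>, got $other")
  }

  override protected def outputDataType: DataType = ArrayType(StringType, containsNull = true)
}

// The companion object provides SuffixStripper.read and SuffixStripper.load(path).
object SuffixStripper extends DefaultParamsReadable[SuffixStripper] {
  // Naive placeholder rule (NOT Snowball): drop a trailing "s" from longer words.
  def strip(token: String): String =
    if (token.length > 3 && token.endsWith("s")) token.dropRight(1) else token
}
```

With a stage like this in the pipeline, model.write.overwrite().save(path) stores its params alongside the other stages, and PipelineModel.load(path) can re-create it through the companion object. Stages that hold state beyond their Params (for example fitted data) would need a custom MLWriter and MLReader instead.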

master commented 5 years ago

Thanks, @harveyxia!