saurfang / spark-knn

k-Nearest Neighbors algorithm on Spark
Apache License 2.0

Model save support #15

Open davis-varghese opened 8 years ago

davis-varghese commented 8 years ago

I saved a model (KNNClassificationModel) using Java serialization, and when I use it later I always get java.lang.IllegalArgumentException: Flat hash tables cannot contain null elements. on the DataFrame output of model.transform(inputDataFrame).

Is there a better way of saving and using the model, such as support for the MLWritable/Saveable traits? In our use case, we create a model and use it later.
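For reference, a minimal sketch of the kind of Java-serialization round trip described above (the file path, the knnModel value, and inputDataFrame are illustrative assumptions, not code from the original report):

```scala
import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

import org.apache.spark.ml.classification.KNNClassificationModel

// Write the fitted model out with plain Java serialization (illustrative local path).
val oos = new ObjectOutputStream(new FileOutputStream("/tmp/knnModel.ser"))
oos.writeObject(knnModel)
oos.close()

// Later: read the model back and apply it to a new DataFrame.
val ois = new ObjectInputStream(new FileInputStream("/tmp/knnModel.ser"))
val restored = ois.readObject().asInstanceOf[KNNClassificationModel]
ois.close()

// The reported IllegalArgumentException ("Flat hash tables cannot contain
// null elements") reportedly surfaces only when the restored model is used.
val predictions = restored.transform(inputDataFrame)
```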

mindcrusher11 commented 8 years ago

I am also looking for a solution for saving the model using Scala Spark.

Sambor123 commented 7 years ago

I also had this problem. Is there any solution for it?

rachmaninovquartet commented 7 years ago

I've tried it like this:

sc.parallelize(Seq(knnModel), 1).saveAsObjectFile("/user/you/knnTest/" + "KNN")
val model = sc.objectFile[KNNClassificationModel]("/user/you/knnTest/" + "KNN").first()

but the model pulled back in no longer seems to work, which is strange since this has worked for all my other models.

wzjmail commented 4 years ago

I also encountered this problem. I attempted to serialize this model and load it again, but the RDD[Tree] cannot be deserialized correctly. It looks like the metricTree has some problem. If you have a solution, please comment.

alexnb commented 4 years ago

Saving via PipelineModel.save() also does not work:

Caused by: java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: knnc_95d9ce15f990 of type class org.apache.spark.ml.classification.KNNClassificationModel
at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:231)
at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:228)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:228)
at org.apache.spark.ml.PipelineModel$PipelineModelWriter.<init>(Pipeline.scala:336)
at org.apache.spark.ml.PipelineModel.write(Pipeline.scala:320)
at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:306)
at org.apache.spark.ml.PipelineModel.save(Pipeline.scala:293)
... 16 more
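For context, this is roughly the kind of pipeline save that hits the exception above (a sketch only; the classifier parameters, column names, save path, and trainingDF are illustrative assumptions):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.KNNClassifier

// Fit a pipeline whose final stage is the spark-knn classifier (illustrative parameters).
val knn = new KNNClassifier()
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setK(10)

val pipelineModel = new Pipeline().setStages(Array(knn)).fit(trainingDF)

// Throws UnsupportedOperationException: the fitted KNNClassificationModel stage
// does not implement MLWritable, so the pipeline's stage validation rejects the write.
pipelineModel.save("/tmp/knn-pipeline-model")
```

As the exception message says, Pipeline write fails because the fitted stage does not implement Writable, so there is no writer for the pipeline to delegate to.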
githubthunder commented 3 months ago

Hi, has this problem been solved by now?