radanalyticsio / silex

something to help you spark
Apache License 2.0

predict function in Kmedoid model #68

Closed vishal1908 closed 7 years ago

vishal1908 commented 7 years ago

The documentation says:

    def predict(points: RDD[T]): RDD[Int]
    Return an RDD produced by predicting the closest medoid to each row

I am using this as follows:

    // provided all the parameters
    val obj1 = new KMedoids(metric: (Vector, Vector) ⇒ Double, k: Int, maxIterations: Int, epsilon: Double, fractionEpsilon: Double, sampleSize: Int, numThreads: Int, seed: Long)
    // rows is an RDD of vectors
    val obj2 = obj1.run(rows)
    val predictions: RDD[Int] = obj2.predict(rows)

This throws the exception "Task not serializable".

erikerlandson commented 7 years ago

@vishal1908 are you seeing this problem while running in a REPL? Can you copy the exact code (or REPL session) that you are running?

vishal1908 commented 7 years ago

@erikerlandson I am attaching the code, which I am running in the Spark shell, along with a screenshot of the error. Attachments: silex.txt, error (screenshot)

erikerlandson commented 7 years ago

@vishal1908 your object silex is being pulled into the closure that Spark is trying to serialize, and serialization fails because silex does not extend Serializable. Changing your definition to object silex extends Serializable { ... will probably make it work. See also here for more background.
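
For illustration, here is a minimal sketch of the same failure mode without Spark. It assumes that a closure which reads a field of its enclosing instance captures that instance, and it uses plain Java serialization as a stand-in for the check Spark performs before shipping a task. The names PlainHolder, SerializableHolder and ClosureCheck are hypothetical and are not part of silex or Spark.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a driver-side object that is NOT Serializable.
class PlainHolder {
  val offset = 1
  // The lambda reads a field, so it captures the enclosing instance (`this`).
  def closure: Int => Int = (x: Int) => x + offset
}

// Same shape, but the enclosing class extends Serializable.
class SerializableHolder extends Serializable {
  val offset = 1
  def closure: Int => Int = (x: Int) => x + offset
}

object ClosureCheck {
  // Tries to Java-serialize a function object and reports whether it succeeded.
  def canSerialize(f: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(f); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    // Closure drags in a non-serializable enclosing instance.
    println(canSerialize(new PlainHolder().closure))
    // Closure drags in a serializable enclosing instance.
    println(canSerialize(new SerializableHolder().closure))
  }
}
```

The only difference between the two holders is extends Serializable, which is the same change suggested above for the silex object.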

One reason this is happening is that the code block that creates your KMedoidModel is not inside your main method. When you invoke silex.main, Scala initializes your silex object and executes that code at object initialization time. If you move the KMedoidModel code inside main, it will also be easier to reason about.
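
A small sketch of the initialization behaviour described above, again without Spark: statements placed directly in an object body run when the object is first initialized, before the body of main runs. Trace, EagerApp, TidyApp and InitOrderDemo are hypothetical names used only for this illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical shared log, used only to record the order of events.
object Trace {
  val events: ArrayBuffer[String] = ArrayBuffer.empty[String]
}

// Statements in the object body run when the object is first initialized,
// which happens before the body of main executes.
object EagerApp {
  Trace.events += "EagerApp body"

  def main(args: Array[String]): Unit = {
    Trace.events += "EagerApp.main"
  }
}

// Here all of the work lives inside main, so nothing runs at initialization.
object TidyApp {
  def main(args: Array[String]): Unit = {
    Trace.events += "TidyApp.main"
  }
}

object InitOrderDemo {
  def main(args: Array[String]): Unit = {
    EagerApp.main(Array.empty[String])
    TidyApp.main(Array.empty[String])
    // Print the events in the order they were recorded.
    Trace.events.foreach(println)
  }
}
```

Keeping the model-building code inside main, as in TidyApp, means it runs only when main is called and its values stay local to that method.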

vishal1908 commented 7 years ago

Thank you. It worked.