Closed erikerlandson closed 9 years ago
0) I vote for using the Spark clustering interface style: an object for setting parameters and creating an instance of a separate clustering class. Users may want to try different clusterings and keep those models around -- the current object approach prevents that.
1) I vote for keeping K-Medoids as a separate thing that operates on an RDD instead of adding K-Medoids as an implicit function. RDD.kMedoids() feels more coupled than KMedoids(RDD) to me.
2) Not sure -- for now, we can create an RDD from a Scala sequence if we need it. If it becomes something we use frequently, it may be worth the effort.
3) I suggest returning a model that can do assignments and return the medoids for each cluster.
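A minimal sketch of what points 0) and 3) could look like together, following the Spark KMeans pattern of a parameterized trainer that returns a separate model. All names here (KMedoids, KMedoidsModel, setK, run, predict) are hypothetical, not the project's actual API, and the training body is a placeholder:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical model returned by training: holds the medoids and
// can assign any point to the index of its nearest medoid.
class KMedoidsModel[T](val medoids: Seq[T], dist: (T, T) => Double)
    extends Serializable {
  def predict(point: T): Int =
    medoids.indices.minBy(j => dist(medoids(j), point))

  def predict(data: RDD[T]): RDD[Int] = data.map(predict)
}

// Hypothetical trainer in the Spark builder style: set parameters,
// then call run(rdd) to produce a model.
class KMedoids[T](dist: (T, T) => Double) extends Serializable {
  private var k = 2
  def setK(k: Int): this.type = { this.k = k; this }

  def run(data: RDD[T]): KMedoidsModel[T] = {
    // actual medoid refinement elided; this just seeds the model
    // with a random sample of k points
    val seeds = data.takeSample(withReplacement = false, k).toSeq
    new KMedoidsModel(seeds, dist)
  }
}
```

This shape lets users keep several fitted models around at once, which the current object approach prevents, e.g. `val m3 = new KMedoids(d).setK(3).run(rdd)` alongside `val m5 = new KMedoids(d).setK(5).run(rdd)`.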
Could you add some documentation on the APIs?
I agree with @rnowling on the interface; implicits make more sense for transformations that return a new RDD than for trainers that return a model.
GitHub created a new PR when I renamed the branch: #19. I'm closing this one.
Interested in thoughts about interface on this one:
0) maybe best to use the style of interface used by the Spark KMeans, where it's a modeling object and you use the builder pattern for parameters, then obj.train(rdd), etc.
1) should it be kMedoids(rdd, ...) or rdd.kMedoids(...)?
2) should I add a variation that operates on Scala sequences instead of RDDs?
3) never decided what the return type should be. Maybe some kind of MedoidClustering case class?
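The two call styles in 1) could be sketched like this; the names and placeholder bodies are hypothetical, just to make the coupling trade-off concrete:

```scala
import org.apache.spark.rdd.RDD

object KMedoidsDemo {
  // style (a): a plain function taking the RDD explicitly.
  // body is a placeholder, returning a random sample as "medoids"
  def kMedoids(rdd: RDD[Array[Double]], k: Int): Seq[Array[Double]] =
    rdd.takeSample(withReplacement = false, k).toSeq

  // style (b): the same call exposed as a method on RDD via an
  // implicit class, so call sites read rdd.kMedoids(k)
  implicit class KMedoidsOps(val rdd: RDD[Array[Double]]) extends AnyVal {
    def kMedoids(k: Int): Seq[Array[Double]] = KMedoidsDemo.kMedoids(rdd, k)
  }
}

// usage: kMedoids(rdd, 3)  versus  rdd.kMedoids(3)
```

Style (b) requires importing the implicit into scope and makes the algorithm look like part of RDD itself; style (a) keeps the dependency direction explicit.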