[WIP] Added kmeans - Githubissues

msmbuilder / msmbuilder-legacy

Legacy release of MSMBuilder

http://msmbuilder.org

GNU General Public License v2.0

25 stars 28 forks source link

[WIP] Added kmeans #335

Closed kyleabeauchamp closed 10 years ago

kyleabeauchamp commented 10 years ago

We should probably support kmeans...

rmcgibbo commented 10 years ago

Overall -1. This needs to be thought out more.

(1) Our clustering pipeline is conceptually structured around ideas of distance metrics, and that's totally abandoned here, since kmeans is only for euclidean. If nothing else, we should explain this at the script-level documentation and throw nice errors there, so users aren't only seeing the TypeErrors in a traceback. (2) Why are we using sklearn kmeans when sklearn isn't currently a dependency, and there is an implementation of kmeans available in scipy which is already a dependency?

kyleabeauchamp commented 10 years ago

(1). We can add enough error checking to make this work properly. We already have several choices of metrics and clustering algorithms that are incompatible, so there's no reason to reject kmeans for this reason alone. MSMBuilder3 will provide an opportunity for a clean rewrite of things.

(2). IMHO sklearn is probably better maintained and with a cleaner interface. I'm worried about the fact that clustering is buried deep within scipy and possibly maintained by people without domain expertise. We also currently have optional dependencies, so that's really nothing new.

rmcgibbo commented 10 years ago

I'm worried about the fact that clustering is buried deep within scipy and possibly maintained by people without domain expertise.

Really? You think there's a bug in scipy's kmeans? Do you actually have any evidence for that? I'm highly skeptical If there's a convincing reason that the sklearn implementation is actually better, (faster, more scalable), etc, I'm all ears. But we shouldn't make MSMBuilder even harder to install just because we prefer programming against the sklearn API.

kyleabeauchamp commented 10 years ago

We can switch to scipy, the choice of library doesn't really matter.

kyleabeauchamp commented 10 years ago

The point is that this is pretty easy to add and could be useful to some people.

kyleabeauchamp commented 10 years ago

Closing this temporarily, will be reappear soon.