xurxodiz closed 12 years ago
Okay, not so sure right now. I think I got things mixed up in the rush of the paper submission. Let's double-check everything.
I suspect EM is the only one that can give me membership percentages, so perhaps it's better to stay with it even if the results are slightly worse (haven't checked yet).
Ah, fuck it, I understand it now. It all has to do with the MakeDensityBasedClusterer thing. Well, I'm learning a lot about clustering, that's for sure :D
Only EM provides probability memberships by default. The others have to be wrapped in, well, a MakeDensityBasedClusterer.
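For the record, a minimal sketch of the idea behind that kind of wrapper, as I understand it: take the hard assignments from any clusterer, fit a normal distribution to each cluster, and normalize prior times density to get membership probabilities. This is not Weka's code; the class name SoftMembership, the single numeric attribute, and the variance floor are all assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only (hypothetical class, not part of Weka):
// turns hard cluster labels into soft membership probabilities by
// fitting one Gaussian per cluster on a single numeric attribute.
class SoftMembership {

    // Normal density with the given mean and variance.
    static double gaussian(double x, double mean, double var) {
        double d = x - mean;
        return Math.exp(-d * d / (2.0 * var)) / Math.sqrt(2.0 * Math.PI * var);
    }

    // Membership probability of x in each of the k clusters, given the
    // training values xs and their hard labels (0..k-1).
    static double[] memberships(double x, double[] xs, int[] labels, int k) {
        double[] p = new double[k];
        double total = 0.0;
        for (int c = 0; c < k; c++) {
            int count = 0;
            double sum = 0.0;
            for (int i = 0; i < xs.length; i++) {
                if (labels[i] == c) { count++; sum += xs[i]; }
            }
            if (count == 0) { continue; }          // empty cluster: probability 0
            double mean = sum / count;
            double sq = 0.0;
            for (int i = 0; i < xs.length; i++) {
                if (labels[i] == c) { sq += (xs[i] - mean) * (xs[i] - mean); }
            }
            double var = Math.max(sq / count, 1e-6); // assumed floor, avoids zero variance
            double prior = count / (double) xs.length;
            p[c] = prior * gaussian(x, mean, var);
            total += p[c];
        }
        if (total == 0.0) {                          // densities underflowed: fall back to uniform
            Arrays.fill(p, 1.0 / k);
            return p;
        }
        for (int c = 0; c < k; c++) { p[c] /= total; }
        return p;
    }

    public static void main(String[] args) {
        double[] xs = {1.0, 1.2, 0.8, 5.0, 5.3, 4.7};
        int[] labels = {0, 0, 0, 1, 1, 1};           // e.g. output of a k-means run
        System.out.println(Arrays.toString(memberships(1.1, xs, labels, 2)));
    }
}
```

In the real workflow the wrapped clusterer would come from Weka itself; the sketch only shows why wrapping a hard clusterer is enough to get percentages out of it.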
Currently checking validity of the clusterer with Weka. Documentation/tutorial for the Weka GUI Experimenter can be found at http://www.cs.utexas.edu/~mooney/cs391L/hw2/Experiments.pdf.
Results upon experimenting (check exp.exp):
CLOPE can't be used; it only handles nominal values.
DBSCAN and its variation OPTICS throw exceptions: they are unable to classify some instances.
Remaining: FarthestFirst, HierarchicalClusterer, sIB, XMeans, KMeans, EM.
Running the tests on these at significance level 0.05, XMeans, KMeans and EM never come out statistically significantly different from one another, while the others always rank worse.
Therefore we can pick any of those three, so let's go with EM.
New results seem to favor SimpleKMeans over EM. Right now the workflow works around the code by going through Weka's GUI. Let's update the code so it uses SimpleKMeans directly.