open-connectome-classes / StatConn-Spring-2015-Info

introductory material
18 stars 4 forks source link

Use of squared Euclidean distance #86

Open michaelseung opened 9 years ago

michaelseung commented 9 years ago

As k-means uses squared Euclidean distance as the nearest mean, how common is it to use other distances in clustering? K-median, for example, minimizes absolute deviations instead of squared. How is accuracy affected?

dlee138 commented 9 years ago

I would say that the accuracy would be dependent on your graph. Depending on the graph's properties (i.e. how your points are distributed) one method may be more accurate than the other. The reasons why different algorithms exist in the first place is because some are better in certain cases, but worse in others.

Based on my intuition (I may be wrong here) I believe that k-median would be more accurate than k-means if your data has significant outliers that could significantly skew the value of your nearest means for squared Euclidean distance. Absolute deviation would not be affected as severely by these outliers and would give more accurate results in this particular case.

mrjiaruiwang commented 9 years ago

Euclidean distance is a good idea, but if you labelled your vertices with features and non-quantifyable information, you could do a custom distance. Maybe if your data contains annotations, you could look at similarities between words by meaning or the degree of association between two buzzwords.