open-connectome-classes / StatConn-Spring-2015-Info

introductory material

kmeans and EM #69

Closed maxcollard closed 9 years ago

maxcollard commented 9 years ago

The two-step process for k-means, (1) assigning points to the closest cluster center and (2) moving cluster centers to the centroids of their assigned points, seems awfully similar to expectation-maximization-type setups. Is this coincidental, or is there a way to frame k-means as EM?
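For concreteness, the two alternating steps can be sketched as follows (a minimal NumPy implementation, not from the course materials; the Euclidean distance and mean update are the standard k-means choices):

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means: alternate hard assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct data points
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Step (1): assign each point to its nearest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step (2): move each center to the centroid of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```

Viewed through the EM lens, step (1) plays the role of the E-step (with a degenerate, all-or-nothing posterior) and step (2) plays the role of the M-step.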

jtmatterer commented 9 years ago

Yes, for a Gaussian mixture model. You can read about it (quite briefly) in Section 14.3.7 of The Elements of Statistical Learning.

dlee138 commented 9 years ago

There are also variants of the EM algorithm: hard EM and soft EM. K-means is basically an example of hard EM as you described. You try to maximize the expectation of the log-likelihood (the details of the math can be read here: http://ttic.uchicago.edu/~dmcallester/ttic101-07/lectures/em/em.pdf), and at each iteration you hard-assign each data point to a single cluster. In soft EM, instead of hard assigning, the probability of each point belonging to each cluster is used to compute a weighted average over all your points, so points get a probabilistic assignment instead of a hard one.
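To illustrate the difference, here is a toy E-step for a 1-D two-component Gaussian mixture (the data values and component parameters below are made up for illustration): soft EM keeps the full responsibility matrix, while hard EM collapses it to an argmax, exactly like the k-means assignment step.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Toy 1-D data and fixed (illustrative) mixture parameters
x = np.array([-1.0, 0.2, 3.8, 4.1])
mus = np.array([0.0, 4.0])
sigmas = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])

# Soft E-step: responsibilities resp[i, j] = P(component j | x_i)
lik = weights * gaussian_pdf(x[:, None], mus[None, :], sigmas[None, :])
resp = lik / lik.sum(axis=1, keepdims=True)

# Hard E-step (k-means-like): keep only the most probable component
hard_labels = resp.argmax(axis=1)

# Soft M-step for the means: responsibility-weighted averages;
# a hard M-step would instead take the plain mean within each cluster
soft_mus = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
```

Each row of `resp` sums to 1, so every point contributes a little to every component's update; the hard version throws that partial credit away.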

maxcollard commented 9 years ago

Thanks!