open-connectome-classes / StatConn-Spring-2015-Info

introductory material

Yi in Clusters #70

Open akim1 opened 9 years ago

akim1 commented 9 years ago

Regarding clustering of nodes: when we use the notation Yi, aren't we assuming a priori knowledge of which node belongs to which cluster? If so, how is this knowledge obtained? Or is it constructed from some characteristics that we're looking for?

ghost commented 9 years ago

I think we were using the notation in which the data is eventually written in the form (xi, yi). Initially we just have the xi's, and then we use an iterative clustering algorithm to determine what each yi is.

yaxigeigei commented 9 years ago

At least in the case of K-means, the Yi's are initialized randomly (without any a priori knowledge) and updated at every iteration until convergence.
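To make the "random start, then iterate" idea concrete, here is a minimal sketch of Lloyd-style K-means that begins from randomly assigned labels. The function name `kmeans_random_labels`, its parameters, and the choice to re-seed an empty cluster at a random data point are assumptions made for illustration, not something taken from the course material. Many implementations initialize the centroids rather than the labels.

```python
import random

def kmeans_random_labels(points, k, n_iter=100, seed=0):
    """Lloyd-style k-means that starts from random labels (a random partition).

    points: list of equal-length tuples of floats. Returns the label estimates.
    """
    rng = random.Random(seed)
    # Random initial label estimates: no prior knowledge of the clusters is used.
    y_hat = [rng.randrange(k) for _ in points]
    for _ in range(n_iter):
        # Update step: each centroid is the mean of the points assigned to it.
        centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, y_hat) if label == j]
            if members:
                centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Empty cluster (illustrative choice): re-seed at a random data point.
                centroids.append(rng.choice(points))
        # Assignment step: relabel each point with the index of its nearest centroid.
        new_y_hat = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_y_hat == y_hat:  # labels stopped changing: converged
            break
        y_hat = new_y_hat
    return y_hat
```

Note that the returned labels are only cluster names: which group ends up called 0 and which 1 depends on the random start.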

jtmatterer commented 9 years ago

@akim1 the Y_i's are the true labels of the data. For unsupervised learning (of which clustering is an instance), we don't have access to any of the Y_i's. That is one of the reasons why characterizing the solution to the clustering problem as the product of a permutation matrix and the vector Y is useless for devising a practical solution: it is stated in terms of labels we never observe.

@aceeccc and @yaxigeigei are referring to the $\hat{Y}_i$, the estimates of the labels produced by whatever method is run.
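As a rough illustration of the difference between the two, the sketch below compares estimated labels with true labels after allowing the cluster names to be permuted, since a clustering method can only recover the labels up to renaming. The helper `best_permuted_accuracy` is a hypothetical name introduced here, and it assumes the true labels are available, which is only the case in simulations or benchmark data, never in the actual unsupervised problem.

```python
from itertools import permutations

def best_permuted_accuracy(y_true, y_hat, k):
    """Fraction of points labeled correctly under the best renaming of y_hat."""
    best = 0.0
    for perm in permutations(range(k)):
        # perm[j] is the true-label name proposed for estimated cluster j.
        matches = sum(1 for t, e in zip(y_true, y_hat) if perm[e] == t)
        best = max(best, matches / len(y_true))
    return best

# The estimates use swapped names but describe the same partition.
y_true = [0, 0, 0, 1, 1, 1]
y_hat = [1, 1, 1, 0, 0, 0]
score = best_permuted_accuracy(y_true, y_hat, k=2)  # 1.0
```

Trying all k! permutations is only sensible for small k; it is used here because it mirrors the permutation-matrix description directly.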