Questions on the clustering algorithm

eriche2016 commented 4 years ago

Hi, I recently read your paper and it is excellent work. However, I am confused about the clustering part. As I understand it, you re-run the clustering algorithm at every epoch to obtain the cluster ID of each sample. My question is: how do you ensure that each cluster ID is consistently assigned to the same subset of samples, so that the model can be trained with hard labels? Suppose there are three samples (x1, x2, x3) that can be grouped into two clusters (C1, C2) after convergence. At the t-th epoch, (x1, x2) belong to C1 and x3 belongs to C2. At the (t+1)-th epoch, however, re-running the clustering algorithm may put (x1, x2) in C2 and x3 in C1. Given this ambiguity in the cluster ID assignment, it seems hard to train your model. Can you give me some hints? Thanks.

yxgeee commented 4 years ago

Hi, thank you for the question. There's no need to make sure that cluster IDs stay aligned with (potentially) the same subsets across epochs, because we compute the cluster centers and use them to initialize the classifier weights each time after re-clustering. Please refer to https://github.com/yxgeee/MMT/blob/master/examples/mmt_train_dbscan.py#L200
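
To illustrate the point above, here is a minimal sketch in plain Python. It is not the code from the linked script. The helper names (`cluster_centers`, `predict`) and the toy 2-D features are hypothetical, and cluster IDs are assumed to be contiguous integers starting at 0. The idea is that row k of the classifier is reset to the center of whatever cluster currently carries ID k, so a permutation of the IDs between epochs permutes the classifier rows in the same way.

```python
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cluster_centers(features, labels):
    """Return one normalized mean feature per cluster; row k is the center of cluster k.

    Assumes cluster IDs are contiguous integers 0..K-1. Samples labeled -1
    (the convention DBSCAN uses for outliers) are skipped.
    """
    groups = defaultdict(list)
    for feat, k in zip(features, labels):
        if k >= 0:
            groups[k].append(feat)
    centers = []
    for k in range(len(groups)):
        members = groups[k]
        mean = [sum(col) / len(members) for col in zip(*members)]
        centers.append(l2_normalize(mean))
    return centers


def predict(weights, features):
    """Classify each feature by the largest dot product with a classifier row."""
    preds = []
    for feat in features:
        scores = [sum(w * x for w, x in zip(row, feat)) for row in weights]
        preds.append(max(range(len(scores)), key=scores.__getitem__))
    return preds


# Toy features for x1, x2, x3 (hypothetical values).
features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]

labels_t = [0, 0, 1]   # epoch t:   (x1, x2) -> C1, x3 -> C2
labels_t1 = [1, 1, 0]  # epoch t+1: same grouping, IDs swapped

for labels in (labels_t, labels_t1):
    weights = cluster_centers(features, labels)  # re-initialize the classifier
    assert predict(weights, features) == labels  # rows match the current IDs
```

In both epochs the re-initialized classifier agrees with the current pseudo-labels, which is why the IDs do not need to be matched across epochs in this sketch.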

eriche2016 commented 4 years ago

Thanks for the quick reply! I got the idea.

yxgeee/MMT, issue #16