jiankang1991 closed this issue 5 years ago
Hi @jiankang1991, good question. When an AE is pretrained with only a reconstruction loss, it is highly unlikely that the underlying topology is preserved in the latent code space. In other words, there is no guarantee that the AE features provide a better clustering prior than the original raw features. In my experiments on most of the datasets discussed in the paper, I found that using AE features for graph construction impaired the final quantitative results.
However, in your case, given the large feature size, I agree that it makes more sense to use the latent features for graph construction rather than the original features. Moreover, the curse of dimensionality will be largely at play with raw features of that size.
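For readers following along, here is a minimal sketch of building a mutual k-NN graph from a feature matrix, which could hold either raw features or AE latent codes. It is not the project's own implementation: the function name `mutual_knn_edges` is hypothetical, Euclidean distance is an assumption (the project may use a different metric), and the dense distance matrix only suits small inputs.

```python
import numpy as np

def mutual_knn_edges(features, k):
    """Return sorted (i, j) pairs with i < j where i and j are each
    among the other's k nearest neighbours (hypothetical helper)."""
    x = np.asarray(features, dtype=np.float64)
    n = x.shape[0]
    # Pairwise squared Euclidean distances. This needs O(n^2) memory,
    # so it is only a sketch for small n. Euclidean is an assumption.
    sq = np.sum(x * x, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    # Indices of the k nearest neighbours of every point.
    nn = np.argsort(d, axis=1)[:, :k]
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.repeat(np.arange(n), k), nn.ravel()] = True
    # Keep an edge only when the neighbour relation holds both ways.
    mutual = is_nn & is_nn.T
    rows, cols = np.nonzero(np.triu(mutual, k=1))
    return list(zip(rows.tolist(), cols.tolist()))
```

Passing the encoder output instead of the flattened images changes only the `features` argument, so both variants can be compared on the same data.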
Thank you for your reply. Regarding the hyperparameters, which ones should be tuned when the method is applied to a new image dataset, besides the number of neighbors? When I ran it on my dataset, there were still about a thousand clusters after about 500 epochs. I normalized the dataset to the range 0 to 1. Do you have any other suggestions?
Thank you very much.
The major hyper-parameter is the number of k-NN neighbours. Try increasing 'k'. Also normalise the dataset to the range [-1, 1]. Can you also share a plot showing the number of clusters w.r.t. training epochs?
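The normalisation suggested above can be sketched as follows. This is an illustrative helper, not code from the project: the name `scale_to_pm_one` is hypothetical, and it assumes a single global minimum and maximum over the whole dataset rather than per-band or per-image statistics.

```python
import numpy as np

def scale_to_pm_one(data):
    """Linearly map the values of `data` to the range [-1, 1]
    using the global min and max (an assumption of this sketch)."""
    x = np.asarray(data, dtype=np.float64)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Constant input: avoid division by zero and return zeros.
        return np.zeros_like(x)
    return 2.0 * (x - lo) / (hi - lo) - 1.0
```

Data already scaled to [0, 1] maps to [-1, 1] in the same way, since the helper only depends on the observed minimum and maximum.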
I did not save it. As far as I remember, I have 27000 images, and the number of clusters decreases gradually from about 26990 in the first epoch to about 1000 at epoch 500. Each epoch, the number of clusters decreases by about 20-100. The ground-truth number of clusters is 10.
The value of k is 10.
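Since the per-epoch cluster counts were not saved, a small logging sketch may help for the next run. Both function names are hypothetical and nothing here comes from the project; it only assumes that a cluster count is available after each epoch. With the figures quoted above, going from about 26990 to about 1000 clusters over 500 epochs corresponds to an average drop of roughly 52 clusters per epoch.

```python
import csv

def write_cluster_log(counts, stream):
    """Write one CSV row per epoch: (epoch number, number of clusters)."""
    writer = csv.writer(stream)
    writer.writerow(["epoch", "num_clusters"])
    for epoch, n in enumerate(counts, start=1):
        writer.writerow([epoch, n])

def mean_drop_per_epoch(counts):
    """Average decrease in cluster count between consecutive epochs."""
    return (counts[0] - counts[-1]) / (len(counts) - 1)
```

The resulting CSV can be loaded into any plotting tool to produce the clusters-versus-epochs curve requested above.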
Hi @shahsohil, the work is very interesting! I have a question about the construction of the mkNN graph. In the project, I see that you use the original data to measure similarity and construct the mkNN graph. Is there a particular reason for this? Why not use the latent representation from the pretrained AE for the graph? If I have large image patches, e.g. 256x256 with multiple input bands, constructing the graph in the original image space will be computationally expensive.
Thank you.