wwyi1828 / CluSiam

Improving Representation Learning for Histopathologic Images with Cluster Constraints

Issue reproducing results on the Camelyon preprocessed data #5

mehran-hosseinzadeh commented 3 months ago

Hi, thank you for your very good paper, the great code base, and the active repo. I was trying to reproduce your results with the preprocessed data available in the repo, and I have some questions. I ran the code for the CluSiam model, but I end up with more clusters than expected (around 3 or 4). On inspection, there was one big cluster containing both normal and tumor cases, plus two smaller clusters. Should I change anything in the configs or adjust any further parameters?

Also, there is a print line in your training code that shows the number of non-zero clusters, with and without the Gumbel noise. When running with a maximum of K=100 clusters, even in the latest epochs, nearly all 100 clusters are non-zero after adding noise. Could that be the problem in my case, or did you observe something similar? I was wondering whether too much noise is added, which somehow disrupts the formation of clusters in later epochs. Should that be controlled somehow?

I also have a question about the range of the losses. When I run the training, the contras_loss is -0.9635 after 50 epochs, which I guess makes sense, but my cluster_loss is around -0.004. Does that look correct? I was thinking the loss might be so small that the gradients vanish when updating the assigner network. Should I change anything, for example multiply it by a coefficient, or change the alpha? If you have any other suggestions for resolving the issue, I'd appreciate your help. Thanks again!

wwyi1828 commented 3 months ago

Hi,

Thank you very much for your interest in our research.

Regarding the number of clusters: since the clustering process is unsupervised and relies entirely on the intrinsic structure of the data, it is common to see a few additional tail clusters that contain only a handful of samples. However, I have not observed large-scale mixing of the normal and tumor clusters. Could you try re-running the experiment? Alternatively, you might slightly increase the alpha value to 0.6 or 0.7, which should result in more clusters. While this may be less ideal for visualization, it tends to yield better representations for downstream tasks by enhancing the model's discriminative capability.
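For context, here is a minimal sketch of how such an alpha could weight the two loss terms. The combination rule below is an assumption for illustration (the variable names follow this thread); check the repo's training loop for the actual formula:

```python
import torch

def combined_loss(contras_loss: torch.Tensor,
                  cluster_loss: torch.Tensor,
                  alpha: float = 0.5) -> torch.Tensor:
    # Hypothetical weighting: a larger alpha puts more emphasis on the
    # cluster term, which tends to produce more clusters.
    return (1 - alpha) * contras_loss + alpha * cluster_loss

# Example with the loss values reported above and a raised alpha:
loss = combined_loss(torch.tensor(-0.9635), torch.tensor(-0.004), alpha=0.6)
print(loss)  # tensor(-0.3878)
```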

Additionally, the CluBYOL training method tends to be more stable and results in more clusters. From my observations, there seems to be a trade-off between cluster quality and representation quality. For visualization, fewer clusters (ideally matching the number of classes) are desirable. However, overly confident clustering can hurt performance in downstream tasks and KNN evaluations. I have not investigated deeply why CluBYOL tends to be more stable, but it may be because BYOL's approach integrates better with the cluster loss.

As for the number of non-zero clusters after adding noise, a range of 90 to 100 is typical. This suggests the vectors can move smoothly among clusters to explore different cluster combinations.
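To make that concrete, here is a small illustrative sketch (not the repo's actual code; all names and numbers are made up) of how straight-through Gumbel-softmax noise on the assignment logits spreads samples across clusters, mirroring the with/without-noise counts from the print line mentioned above:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, K = 512, 100                          # batch size, maximum number of clusters

# Illustrative logits: each sample mildly prefers one of 5 clusters,
# mimicking an assigner whose clean assignments use only a few clusters.
preferred = torch.randint(0, 5, (N,))
logits = torch.zeros(N, K)
logits[torch.arange(N), preferred] = 2.0

# Hard assignments without noise: plain argmax over the cluster dimension.
clean = logits.argmax(dim=-1)

# With noise: straight-through Gumbel-softmax perturbs the logits, so
# samples can jump to other clusters and explore new combinations.
noisy = F.gumbel_softmax(logits, tau=1.0, hard=True).argmax(dim=-1)

print("non-zero clusters w/o noise:", clean.unique().numel())   # 5 here
print("non-zero clusters with noise:", noisy.unique().numel())  # close to K
```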

The cluster loss you observed looks normal. With K clusters, the lower bound of the cluster loss is roughly -1/(K-1): with two clusters it can approach -1, with three clusters only -1/2, and with K=100 it is about -0.0101. So your value falls within the expected range. Let me know if you have further questions.
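A quick check of that bound for a few values of K:

```python
# Lower bound of the cluster loss, -1/(K-1), for several cluster counts.
for k in (2, 3, 100):
    print(f"K={k:3d}: lower bound = {-1 / (k - 1):.6f}")
# K=  2: lower bound = -1.000000
# K=  3: lower bound = -0.500000
# K=100: lower bound = -0.010101
```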

mehran-hosseinzadeh commented 3 months ago

Thanks for the detailed explanation. I will retry the experiment, test different alpha values, and also try CluBYOL. I'll follow up here if I run into further issues or questions.