wwyi1828 / CluSiam

Improving Representation Learning for Histopathologic Images with Cluster Constraints
MIT License

cluster visualization #2

Closed · akidway closed 5 months ago

akidway commented 5 months ago

Hi, @wwyi1828,

Thank you for your excellent work.

After reading your article, I still have some questions.

From my understanding, a single patch from a slide is fed into CluSiam, which outputs a 1×K vector, and argmax is then applied to this vector. If that is right, there should be up to K clusters, yet Figure 3 shows only two.

Similarly, in Figure 4 (b) and (c), how did you determine the number of clusters (red dashed line)?

Regarding the classification task, the paper mentions: "We aggregated patch-level predictions to slide-level predictions using two multiple-instance learning techniques: Max-Pooling (Max) and Dual-Stream Multiple-Instance Learning (DSMIL)." Is a patch-level prediction a k*D vector encoded from a single patch?

Additionally, since DSMIL utilizes two different magnification levels (5x and 20x), did you train separate models for each magnification level?

I would greatly appreciate it if you could provide more detailed explanations.

wwyi1828 commented 5 months ago

Hello,

Thank you for your insightful questions. Let me address them one by one.

Regarding the output, you are correct: the model produces a 1×K vector for each patch. In Figure 4 (b) and (c), the number of existing clusters was counted after applying the argmax function to these vectors. The hyper-parameter K only sets the maximum number of clusters allowed, i.e. the cluster action space; many of the K slots may be empty after the argmax assignments.

Note that argmax also discards some information. For example, the vectors (0.6, 0.4) and (0.9, 0.1) yield the same hard assignment, even though they represent different degrees of confidence in the cluster assignments.

A larger value of K generally leads to more stable clustering results, as it provides more flexibility for the model. For instance, although K is set to 100, if we apply argmax to the output vectors after training is complete, the count for each cluster might be (0: 899, 1: 1, 2: 0, ..., 99: 124), so only a handful of clusters are actually used.
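To make the counting concrete, here is a minimal sketch of the procedure described above: take argmax over each patch's K-dimensional assignment vector, then count how many of the K slots are non-empty. The scores are simulated (biased toward two clusters to mimic a trained assigner); names and shapes are illustrative, not the repository's actual API.

```python
import numpy as np

K = 100  # maximum number of clusters (the cluster "action space")
N = 1024  # number of patches

rng = np.random.default_rng(0)
# Stand-in for the assigner's per-patch scores (N x K). In a trained
# model a few clusters dominate, simulated here by biasing two columns.
scores = rng.normal(size=(N, K))
scores[:, [0, 99]] += 5.0

# Hard assignment per patch: index of the highest-scoring cluster.
hard_assignments = scores.argmax(axis=1)

# Count patches per cluster; most of the K slots stay empty.
counts = np.bincount(hard_assignments, minlength=K)
num_used_clusters = int((counts > 0).sum())
```

With this bias, `num_used_clusters` comes out far below K, matching the (0: 899, ..., 99: 124)-style counts mentioned above.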

Regarding the classification task, patch embeddings are obtained from the encoder's outputs. These embeddings are then aggregated using MIL techniques to obtain slide-level predictions. The modules after the encoder, such as the assigner, are specific to the pretraining task and are dropped once the pretraining is done. This is a common practice in self-supervised learning frameworks, as the modules after the encoder often learn information related to the pretext tasks.
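As a rough illustration of the Max-Pooling variant of this aggregation step: the pretrained encoder (with the assigner and other pretext-task heads dropped) yields one D-dimensional embedding per patch, a patch-level classifier scores each patch, and the slide-level prediction takes the maximum patch score. The linear classifier and all shapes here are toy stand-ins, not the actual pipeline.

```python
import numpy as np

N, D = 200, 512  # patches per slide, embedding dimension

rng = np.random.default_rng(0)
# Stand-in for frozen encoder outputs: one embedding per patch.
patch_embeddings = rng.normal(size=(N, D))

# Toy linear patch-level classifier (weights would normally be learned).
w = rng.normal(size=D) / np.sqrt(D)
patch_logits = patch_embeddings @ w        # patch-level scores, shape (N,)

# Max-Pooling MIL: the slide score is the most confident patch score.
slide_logit = patch_logits.max()
slide_prob = 1.0 / (1.0 + np.exp(-slide_logit))  # sigmoid for a probability
```

DSMIL replaces the plain max with its dual-stream attention over patch embeddings, but the input to either aggregator is the same per-patch embedding matrix.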

I did not adopt a multiscale setting; I used only the 20x magnification with DSMIL for the classification task, so no separate models were trained per magnification level.

akidway commented 5 months ago

Thank you for your patient explanation. Best wishes.