wwyi1828 / CluSiam

Improving Representation Learning for Histopathologic Images with Cluster Constraints
MIT License

Clarification on Patch-wise Label Assignment #6

Open astammer opened 2 months ago

astammer commented 2 months ago

Hello,

I am seeking clarification on how the labels for patches are assigned in the project. I apologize if I have overlooked any relevant information regarding this.

Is it correct to assume that every patch takes the label of the Whole Slide Image (WSI) it emerged from?

Thank you for your assistance!

Kind regards

wwyi1828 commented 2 months ago

The patch labels are not simply inherited from the WSI labels. Instead, each individual patch is assigned a label based on the provided annotations, such as those in the official XML files for the Camelyon16 dataset.

These annotations delineate the regions within each WSI that contain tumor tissue. Cross-referencing the coordinates of each extracted patch with these annotated regions lets each patch be labeled according to whether it overlaps an annotated area, which is more accurate than inheriting the slide-level label.
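The cross-referencing described above can be sketched as follows. This is a minimal illustration, not the repository's actual preprocessing code. It assumes the ASAP-style XML layout used by the Camelyon16 annotation files (`Annotation` elements containing ordered `Coordinate` entries with `X` and `Y` attributes) and, as a further assumption, labels a patch by testing whether its centre point falls inside an annotation polygon. The helper names (`load_polygons`, `point_in_polygon`, `label_patch`) and the embedded `EXAMPLE_XML` are hypothetical; the `NotAnnotated` fallback mirrors the folder name mentioned later in this thread.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for an ASAP-style annotation file (hypothetical data);
# a real Camelyon16 XML contains many more annotations and vertices.
EXAMPLE_XML = """<ASAP_Annotations>
  <Annotations>
    <Annotation Name="Annotation 0" Type="Polygon" PartOfGroup="_0">
      <Coordinates>
        <Coordinate Order="0" X="100" Y="100"/>
        <Coordinate Order="1" X="400" Y="100"/>
        <Coordinate Order="2" X="400" Y="400"/>
        <Coordinate Order="3" X="100" Y="400"/>
      </Coordinates>
    </Annotation>
  </Annotations>
</ASAP_Annotations>"""


def load_polygons(xml_text):
    """Return {annotation name: [(x, y), ...]} with vertices in file order."""
    root = ET.fromstring(xml_text)
    polygons = {}
    for ann in root.iter("Annotation"):
        coords = sorted(ann.iter("Coordinate"), key=lambda c: int(c.get("Order")))
        polygons[ann.get("Name")] = [
            (float(c.get("X")), float(c.get("Y"))) for c in coords
        ]
    return polygons


def point_in_polygon(x, y, polygon):
    """Ray casting: toggle 'inside' at each edge crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y-coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_patch(x, y, patch_size, polygons):
    """Label a patch by its centre point (an assumed rule, not the authors')."""
    cx, cy = x + patch_size / 2, y + patch_size / 2
    for name, polygon in polygons.items():
        if point_in_polygon(cx, cy, polygon):
            return name
    return "NotAnnotated"


polygons = load_polygons(EXAMPLE_XML)
print(label_patch(150, 150, 64, polygons))    # patch centred inside the square
print(label_patch(1000, 1000, 64, polygons))  # patch far from any annotation
```

A centre-point test is only one possible overlap rule; an area-overlap threshold is a common alternative, and the thread does not say which criterion the preprocessed patches were generated with.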

astammer commented 2 months ago

Thank you very much for your clarification. This was very helpful!

astammer commented 1 day ago

Hello,

Thank you for providing the preprocessed patches; they have been very helpful! I have two follow-up questions and would appreciate your clarification on them:

Preprocessed Patches Count: I downloaded the folder containing the preprocessed patches from your repository. It currently holds approximately 2.6 million patches, which matches the count stated for the training set. Could you please confirm whether this is expected, or whether the folder should also include the test set, in which case something may have gone wrong during my download?

Annotation Groups in Tumor Slides: I’m unsure how to interpret the annotation groups for certain tumor slides. For example, the slide tumor095 has 30 annotations according to its XML file. These are assigned to groups _0 and _2 through the PartOfGroup attribute. Although I haven’t found an official explanation, it seems that groups _0 and _1 correspond to tumorous regions, while _2 relates to normal tissue. One example of this interpretation can be found here. If this interpretation is correct, annotations 0, 21-23, and 27-30 should be classified as tumorous. In the preprocessed patches folder for tumor095, however, I see four subfolders: NotAnnotated, Annotation 0, Annotation 22, and Annotation 23. Can you give some insight into how these subfolders were generated?

I appreciate any information you can provide on these points. Thank you for your time!
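For anyone wanting to check the annotation groups discussed above, the following sketch lists annotation names per `PartOfGroup` value. It assumes the same ASAP-style XML layout, and the mapping of `_0` and `_1` to tumor regions is only the unconfirmed interpretation from the question, not an official specification. The function names and the embedded `EXAMPLE_XML` are hypothetical and do not reproduce the contents of tumor095.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Unconfirmed interpretation taken from the question above.
ASSUMED_TUMOR_GROUPS = {"_0", "_1"}

# Hypothetical miniature annotation file; coordinates are omitted for brevity.
EXAMPLE_XML = """<ASAP_Annotations>
  <Annotations>
    <Annotation Name="Annotation 0" Type="Polygon" PartOfGroup="_0"/>
    <Annotation Name="Annotation 1" Type="Polygon" PartOfGroup="_2"/>
    <Annotation Name="Annotation 2" Type="Polygon" PartOfGroup="_0"/>
  </Annotations>
</ASAP_Annotations>"""


def annotations_by_group(xml_text):
    """Map each PartOfGroup value to the annotation names assigned to it."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for ann in root.iter("Annotation"):
        groups[ann.get("PartOfGroup")].append(ann.get("Name"))
    return dict(groups)


def assumed_tumor_annotations(xml_text):
    """Names of annotations whose group is in ASSUMED_TUMOR_GROUPS."""
    groups = annotations_by_group(xml_text)
    return sorted(
        name
        for group, names in groups.items()
        if group in ASSUMED_TUMOR_GROUPS
        for name in names
    )


print(annotations_by_group(EXAMPLE_XML))
print(assumed_tumor_annotations(EXAMPLE_XML))
```

Comparing this list with the subfolder names in the preprocessed data would show which annotations produced patches, although it cannot by itself explain why some annotations have no corresponding subfolder.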