Questions about CAMELYON16

pzSuen commented 2 years ago

Hello, thanks for your amazing work!

I have read your paper carefully. The paper says, " There are in total 3.7 millions patches from the CAMELYON-16 dataset", which means about 9,200 patches per slide. However, after processing this dataset with CLAM, I got around 15,867 patches per slide, which is much larger, nearly double the numbers in earlier work such as TransMIL and DSMIL. Could you tell me how you counted the patches and how you preprocessed the slides?

Looking forward to your reply!

hrzhang1123 commented 2 years ago

Hi, TransMIL, DSMIL and ours have consistent numbers of patches at 20X. Also, as reported in the CLAM paper, there are about 40,000 patches per slide at 40X, which is roughly equivalent to 10,000 patches per slide at 20X, again very similar to these three. Perhaps you can adjust the thresholds in the OTSU tissue segmentation.
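
The 40X-to-20X conversion is an area argument: halving the resolution in each dimension means every fixed-size patch covers four times as much tissue, so the patch count drops by roughly 4x. A minimal sketch, assuming the common approximations of about 0.25 µm/px for 40X and about 0.5 µm/px for 20X (these pixel sizes are illustrative, not taken from the thread), and assuming CAMELYON16's 400 slides (270 training, 130 test) for the per-slide average:

```python
def rescale_patch_count(count: float, mpp_from: float, mpp_to: float) -> float:
    """Estimate how many fixed-size patches cover the same tissue at another pixel size.

    An N x N pixel patch spans (N * mpp)^2 square micrometres of tissue, so the
    number of patches scales with the square of the pixel-size ratio.
    """
    return count * (mpp_from / mpp_to) ** 2


# ~40X-equivalent (0.25 µm/px) -> ~20X-equivalent (0.5 µm/px): 4x fewer patches
print(rescale_patch_count(40_000, 0.25, 0.5))  # 10000.0

# 3.7 million patches spread over 400 slides
print(3_700_000 / 400)  # 9250.0
```

This ignores edge effects and tissue-mask differences, which is exactly why the segmentation thresholds still change the final count noticeably.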

pzSuen commented 2 years ago

You are right. I also have some doubts about CLAM. According to the CAMELYON16 paper, the data consists of two parts, scanned at 20x and at 40x. How does CLAM manage to process all of them at 40x? Since you extract patches at 20x, do you take patches from level 0 for the 20x slides and from level 1 for the 40x slides? Also, some CAMELYON16 slides contain duplicate areas (as seen in the image); do you filter these duplicate areas out?

[Image: test_normal_012]

hrzhang1123 commented 2 years ago

Hi, 1) You can check the specimen-level pixel size of each slide using QuPath. 2) Those are not simply duplicate regions; they come from different layers of the tissue.
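
Besides QuPath, the same level-0 pixel size can usually be read programmatically. A minimal sketch, assuming the slides are opened with openslide-python, whose `OpenSlide.properties` mapping exposes the pixel size under the standard keys `openslide.mpp-x` and `openslide.mpp-y` (in µm per pixel). The helper name `base_pixel_size` is hypothetical, and slides without this metadata return `None` so the caller has to decide how to handle them:

```python
from collections.abc import Mapping
from typing import Optional

# Standard OpenSlide property keys for the level-0 pixel size (µm per pixel).
MPP_X_KEY = "openslide.mpp-x"
MPP_Y_KEY = "openslide.mpp-y"


def base_pixel_size(props: Mapping) -> Optional[float]:
    """Return the level-0 pixel size in µm/px, or None if it is missing or unparsable."""
    try:
        mpp_x = float(props[MPP_X_KEY])
        mpp_y = float(props[MPP_Y_KEY])
    except (KeyError, ValueError):
        return None
    # Pixels are normally square; average the two axes to be safe.
    return (mpp_x + mpp_y) / 2.0
```

Typical usage would be `base_pixel_size(openslide.OpenSlide(path).properties)`; a plain dict works the same way for testing.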

pzSuen commented 2 years ago

Thank you for your advice. I checked the pixel sizes in the CAMELYON16 paper and in QuPath. As seen in the picture, there are two pixel sizes: 0.243 µm/px (the 20x slides, at level 0; by 'level' I mean what you called 'layers') and 0.226 µm/px (the 40x slides, at level 0). Note that the pixel size of normal_144.tif in the training data is unknown; I assigned it to the 40x group.

I also visualized the patches I extracted: patches from the same level look similar even though the nominal magnification differs.

This brings us back to my original questions.

Do you process 20x data at level 0 and 40x data at level 1?
Could you please describe your preprocessing in more detail?
Is pixel size more important than magnification? Is it better to extract patches at a similar pixel size rather than at the same magnification?
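
To make the pixel-size question concrete, the level choice can be computed from the two level-0 pixel sizes reported above (0.243 and 0.226 µm/px). A minimal sketch, assuming a target of about 0.5 µm/px as the "20X-equivalent" resolution and a hypothetical pyramid with downsample factors 1, 2, 4, 8 (real slides should use their own per-level downsamples, e.g. OpenSlide's `level_downsamples`). The function `closest_level` is illustrative, not the DTFD-MIL preprocessing:

```python
from typing import Sequence


def closest_level(base_mpp: float, downsamples: Sequence[float], target_mpp: float) -> int:
    """Return the index of the pyramid level whose pixel size is closest to target_mpp."""
    level_mpps = [base_mpp * d for d in downsamples]
    return min(range(len(level_mpps)), key=lambda i: abs(level_mpps[i] - target_mpp))


downsamples = [1.0, 2.0, 4.0, 8.0]  # hypothetical pyramid; read the real one per slide
for name, mpp in [("0.243 µm/px slide", 0.243), ("0.226 µm/px slide", 0.226)]:
    lvl = closest_level(mpp, downsamples, target_mpp=0.5)
    print(name, "-> level", lvl, "at", round(mpp * downsamples[lvl], 3), "µm/px")
```

Under these assumptions both groups land on level 1 (about 0.486 and 0.452 µm/px), whereas taking level 0 for one group and level 1 for the other would mix pixel sizes that differ by nearly 2x.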

hrzhang1123 commented 2 years ago

Hi,

The pixel size is more relevant to the actual 'magnification' we want w.r.t. the digitized images. The 20X and 40X in the Camelyon16 paper refer to the objective lenses of the different scanners.
I used a sliding window, with the output masks from OTSU as the reference, to crop the patches. I will submit the code for it, but I am very busy for the time being.
The 'layer of tissue' I mentioned is not a 'level' of the WSI, as you took it to mean. The similar areas (which you refer to as duplicate areas) were cut from different layers of one piece of tissue during slide preparation. As you can observe, they are not exactly identical.
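
The preprocessing code itself is not part of this thread, so the following is not the DTFD-MIL implementation. It is a minimal sketch of the general idea described above: build a tissue mask with Otsu's method, then slide a fixed-size window over it and keep windows that contain enough tissue. Assumptions, all hypothetical: the tissue score is max(R,G,B) minus min(R,G,B) (a saturation-like value that is low on grey/white glass and high on stained tissue), patches are 256 px with stride 256, and a window is kept if at least 50% of it is tissue. For simplicity the mask is computed at the same resolution as the patches; in practice it is usually computed on a low-resolution thumbnail and the coordinates are scaled up.

```python
import numpy as np


def otsu_threshold(values: np.ndarray) -> int:
    """Otsu's threshold for uint8 data: pick t maximising the between-class variance."""
    hist = np.bincount(values.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # weight of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean of the class "<= t"
    mu_total = mu[-1]                          # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    between[~np.isfinite(between)] = 0.0       # thresholds with an empty class score 0
    return int(np.argmax(between))


def tissue_mask(rgb: np.ndarray) -> np.ndarray:
    """Boolean tissue mask for an (H, W, 3) uint8 RGB image."""
    rgb = rgb.astype(np.int16)
    score = (rgb.max(axis=2) - rgb.min(axis=2)).astype(np.uint8)
    return score > otsu_threshold(score)


def sliding_window_coords(mask: np.ndarray, patch: int = 256, stride: int = 256,
                          min_tissue: float = 0.5) -> list:
    """Top-left (x, y) of every window whose tissue fraction is at least min_tissue."""
    h, w = mask.shape
    coords = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            if mask[y:y + patch, x:x + patch].mean() >= min_tissue:
                coords.append((x, y))
    return coords
```

Changing `min_tissue`, the stride, or biasing the Otsu threshold directly changes how many patches each slide yields, which is one plausible source of the per-slide count differences discussed in this thread.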

hrzhang1123/DTFD-MIL, issue #3: Questions about CAMELYON16