mahmoodlab / CLAM

Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
http://clam.mahmoodlab.org
GNU General Public License v3.0

Problem of Segmentation on CAMELYON17 #51

Closed: lzx325 closed this issue 3 years ago

lzx325 commented 3 years ago

Thank you for open-sourcing the code of the paper. While trying to use your program to segment the CAMELYON17 dataset, I noticed that the stitched images of many slides contain large amounts of artifacts, which I believe affect the downstream feature extraction (two screenshots attached). Did you experience this in your experiments? Also, what parameters did you use for segmenting CAMELYON16 and CAMELYON17? I don't know which preset is more suitable. Thank you for any help!

fedshyvana commented 3 years ago

See the bottom section of https://github.com/mahmoodlab/CLAM/blob/master/docs/INSTALLATION.md regarding .svs files. This is likely due to a bug in pixman on your machine, which needs to be updated/fixed, and is not an issue with the CLAM repository. I processed CAMELYON using a much older, experimental version of the repository (before it was cleaned up and organized into CLAM), so the parameters don't directly translate. But I would expect bwh_biopsy.csv to work well as a starting point. It might still require a lot of trial and error, though, since I remember there are 5 different centers that produced slides for CAMELYON, with different scanners and staining characteristics, so different subsets of the dataset might require different parameters.
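To make the per-center trial and error a bit more concrete, here is a minimal sketch that builds one patching command per center, defaulting to bwh_biopsy.csv and allowing a different preset where a center needs tuning. The flag names follow the CLAM README for create_patches_fp.py as best I recall, so please check them against your checkout. The one-folder-per-center layout, the center names, and the override preset file name are all hypothetical.

```python
from pathlib import Path

DEFAULT_PRESET = "bwh_biopsy.csv"


def build_patching_command(source_dir, save_dir, preset=DEFAULT_PRESET, patch_size=256):
    """Build the argv list for one segmentation + patching run.

    Assumption: flag names match the create_patches_fp.py usage shown in
    the CLAM README; verify them against your checkout before running.
    """
    return [
        "python", "create_patches_fp.py",
        "--source", str(source_dir),
        "--save_dir", str(save_dir),
        "--patch_size", str(patch_size),
        "--preset", preset,
        "--seg", "--patch", "--stitch",
    ]


def commands_per_center(data_root, results_root, centers, overrides=None):
    """Return {center: argv}, using a center-specific preset when one is given.

    Hypothetical layout: slides are stored in one sub-directory per center.
    """
    overrides = overrides or {}
    commands = {}
    for center in centers:
        preset = overrides.get(center, DEFAULT_PRESET)
        commands[center] = build_patching_command(
            Path(data_root) / center,
            Path(results_root) / center,
            preset,
        )
    return commands


if __name__ == "__main__":
    # "center_1_tuned.csv" is a made-up preset name for illustration only.
    cmds = commands_per_center(
        "slides", "results",
        centers=["center_0", "center_1"],
        overrides={"center_1": "center_1_tuned.csv"},
    )
    for center, argv in cmds.items():
        print(center, " ".join(argv))
```

Each argv list can be passed to `subprocess.run(argv, check=True)` from the repository root once the stitched output for that center looks reasonable.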

lzx325 commented 3 years ago

Thank you for your quick response. So this affects not only the stitch visualization but also the downstream analysis, am I right? Also, if it is convenient, could you please provide your parameters for segmenting CAMELYON16+17? We want to reproduce your training results as closely as possible.

fedshyvana commented 3 years ago

Yes, my understanding is that the bug causes certain patches from specific downsamples to not be read correctly, so it certainly could affect downstream analysis. I would definitely suggest fixing it before proceeding.
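As a rough way to check whether patches are being misread before and after the fix, one could flag patches that are mostly near-black. This is only a heuristic sketch and not part of CLAM: it assumes the misread regions show up as dark blocks, which this thread does not confirm, and both thresholds are arbitrary illustrative values. It takes an iterable of (R, G, B) tuples, which is what Pillow's `Image.getdata()` yields for an RGB image.

```python
def dark_fraction(pixels, threshold=10):
    """Fraction of pixels whose R, G and B values are all <= threshold."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    dark = sum(1 for r, g, b in pixels if max(r, g, b) <= threshold)
    return dark / len(pixels)


def looks_corrupted(pixels, threshold=10, max_dark_fraction=0.5):
    """Heuristic: a patch is suspicious if more than half of it is near-black.

    Both thresholds are assumptions for illustration, not tuned values.
    """
    return dark_fraction(pixels, threshold) > max_dark_fraction


if __name__ == "__main__":
    # Synthetic patches standing in for pixels read from a slide.
    stained_tissue = [(200, 150, 180)] * 90 + [(0, 0, 0)] * 10
    mostly_black = [(0, 0, 0)] * 80 + [(255, 255, 255)] * 20
    print(looks_corrupted(stained_tissue))
    print(looks_corrupted(mostly_black))
```

Running the same check on a handful of patches at each downsample level should show whether the problem is limited to specific levels.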

Regarding segmentation: as I mentioned above, the slides were processed using an older pipeline that represents the parameters differently, so I would have to write a conversion script and test that the result is in a format the current pipeline can process correctly. Alternatively, I might be able to share the coordinates that I extracted from the C17 slides with you. Why don't you shoot me an email at mlu16@bwh.harvard.edu, and I will try to follow up on the issue next week?

In the meantime, feel free to just try out bwh_biopsy.csv. If I remember correctly, it should work well for the majority of the slides, and you should be able to get performance comparable to my results even if the segmentation is not 100% the same.