mahmoodlab / CLAM

Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
http://clam.mahmoodlab.org
GNU General Public License v3.0
1.02k stars, 340 forks

Image Augmentation #185

Closed: nabilapuspit closed this issue 1 year ago

nabilapuspit commented 1 year ago

Hi, thank you for making such a great framework! However, I found that it does not work well on my project. I'm currently working with TCGA data, which has very large variation. I planned to apply image augmentation such as stain normalization within this framework, but I'm having a hard time finding the right place for it. At first I planned to apply the stain normalization in create_patches_fp.py, but as you said in the updates section of README.md:

CLAM only requires image features for training, it is not necessary to save the actual image patches, the new pipeline rids of this overhead and instead only saves the coordinates of image patches during "patching" and loads these regions on the fly from WSIs during feature extraction.

So I think that is not the right place. Can you give me some advice on this problem? Thank you!

fedshyvana commented 1 year ago

Hi, sorry for the super late reply. create_patches_fp.py doesn't actually handle the underlying image pixels of the "patches"; it just stores their coordinates. The actual images are streamed to the image encoder in extract_features_fp.py. You can see where the images are read and the transforms are applied here: https://github.com/mahmoodlab/CLAM/blob/9482cbc72df522087cfbaa3e6b52da5207a7980a/datasets/dataset_h5.py#L155 . So you can consider modifying the dataset by passing in a custom transform function that performs, e.g., stain normalization or data augmentation.
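The suggestion above (run stain normalization as part of the transform that the dataset applies to each patch) could be sketched roughly as follows. This is a minimal illustration, not code from CLAM: the class name `StainNormThenEncode` and both callables are hypothetical, and the name of the argument through which the dataset in datasets/dataset_h5.py receives its transform may differ between versions, so check the linked line in your own checkout.

```python
from typing import Any, Callable, Optional


class StainNormThenEncode:
    """Run a stain-normalization callable, then the usual encoder transform.

    Both callables are supplied by the caller. Nothing here is part of CLAM
    or torchstain; the names are placeholders for illustration only.
    """

    def __init__(
        self,
        stain_norm: Callable[[Any], Any],
        base_transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.stain_norm = stain_norm
        self.base_transform = base_transform

    def __call__(self, patch: Any) -> Any:
        # Step 1: normalize the stain of the raw patch.
        patch = self.stain_norm(patch)
        # Step 2: apply whatever the encoder normally expects
        # (for example resizing, tensor conversion, mean/std scaling).
        if self.base_transform is not None:
            patch = self.base_transform(patch)
        return patch
```

An instance of this class is a plain callable, so it can be passed wherever the dataset accepts a transform. The input and output types of `stain_norm` have to match what `base_transform` expects, which depends on the normalizer you choose.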

nabilapuspit commented 1 year ago

I tried to apply torchstain in the place you recommended, but I always get:

IndexError: kthvalue(): Expected reduction dim 0 to have non-zero size.

I discussed this in this issue, but I still haven't found a solution and don't know why it happens. Can you help me debug it?
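The thread does not establish the cause of this error. One plausible explanation, offered here only as an assumption, is that some patches contain little or no stained tissue, so the normalizer has no pixels left from which to estimate stain vectors. Under that assumption, a defensive wrapper that returns the patch unchanged when normalization fails might look like this. The class name `FallbackStainNorm` and the `normalize` callable are hypothetical.

```python
import logging
from typing import Any, Callable, Tuple, Type

logger = logging.getLogger(__name__)


class FallbackStainNorm:
    """Call a stain normalizer and fall back to the input patch on failure.

    `normalize` is any callable supplied by the caller (hypothetical here).
    Only the exception types listed in `catch` are swallowed; anything else
    still propagates so that real bugs are not hidden.
    """

    def __init__(
        self,
        normalize: Callable[[Any], Any],
        catch: Tuple[Type[BaseException], ...] = (IndexError,),
    ) -> None:
        self.normalize = normalize
        self.catch = catch
        # Counted per process: with several DataLoader workers, each worker
        # keeps its own copy of this counter.
        self.num_failed = 0

    def __call__(self, patch: Any) -> Any:
        try:
            return self.normalize(patch)
        except self.catch as err:
            self.num_failed += 1
            logger.debug("stain normalization skipped: %s", err)
            return patch  # leave the patch as it was
```

Whether skipping normalization for such patches is acceptable depends on the project. Logging the failing coordinates and inspecting a few of those patches would help confirm or rule out the assumed cause.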

ShiCrazy commented 1 year ago

Hello! I'd like to ask whether you have managed to perform stain normalization using torchstain on CLAM. I saw your question in the torchstain issues as well, and it seems quite difficult to perform such an operation in create_patches_fp.py. Should it be done in create_patches.py instead?

nabilapuspit commented 1 year ago

Unfortunately, I still haven't been able to perform stain normalization using torchstain on CLAM. It is tricky to run this kind of torch operation inside a multi-batch, multi-worker data-loading process, and I haven't figured out a solution yet. But no, I went through create_patches.py and it is not possible to apply torchstain there. You can try doing it in the feature extraction step, as the author suggested.

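On the multi-worker difficulty mentioned above: one common pattern, sketched here under the assumption that the normalizer holds state that is awkward to share between processes, is to build it lazily inside each worker on first use instead of in the main process. The class name `LazyStainNorm` and the `build` factory are hypothetical, and nothing in this thread confirms that this resolves the reported problem.

```python
from typing import Any, Callable, Dict, Optional


class LazyStainNorm:
    """Build the real normalizer on first call, once per process.

    `build` is a zero-argument factory supplied by the caller (hypothetical);
    it should construct and fit the normalizer, then return a callable that
    maps one patch to one normalized patch.
    """

    def __init__(self, build: Callable[[], Callable[[Any], Any]]) -> None:
        self.build = build
        self._impl: Optional[Callable[[Any], Any]] = None

    def __getstate__(self) -> Dict[str, Any]:
        # If the dataset is pickled to start a worker, send only the factory,
        # never an already-built normalizer.
        state = self.__dict__.copy()
        state["_impl"] = None
        return state

    def __call__(self, patch: Any) -> Any:
        if self._impl is None:
            self._impl = self.build()  # runs once in each process that uses it
        return self._impl(patch)
```

For the lazy construction to happen inside the workers, the transform should not be called in the main process before the data loader starts. A simple first check when debugging is to run the loader with zero workers and a batch size of one, which separates normalizer errors from multiprocessing effects.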