mahmoodlab / CLAM

Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
http://clam.mahmoodlab.org
GNU General Public License v3.0
1.02k stars 340 forks

Patches considered in Background while patching #161

Closed (shubhaminnani closed 1 year ago)

shubhaminnani commented 1 year ago

Hi @fedshyvana, I am creating patches from the TCGA dataset (GBM cohort) and see that some of the patches cover background. I am using the same preset as the tcga.csv provided in the repo.

Sample Images:

TCGA-08-0359-01Z-00-DX1.FBED91D4-4D46-4354-ABCF-3D2EBDA42C6D

TCGA-08-0360-01Z-00-DX1.CE8208FA-E2FA-44C9-9EA7-9DE5278BD217

TCGA-08-0357-01Z-00-DX1.F0E652B4-DE1D-41AA-A531-FEBC41C62D4F

Problem: How are these patches handled? Is any other check applied before storing the coordinates? Could this bias the attention toward background noise instead of tissue?

Thanks, Shubham

fedshyvana commented 1 year ago

No checks are applied before the coordinates are stored, although in theory a filtering step could be applied when the patches are actually loaded for feature extraction. In the examples you showed here, it may be possible to filter out those irrelevant regions simply by increasing the threshold sthresh. In practice, we did not find that these extra background regions receive any attention in the trained models. That makes sense, since they are presumably present consistently across all classes in your dataset. If they somehow correlate with only a single class, then you're right that the model may falsely associate their presence with that class label.
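For anyone who wants to try the load-time filtering idea mentioned above, here is a minimal sketch. It is not part of CLAM: the helper names (saturation, tissue_fraction, keep_patch) and the default values are hypothetical, and it assumes background is near-white, low-saturation glass, so a patch counts as background when too few of its pixels exceed a saturation cutoff. It uses only the standard library and takes any iterable of 8-bit (R, G, B) tuples.

```python
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def saturation(pixel: RGB) -> float:
    """HSV saturation of an 8-bit RGB pixel, scaled to the 0-255 range."""
    r, g, b = pixel
    hi, lo = max(r, g, b), min(r, g, b)
    return 0.0 if hi == 0 else 255.0 * (hi - lo) / hi


def tissue_fraction(pixels: Iterable[RGB], sat_thresh: float) -> float:
    """Fraction of pixels whose saturation is strictly above sat_thresh."""
    total = 0
    tissue = 0
    for px in pixels:
        total += 1
        if saturation(px) > sat_thresh:
            tissue += 1
    return tissue / total if total else 0.0


def keep_patch(
    pixels: Iterable[RGB],
    sat_thresh: float = 15.0,       # illustrative value, not a CLAM default
    min_tissue_frac: float = 0.25,  # illustrative value, not a CLAM default
) -> bool:
    """Return True if enough of the patch looks like tissue to keep it."""
    return tissue_fraction(pixels, sat_thresh) >= min_tissue_frac


if __name__ == "__main__":
    white = [(255, 255, 255)] * 100   # blank glass
    pink = [(200, 120, 160)] * 100    # stained-tissue-like colour
    print(keep_patch(white), keep_patch(pink))
```

A check like this would run on each patch after it is read at its stored coordinate and before it is passed to the feature extractor. Both cutoffs would need tuning per cohort. For large patches a vectorized version would be faster.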

shubhaminnani commented 1 year ago

Thanks @fedshyvana