mahmoodlab / CLAM

Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
http://clam.mahmoodlab.org
GNU General Public License v3.0

Different slide numbers for TCGA-KICH/KIRC/KIRP in TCGA website and TCGA-RCC you mentioned in the paper #191

Closed Shentl closed 1 year ago

Shentl commented 1 year ago

Hi,

In the paper, you mention that TCGA-RCC is a subset of TCGA covering three subtypes: TCGA-KICH, TCGA-KIRC, and TCGA-KIRP. It contains a total of 884 FFPE WSIs: 111 KICH WSIs, 489 KIRC WSIs, and 284 KIRP WSIs.

But on the TCGA website, TCGA-KICH has 121 slides, TCGA-KIRC has 519 slides, and TCGA-KIRP has 300 slides (searching for diagnostic slides). So KICH/KIRC/KIRP on the TCGA website have more slides than the TCGA-RCC set described in the paper, and I found the same discrepancy for the TCGA-NSCLC dataset.

So, is TCGA-RCC simply the combination of TCGA-KICH/KIRC/KIRP, or is it a separate public dataset that selects a subset of slides from TCGA-KICH/KIRC/KIRP?
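For reference, the size of the gap can be worked out from the counts quoted above. The short Python sketch below only restates those numbers; the variable names are made up for illustration and nothing here comes from the CLAM codebase.

```python
# Slide counts quoted in this thread (paper vs. TCGA website, diagnostic slides).
paper_counts = {"KICH": 111, "KIRC": 489, "KIRP": 284}
website_counts = {"KICH": 121, "KIRC": 519, "KIRP": 300}

# Per-subtype gap between the website and the paper.
gap = {k: website_counts[k] - paper_counts[k] for k in paper_counts}

paper_total = sum(paper_counts.values())
website_total = sum(website_counts.values())

print(gap)                          # {'KICH': 10, 'KIRC': 30, 'KIRP': 16}
print(paper_total, website_total)   # 884 940
print(website_total - paper_total)  # 56
```

So the paper's set is 56 slides smaller than the website count in total.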

fedshyvana commented 1 year ago

Yes, TCGA-RCC is just a combination of TCGA-KICH/KIRC/KIRP. At the time, I audited the slides using low-resolution thumbnails and removed some that were deemed poor quality, hence the lower number. You are free to use the entire database for your study. I think as long as you use the same set of slides when benchmarking different algorithms, it should be okay.
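One way to follow this advice is to freeze a single slide list and reuse it for every method. The sketch below is a minimal illustration under stated assumptions: `excluded_slides`, `benchmark`, and the `slide_00x` IDs are hypothetical placeholders, not part of CLAM and not real TCGA slide names.

```python
def excluded_slides(all_slide_ids, kept_slide_ids):
    """Return IDs present in the full download but absent from the curated subset."""
    return sorted(set(all_slide_ids) - set(kept_slide_ids))


def benchmark(methods, slide_ids, evaluate):
    """Evaluate every method on one frozen, sorted slide manifest.

    `evaluate(method, manifest)` is a user-supplied callable (an assumption here).
    """
    manifest = tuple(sorted(slide_ids))  # immutable, so every method sees the same set
    return {name: evaluate(method, manifest) for name, method in methods.items()}


# Placeholder IDs for illustration only.
all_ids = ["slide_001", "slide_002", "slide_003", "slide_004"]
kept_ids = ["slide_001", "slide_003"]

removed = excluded_slides(all_ids, kept_ids)
print(removed)  # ['slide_002', 'slide_004']
```

Saving the frozen manifest alongside the results makes it possible to reproduce the comparison later, whichever subset is used.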

Shentl commented 1 year ago

Could you please tell me the names of the slides that you removed from the TCGA dataset? Thanks a lot. My e-mail is 3462074422@qq.com