hrzhang1123 / DTFD-MIL

MIT License

TCGA lung cancer datasets #12

Open wk5475 opened 1 year ago

wk5475 commented 1 year ago

Could you please provide more details on generating TCGA datasets?

Dootmaan commented 1 year ago

Hi @wk5475, I have the same question. I found some description in the supplementary materials of this paper:

TCGA Lung Cancer. Lung Adenocarcinoma (LUAD) and Lung Squamous Cell Carcinoma (LUSC) are two sub-type of cancers in the TCGA lung cancer dataset, with 534 LUAD and 512 LUSC slides, respectively. There are only slide-level labels available for this dataset. Compared to CAMELYON-16, tumor regions in tumor slides are significantly larger in this dataset.

LUAD and LUSC can be downloaded directly from the TCGA website (e.g., https://portal.gdc.cancer.gov/projects/TCGA-LUSC). The problem is that the TCGA results reported in this paper seem quite different from those in another work, https://arxiv.org/pdf/2301.08125.pdf (though this may be due to a different dataset split).
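
Since the split is the likely source of the difference, here is a minimal sketch of one way to build a reproducible, patient-level split over the downloaded slides. This is not the authors' protocol, which the supplementary text quoted above does not specify. The function names, the 20% test fraction, and the seed are all hypothetical. The only outside fact it relies on is the TCGA barcode convention, where the first three dash-separated fields of a slide name (e.g. `TCGA-05-4244`) identify the patient.

```python
import random
from collections import defaultdict


def patient_id(slide_name: str) -> str:
    """Return the first three barcode fields, e.g. 'TCGA-05-4244'."""
    return "-".join(slide_name.split("-")[:3])


def patient_level_split(slides, test_fraction=0.2, seed=0):
    """Split slides into train/test without sharing patients across the two.

    slides: dict mapping slide file name -> label (e.g. 'LUAD' or 'LUSC').
    test_fraction and seed are illustrative defaults, not values from the paper.
    Returns (train, test) as sorted lists of slide names.
    """
    # Group slide names by label, then by patient.
    by_label = defaultdict(lambda: defaultdict(list))
    for name, label in slides.items():
        by_label[label][patient_id(name)].append(name)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        # Sort before shuffling so the result depends only on the seed.
        patients = sorted(by_label[label])
        rng.shuffle(patients)
        # At least one patient per label goes to the test set.
        n_test = max(1, round(len(patients) * test_fraction))
        for i, pid in enumerate(patients):
            target = test if i < n_test else train
            target.extend(by_label[label][pid])
    return sorted(train), sorted(test)
```

Splitting by patient rather than by slide keeps several slides from one patient out of both sets at once, and fixing the seed makes it possible to report exactly which split a number was obtained on.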