yge15 / TCGA_Microbial_Content

Comprehensive analysis of microbial content in whole-genome sequencing samples from The Cancer Genome Atlas project

Only 25 TCGA cancers analyzed not all 33 cancers? #4

Closed: hermidalc closed this issue 3 months ago

hermidalc commented 3 months ago

Hi, given that this is meant to be a comprehensive analysis of TCGA microbial abundances, why are only 25 cancers analyzed rather than all 33?

yge15 commented 3 months ago

Hi, there are only 25 cancer types with WGS data available in TCGA.

hermidalc commented 3 months ago

Quoting yge15: "Hi, there are only 25 cancer types with WGS data available in TCGA."

If you go to the Cohort Builder web tool on GDC and build a query with:

PROGRAM <- TCGA
EXPERIMENTAL STRATEGY <- WGS
DATA TYPE <- Aligned Reads

This results in 8,840 cases out of 11,428 total TCGA cases. Switch to "Table View" and download the case TSV for these 8,840 cases.

Count the unique cases per project. There are 33 cancers with WGS data:

> cases_df <- read.delim("cases.tsv")
> data.frame(table(cases_df$project.project_id))
        Var1 Freq
1   TCGA-ACC   74
2  TCGA-BLCA  411
3  TCGA-BRCA  952
4  TCGA-CESC  271
5  TCGA-CHOL   49
6  TCGA-COAD  371
7  TCGA-DLBC   42
8  TCGA-ESCA  118
9   TCGA-GBM  347
10 TCGA-HNSC  482
11 TCGA-KICH   86
12 TCGA-KIRC  124
13 TCGA-KIRP  216
14 TCGA-LAML   50
15  TCGA-LGG  461
16 TCGA-LIHC  324
17 TCGA-LUAD  464
18 TCGA-LUSC  337
19 TCGA-MESO   73
20   TCGA-OV  363
21 TCGA-PAAD  173
22 TCGA-PCPG  165
23 TCGA-PRAD  414
24 TCGA-READ  143
25 TCGA-SARC  223
26 TCGA-SKCM  223
27 TCGA-STAD  436
28 TCGA-TGCT  253
29 TCGA-THCA  477
30 TCGA-THYM  111
31 TCGA-UCEC  482
32  TCGA-UCS   50
33  TCGA-UVM   75
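
The per-project count above can be made a little more defensive by de-duplicating cases first and then comparing the result with the set of projects an analysis covers. This is only a minimal sketch: the toy data frame stands in for `cases.tsv`, and both the `case_id` column name and the `analyzed_projects` vector are hypothetical, chosen for illustration (only `project.project_id` comes from the export used above).

```r
# Toy stand-in for the GDC case export ("cases.tsv" in the comment above).
# `case_id` is a hypothetical column name; `project.project_id` is the
# column used in the original table() call.
cases_df <- data.frame(
  case_id = c("c1", "c2", "c2", "c3", "c4"),
  project.project_id = c("TCGA-ACC", "TCGA-ACC", "TCGA-ACC",
                         "TCGA-UVM", "TCGA-UVM"),
  stringsAsFactors = FALSE
)

# Drop repeated rows so each case is counted once per project.
unique_cases <- unique(cases_df[, c("case_id", "project.project_id")])

# Cases per project, with clearer column names than Var1/Freq.
per_project <- as.data.frame(table(unique_cases$project.project_id),
                             stringsAsFactors = FALSE)
names(per_project) <- c("project_id", "n_cases")

# Number of distinct projects with WGS cases in the export.
n_projects <- length(unique(unique_cases$project.project_id))

# Hypothetical list of projects covered by an analysis; setdiff() then
# shows which projects in the export are missing from it.
analyzed_projects <- c("TCGA-ACC")
missing_projects <- setdiff(per_project$project_id, analyzed_projects)

print(per_project)
print(n_projects)
print(missing_projects)
```

With the real export, replacing the toy data frame with `read.delim("cases.tsv")` and the hypothetical column name with the actual case identifier column should give the same kind of summary.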

hermidalc commented 3 months ago

Thank you again @yge15 for your help answering questions; I very much appreciate it. Posting for others wondering the same thing: as you said, TCGA added many WGS samples in Dec 2023 and Mar 2024.

http://gdc.cancer.gov/news-and-announcements/new-tcga-whole-genome-data-and-five-new-nci-match-projects

http://gdc.cancer.gov/content/additional-tcga-wgs-alignments-and-variant-calls-new-nci-match-trial-arms-data-and-more