waldronlab / curatedTCGAData

Curated Data From The Cancer Genome Atlas (TCGA) as MultiAssayExperiment Objects
https://bioconductor.org/packages/curatedTCGAData

Mutations Almost All Missing #47

Closed by DarioS 3 years ago

DarioS commented 3 years ago

No matter the cancer type, the number of non-missing mutations is always equal to the number of ranges.

> library(curatedTCGAData)
> headNeck <- curatedTCGAData("HNSC", "Mutation", dry.run = FALSE, version = "2.0.1")
> dim(assays(headNeck)[[1]])
[1] 51799   279
> table(is.na(assays(headNeck)[[1]]))
   FALSE     TRUE 
   51799 14400122
> melanoma <- curatedTCGAData("UVM", "Mutation", dry.run = FALSE, version = "2.0.1")
> dim(assays(melanoma)[[1]])
[1] 2174   80
> table(is.na(assays(melanoma)[[1]]))
 FALSE   TRUE 
  2174 171746
> assays(melanoma)[[1]][1:5, 1:5]
              TCGA-RZ-AB0B-01A-11D-A39W-08 TCGA-V3-A9ZX-01A-11D-A39W-08 TCGA-V3-A9ZY-01A-11D-A39W-08 TCGA-V4-A9E5-01A-11D-A39W-08 TCGA-V4-A9E7-01A-11D-A39W-08
18:9550172:+  "PPP4R1"                     NA                           NA                           NA                           NA                          
13:79175838:+ "POU4F1"                     NA                           NA                           NA                           NA                          
6:38828378:+  "DNAH8"                      NA                           NA                           NA                           NA                          
19:55086935:+ "LILRA2"                     NA                           NA                           NA                           NA                          
1:11169412:+  "MTOR"                       NA                           NA                           NA                           NA                          

> sessionInfo()
R version 4.1.0 (2021-05-18)

This implies that every mutation occurs in only one sample, which is unlikely to be real.

LiNk-NY commented 3 years ago

Hi Dario,

That's not how we measure mutation frequency.

You are working with a RaggedExperiment, and I would encourage you to visit the vignette for more information: https://bioconductor.org/packages/release/bioc/vignettes/RaggedExperiment/inst/doc/RaggedExperiment.html
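
To make the layout concrete, here is a minimal sketch with toy data. The coordinates, the sample names `s1` and `s2`, and the gene labels are invented for illustration and are not taken from TCGA; it also assumes the RaggedExperiment package is installed. The sparse representation keeps one row per recorded range per sample, so the number of non-missing cells equals the number of rows by construction, even when the same position is mutated in several samples. `compactAssay()` places identical ranges on a shared row instead.

```r
library(RaggedExperiment)  # also attaches GenomicRanges and S4Vectors

# Toy mutation calls (assumed values): chr1:100 is mutated in both samples.
s1 <- GRanges(c("chr1:100-100:+", "chr2:50-50:+"),
              Hugo_Symbol = c("GENE1", "GENE2"))
s2 <- GRanges(c("chr1:100-100:+", "chr3:7-7:+"),
              Hugo_Symbol = c("GENE1", "GENE3"))
re <- RaggedExperiment(s1 = s1, s2 = s2, colData = DataFrame(id = 1:2))

# Sparse view: one row per range per sample, so each row holds
# exactly one non-missing value.
sparse <- sparseAssay(re, "Hugo_Symbol")
nrow(sparse)
sum(!is.na(sparse))

# Compact view: identical ranges share a row, so the recurrent
# position is non-missing in both columns.
compact <- compactAssay(re, "Hugo_Symbol")
rowSums(!is.na(compact))
```

Comparing the two views shows that the one-value-per-row pattern in the sparse matrix reflects the storage layout rather than the recurrence of mutations across samples.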

See here for an example of how to find non-silent mutations: https://github.com/Bioconductor/RaggedExperiment/blob/master/inst/scripts/assay-functions-Ex.R
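
As a rough sketch in the spirit of that script (not a copy of it), the example below summarises toy mutation calls over query regions with `qreduceAssay()`. The two query regions stand in for genes and, like the coordinates and sample names, are hypothetical. The helper `nonsilent` is written to work on list-like inputs, since `qreduceAssay()` passes the scores grouped per output cell.

```r
library(RaggedExperiment)  # also attaches GenomicRanges and S4Vectors

# Toy mutation calls per sample (assumed values, not TCGA data).
s1 <- GRanges(c("chr1:100-100:+", "chr1:250-250:+", "chr2:50-50:+"),
              Variant_Classification = c("Missense_Mutation", "Silent", "Silent"))
s2 <- GRanges(c("chr1:180-180:+", "chr2:60-60:+"),
              Variant_Classification = c("Nonsense_Mutation", "Missense_Mutation"))
re <- RaggedExperiment(s1 = s1, s2 = s2, colData = DataFrame(id = 1:2))

# Hypothetical query regions standing in for two genes.
genes <- GRanges(c("chr1:1-300:+", "chr2:1-100:+"))

# For each (region, sample) cell: is any overlapping call non-silent?
nonsilent <- function(scores, ranges, qranges) any(scores != "Silent")

hits <- qreduceAssay(re, genes, nonsilent, "Variant_Classification")
hits

# Number of samples with at least one non-silent call in each region.
rowSums(hits, na.rm = TRUE)
```

With real data, the query would typically be gene ranges from an annotation resource, and its chromosome naming style and genome build would need to match those of the mutation ranges before the overlap is computed.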

If you have further questions, please create a support.bioconductor.org post.

Thank you!

Best, Marcel