im3sanger / dndscv

dN/dS methods to quantify selection in cancer and somatic evolution
GNU General Public License v3.0

dndscv using MC3 maf file from TCGA #38

Closed by KCLiv 5 years ago

KCLiv commented 5 years ago

Dear Inigo,

I have been trying to use dndscv with the MC3 MAF file provided by TCGA (https://gdc.cancer.gov/about-data/publications/mc3-2017).

I have selected mutation data from samples in the low-grade glioma (LGG) cohort and run dndscv on them. However, the results seem wrong: more than 6,000 genes come out as significant, and the globaldnds values are as shown below.

     name       mle    cilow   cihigh
wmis wmis  824.7381 412.3385 1649.599
wnon wnon 1082.5946 539.5835 2172.066
wspl wspl  624.3976 309.7443 1258.691
wtru wtru  900.7350 449.3966 1805.362
wall wall  834.0723 417.0132 1668.237

Here is my script. LGG_mc3.maf is a subset of the MC3 MAF file from TCGA containing only samples from LGG.

library(dndscv)
library(maftools)
lgg <- read.maf(maf = "LGG_mc3.maf")
lgg_df <- lgg@data
lgg_df <- lgg_df[,c("Patient_ID", "Chromosome", "Start_Position", "Reference_Allele", "Tumor_Seq_Allele2")]
colnames(lgg_df) <- c("sampleID", "chr", "pos", "ref", "mut")
lgg_dndsout = dndscv(lgg_df)
lgg_sel_cv = lgg_dndsout$sel_cv
print(head(lgg_sel_cv), digits = 3)
lgg_signif_genes = lgg_sel_cv[lgg_sel_cv$qglobal_cv<0.05, c("gene_name","qglobal_cv")]
rownames(lgg_signif_genes) = NULL
print(lgg_signif_genes)
lgg_dndsout$globaldnds

Thank you

im3sanger commented 5 years ago

Hello,

That is a very odd result and it suggests that the input mutation file is not correct. You say that you have "selected mutation data". Can you confirm that you have not filtered out synonymous mutations before running dndscv?

Best, Inigo
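A quick way to check the question above is to tabulate the variant classes in the table that is passed to dndscv. The sketch below is a minimal illustration in base R, not the author's own procedure: the toy `maf` data frame and its values are hypothetical, and the column names `Tumor_Sample_Barcode` and `Variant_Classification` are assumed to follow the standard MAF layout. If no `Silent` rows are present, synonymous mutations were dropped somewhere upstream, which would be expected to inflate the dN/dS estimates.

```r
# Hypothetical toy MAF-like table; a real MC3 MAF has many more rows and columns.
maf <- data.frame(
  Tumor_Sample_Barcode   = c("S1", "S1", "S2", "S2", "S3"),
  Chromosome             = c("1", "2", "2", "17", "X"),
  Start_Position         = c(1000L, 2000L, 2500L, 3000L, 4000L),
  Reference_Allele       = c("C", "G", "A", "T", "G"),
  Tumor_Seq_Allele2      = c("T", "A", "G", "C", "A"),
  Variant_Classification = c("Missense_Mutation", "Silent",
                             "Nonsense_Mutation", "Silent", "Splice_Site"),
  stringsAsFactors = FALSE
)

# Tabulate the variant classes; dndscv needs the synonymous ("Silent") ones too.
class_counts <- table(maf$Variant_Classification)
print(class_counts)

n_silent <- sum(maf$Variant_Classification == "Silent")
if (n_silent == 0) {
  warning("No synonymous mutations found: the input looks pre-filtered.")
}

# Build the five-column input from the unfiltered table, using the
# column names from the script in the question.
mutations <- maf[, c("Tumor_Sample_Barcode", "Chromosome", "Start_Position",
                     "Reference_Allele", "Tumor_Seq_Allele2")]
colnames(mutations) <- c("sampleID", "chr", "pos", "ref", "mut")

# Assumption to verify against the maftools documentation: read.maf() appears
# to keep silent variants in a separate slot (lgg@maf.silent), so lgg@data
# alone may contain only non-synonymous variants.
```

Reading the MAF file directly (for example with `read.delim`) instead of going through a parsed object is one way to make sure no variant class is removed before the table reaches dndscv.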

KCLiv commented 5 years ago

Thank you for your prompt response. I will check again. Cheers