vivekruhela closed this issue 2 years ago
Hi Vivek,
Thank you for your interest in dNdScv.
Could you tell me more about this dataset? Looking at the top of the sel_cv table, the numbers of non-synonymous and synonymous mutations per gene seem very odd. dNdScv expects the data to derive from unbiased targeted, exome or whole-genome sequencing.
Best wishes, Inigo
Hi,
Sorry for the late response. I am using a whole-exome sequencing (WES) dataset obtained from dbGaP, EGA and AIIMS. It contains 1163 samples, in which I am trying to identify significantly altered genes. The mutations were called with four variant callers (MuSE, Mutect2, SomaticSniper and VarScan2). Before passing the mutations to dNdScv, I filtered out the benign SNVs using the FATHMM-XF algorithm. Thanks.
Hi Vivek,
Thank you. Based on your answer, one of the problems with your data is the filtering of benign SNVs. dNdScv expects full datasets of somatic mutations without pre-filtering synonymous or benign mutations. Also, can you confirm whether all of your datasets are of somatic mutations, or have you included germline datasets?
Best, Inigo
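For readers following along, here is a minimal sketch of what passing an unfiltered table to dNdScv could look like. The toy data frame, the `caller` column and the de-duplication step are assumptions made for illustration and are not taken from Vivek's pipeline. The five-column layout (sampleID, chr, pos, ref, mut) follows the dNdScv tutorial, and the final call is left commented out because it needs the dndscv package and real data.

```r
# Hypothetical toy table standing in for the merged, UNFILTERED somatic calls.
# Synonymous and "benign" variants are deliberately kept in the table.
muts <- data.frame(
  sampleID = c("S1", "S1", "S1", "S2"),
  chr      = c("1", "1", "1", "17"),
  pos      = c(1000, 1000, 2000, 7578406),
  ref      = c("C", "C", "G", "C"),
  mut      = c("T", "T", "A", "T"),
  caller   = c("Mutect2", "MuSE", "Mutect2", "VarScan2"),  # assumed column
  stringsAsFactors = FALSE
)

# Keep only the five columns dNdScv reads, and collapse calls that several
# callers report for the same sample and site (assumption: the callers'
# outputs were simply concatenated).
mut1 <- unique(muts[, c("sampleID", "chr", "pos", "ref", "mut")])

# With the package installed, the run itself would then be:
# library(dndscv)
# out <- dndscv(mut1)
```

The point of the sketch is only that no functional-impact filter is applied before the call, so the synonymous mutations remain available to the model.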
Sorry for the late response. I downloaded the MMRF dataset from the GDC portal, and it contains somatic mutations only.
Hi,
I tried dndscv to identify significantly mutated genes. The dataset is huge: I have a list of around 4818890 mutations. I ran the following command to get the significant genes:
out = dndscv(mut1, max_muts_per_gene_per_sample = 700, max_coding_muts_per_sample = 70000)
The screenshot of p and q-values of the genes is as follows:
Here we can see that qglobal, qallsubs and pallsubs are all zero. I am not sure how to select the top significantly mutated genes. Please advise.
EDIT-1: I am sorry, I posted this without checking the results properly. Of the 20091 genes in the results table (obtained with the command sel_cv <- out$sel_cv), 17891 have a qglobal_cv value below 0.05 and 18700 have a qglobal_cv value below 0.1. Even 17891 genes are far too many. Can you suggest how to get the significant genes? Is it OK to use qglobal_cv here?
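As a rough illustration of the selection step being asked about, the sketch below filters a results table on qglobal_cv and reports what fraction of genes pass. The toy table and its gene names are hypothetical stand-ins for out$sel_cv, and the 0.1 cutoff is simply the threshold mentioned in the question. In the numbers quoted above, 17891 of 20091 genes (about 89%) pass at 0.05; a fraction that high points to a problem with the input rather than to a usable gene list, which is consistent with the filtering issue raised in the reply.

```r
# Hypothetical toy stand-in for out$sel_cv (real tables have more columns).
sel_cv <- data.frame(
  gene_name  = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  n_syn      = c(3, 10, 1, 7),
  n_mis      = c(40, 12, 2, 9),
  qglobal_cv = c(0, 0.03, 0.2, 0.8),
  stringsAsFactors = FALSE
)

# Keep genes below the chosen q-value cutoff, most significant first.
q_cutoff     <- 0.1
signif_genes <- sel_cv[sel_cv$qglobal_cv < q_cutoff, c("gene_name", "qglobal_cv")]
signif_genes <- signif_genes[order(signif_genes$qglobal_cv), ]

# Sanity check: fraction of all genes called significant.
frac_signif <- nrow(signif_genes) / nrow(sel_cv)
print(signif_genes)
print(frac_signif)
```

Comparing the n_syn and n_mis columns across the top rows is another quick check, since the first reply flags exactly those counts as looking odd.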