im3sanger / dndscv

dN/dS methods to quantify selection in cancer and somatic evolution
GNU General Public License v3.0

All the values of qglobal_cv are zero #68

Closed by vivekruhela 2 years ago

vivekruhela commented 3 years ago

Hi,

I tried dndscv to identify significantly mutated genes. The dataset is large, with a list of around 4,818,890 mutations. I ran the following command to get the significant genes: out = dndscv(mut1, max_muts_per_gene_per_sample = 700, max_coding_muts_per_sample = 70000)

The screenshot of the p- and q-values of the genes is as follows: [image: Screenshot_2021-07-22_19-33-33]

Here we can see that all the qglobal, qallsubs and pallsubs values are zero. I am not sure how to select the top significantly mutated genes. Kindly suggest.

EDIT-1: I am sorry, I posted this without checking the results properly. I found that out of 20091 genes (obtained from the command sel_cv <- out$sel_cv), there are 17891 genes with a qglobal_cv value less than 0.05 and 18700 genes with a qglobal_cv value less than 0.1. Still, 17891 genes are too many. Can you suggest how to get the significant genes? Is it OK to use qglobal_cv here?
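For readers following along, the counting step described above can be sketched in base R as below. This is only an illustration: the data frame is a small made-up stand-in for out$sel_cv, the helper count_significant is hypothetical, and only the column names gene_name and qglobal_cv are assumed to match the real table.

```r
# Toy stand-in for out$sel_cv (values are invented for illustration).
sel_cv <- data.frame(
  gene_name  = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  qglobal_cv = c(0, 0.03, 0.08, 0.6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of genes with a q-value below a cutoff.
count_significant <- function(tab, cutoff) {
  sum(tab$qglobal_cv < cutoff, na.rm = TRUE)
}

n_005 <- count_significant(sel_cv, 0.05)
n_010 <- count_significant(sel_cv, 0.10)

# Rank genes from most to least significant and keep those under 0.05.
ranked <- sel_cv[order(sel_cv$qglobal_cv), ]
sig_genes <- ranked$gene_name[ranked$qglobal_cv < 0.05]
```

When most genes pass the cutoff, as with 17891 of 20091 here, changing the threshold is unlikely to help, and the input data is probably the first thing to check.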

im3sanger commented 2 years ago

Hi Vivek,

Thank you for your interest in dNdScv.

Could you tell me more about this dataset? Looking at the top of the sel_cv table, the numbers of non-synonymous and synonymous mutations per gene seem very odd. dNdScv expects the data to derive from unbiased targeted, exome or whole-genome sequencing.

Best wishes, Inigo

vivekruhela commented 1 year ago

Hi,

Sorry for the late response. I am using a WES (whole-exome sequencing) dataset obtained from dbGaP, EGA and AIIMS. There are 1163 samples in which I am trying to identify significantly altered genes. The mutations were identified using four variant callers (MuSE, Mutect2, SomaticSniper, and VarScan2). Before giving the mutations to dNdScv, I filtered out the benign SNVs using the FATHMM-XF algorithm. Thanks.

im3sanger commented 1 year ago

Hi Vivek,

Thank you. Based on your answer, one of the problems with your data is the filtering of benign SNVs. dNdScv expects full datasets of somatic mutations without pre-filtering synonymous or benign mutations. Also, can you confirm whether all of your datasets are of somatic mutations, or have you included germline datasets?

Best, Inigo
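A rough way to see the effect of pre-filtering is to look at how few synonymous mutations remain in the sel_cv table. The sketch below is an illustration only, not part of dNdScv: it assumes the table carries per-gene count columns named n_syn, n_mis, n_non and n_spl, uses invented toy counts, and defines a hypothetical helper syn_fraction. It implies no particular cutoff.

```r
# Toy stand-in for out$sel_cv; the counts are invented for illustration.
# Column names n_syn, n_mis, n_non, n_spl are an assumption about the table.
sel_cv <- data.frame(
  gene_name = c("GENE_A", "GENE_B", "GENE_C"),
  n_syn = c(1, 0, 2),
  n_mis = c(40, 25, 30),
  n_non = c(3, 1, 2),
  n_spl = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of coding substitutions that are synonymous.
syn_fraction <- function(tab) {
  n_syn <- sum(tab$n_syn)
  n_all <- n_syn + sum(tab$n_mis) + sum(tab$n_non) + sum(tab$n_spl)
  n_syn / n_all
}

frac <- syn_fraction(sel_cv)
```

A very small fraction, as in this toy table, would be consistent with synonymous or benign variants having been removed before the analysis, which is the situation described above.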

vivekruhela commented 1 year ago

Sorry for the late response. I downloaded the MMRF dataset from the GDC portal, and it contains somatic mutations only.