Open kvn95ss opened 3 years ago
Hey, have you been able to find an answer to your question? I am running into the exact same problem and getting no hits. Thanks
Hello,
Sorry for the very late reply.
Yes, this means that there are no recurrently mutated genes in your dataset reaching statistical significance. Can you explain your experimental design in more detail? From your earlier description it sounds like you are analysing data from a single patient. Is that correct? In that case it would not be surprising to find no significant recurrence, as this relies on finding mutations in the same gene across multiple samples or patients.
Inigo
Hi Inigo, thanks for your reply. I am indeed working on 27 WES sarcoma tumours. They are multi-regional: for each tumour I have 3-6 regions sampled and sequenced, which I merge into one sample per tumour by removing duplicate mutations. I was expecting to find at least a few hits, as sarcomas are not normally SSMs type of tumours, but I am getting all q-values equal to one, nothing significant.
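The per-tumour merge described above can be sketched in R as follows. This is only an illustration under stated assumptions: the `tumour` and `region` columns and all values are made up, and the poster's actual merging procedure is not shown in the thread.

```r
# Hypothetical per-region calls; tumour/region IDs and coordinates are invented.
region_muts <- data.frame(
  tumour = c("T1", "T1", "T1", "T2"),
  region = c("R1", "R2", "R2", "R1"),
  chr    = c("1", "1", "17", "1"),
  pos    = c(1000L, 1000L, 2000L, 1000L),
  ref    = c("C", "C", "G", "C"),
  mut    = c("T", "T", "A", "T"),
  stringsAsFactors = FALSE
)

# Use the tumour as sampleID and drop the region column, so that the same
# mutation seen in several regions of one tumour collapses to a single row.
merged <- unique(data.frame(
  sampleID = region_muts$tumour,
  region_muts[, c("chr", "pos", "ref", "mut")],
  stringsAsFactors = FALSE
))
rownames(merged) <- NULL
print(merged)
```

The same mutation in two different tumours is kept twice, since recurrence across tumours is what the test relies on.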
Hello,
Thank you. Apologies, I had not realised that there were questions from separate users.
Can you confirm what value of theta you are getting? (dndsout$nbreg$theta).
Lack of significance can be caused by datasets that are too small or that do not have sufficient recurrence for any gene to reach significance. However, it is always important to check that your theta value is not very low (<<1). Very low theta values mean that there is very high variation in the density of synonymous mutations across genes. This typically reflects problems with the mutation calls, such as recurrent artefacts or SNP contamination. Large variation in the density of mutations across genes (high overdispersion) makes dNdScv more conservative (a gene needs more mutations to emerge from the noise) and results in less significance.
If your dataset has good theta values (>1, or ideally >3) and your mutation calls are reliable, then the lack of significance may reflect insufficient power (small datasets or insufficient recurrence).
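As a rough sketch of that check, the helper below reads `dndsout$nbreg$theta` and labels it using the approximate thresholds quoted above. `describe_theta` is a hypothetical function, not part of dNdScv, and the mock list only stands in for a real `dndscv()` result.

```r
# Hypothetical helper, not part of dNdScv: label theta using the rough
# thresholds from the discussion (>3 ideal, >1 good, otherwise inspect calls).
describe_theta <- function(dndsout) {
  theta <- dndsout$nbreg$theta
  if (theta > 3) {
    "ideal (>3)"
  } else if (theta > 1) {
    "good (>1)"
  } else {
    "low (<=1): check calls for recurrent artefacts or SNP contamination"
  }
}

# Mock object standing in for the list returned by dndscv().
mock_out <- list(nbreg = list(theta = 0.2))
print(describe_theta(mock_out))
```

With a real run, `describe_theta(dndsout)` would be called on the object returned by `dndscv()`.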
Best, Inigo
Hello,
I encountered the same issue. My samples come from multiple tissue sites of several patients. I used MuTect2 to obtain a set of somatic variants. However, when I used dNdScv to look for driver genes, the qglobal_cv for all genes is close to 1. I noticed that the result shows θ=3.881757. I find this quite confusing.
Hello ym-chen,
Thanks for your message. Could you clarify how many samples you are analysing here? Lack of significance does not necessarily mean that there are no drivers in your dataset, only that there is not enough evidence to reach statistical significance. This could be due to insufficient power if your dataset is too small.
Best, Inigo
Hello,
I ran dNdScv on filtered output from Mutect2 (tumor vs normal, single patient, with a PoN of 4 samples). I built the mutation list by querying the VCF file with bcftools, which gives me the columns sampleID, chr, pos, ref and mut.
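A minimal sketch of loading such a five-column table in R is shown below. It assumes a tab-separated file without a header, and the two rows are invented. Setting `colClasses` explicitly is a defensive choice: without it, `read.table` converts a column that contains only `T` into logical `TRUE`, and numeric chromosome names into integers.

```r
# Made-up two-row example of a tab-separated mutation table (no header line).
raw <- "CC028\t1\t12345\tC\tT\nCC028\t2\t67890\tG\tT"

mutations <- read.table(
  text = raw, sep = "\t", header = FALSE,
  col.names  = c("sampleID", "chr", "pos", "ref", "mut"),
  # Force character columns so bases and chromosome names are not re-typed.
  colClasses = c("character", "character", "integer", "character", "character")
)
str(mutations)
```

For a file on disk, the `text = raw` argument would be replaced by the file path.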
I'm using the hg38 reference from the precomputed rda file in this repo - https://github.com/im3sanger/dndscv_data/tree/master/data
I'm able to get dndscv running for my data by using these commands
cancer_test <- read.table("CC028_dmg_test.vcf")
cancer_processed_data = dndscv(cancer_test, refdb="data/RefCDS_human_GRCh38.p12.rda", cv=NULL)
sel_cv = cancer_processed_data$sel_cv;print(head(sel_cv), digits = 3)
I get output from this, but when looking for significant genes, I get no output:
print(cancer_processed_data$sel_cv[cancer_processed_data$sel_cv$qglobal_cv<0.1, c("gene_name","qglobal_cv")])
What could be the reason for this? Does this imply there are no significant genes in the data?
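One way to look into an empty result like this is to rank genes by q-value rather than filter them, which shows how far the best candidates are from the cutoff. The sketch below uses a mock `sel_cv` data frame with invented gene names and values. Only the `gene_name` and `qglobal_cv` columns are taken from the commands above.

```r
# Mock stand-in for cancer_processed_data$sel_cv; names and values are invented.
sel_cv <- data.frame(
  gene_name  = c("GENE_A", "GENE_B", "GENE_C"),
  qglobal_cv = c(1, 0.62, 0.97),
  stringsAsFactors = FALSE
)

# The filter from the post: zero rows means no gene passed the q < 0.1 cutoff.
signif_genes <- sel_cv[sel_cv$qglobal_cv < 0.1, c("gene_name", "qglobal_cv")]
print(nrow(signif_genes))

# Rank by q-value to inspect the closest candidates instead.
ranked <- sel_cv[order(sel_cv$qglobal_cv), c("gene_name", "qglobal_cv")]
print(head(ranked))
```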