im3sanger / dndscv

dN/dS methods to quantify selection in cancer and somatic evolution
GNU General Public License v3.0
212 stars 48 forks source link

Is there a way to change the omega(w) values #87

Closed ykginoue closed 1 year ago

ykginoue commented 1 year ago

Hi,

Thank you very much for this wonderful program. Many members of our lab use this program frequently.

I am working on finding positively selected genes in POLE mutated cancer. As mentioned in your original article, dnds values of genes in POLE mutated cancers are high, even in likely passenger genes. I understand that this phenomenon will lead to over-representation of positively selected genes.

To overcome such difficulties, we thought that changing the threshold of omega values of missense/nonsense/splicing from 1 to a higher value, which are global dnds values calculated from "likely passenger genes", can filter out false positive genes.

 name       mle    cilow   cihigh

wmis wmis 1.1204080 1.001663 1.253230 wnon wnon 2.0707876 1.725869 2.484638 wspl wspl 0.8894206 0.588713 1.343726 wtru wtru 1.8082777 1.519062 2.152558 wall wall 1.1793914 1.055758 1.317502

The above is the global dnds values from our preliminary data of likely passenger genes.

Looking into your R code of 'dndscv.R', we though that lines 475-524 correspond to the calculation of p-values of each mutation type. In line 488,

ll0 = sum(dpois(x=x$N, lambda=x$Lmutratesmrfold*t(array(c(1,1,1,1),dim=c(4,numrates))), log=T)) + dgamma(opt_t, shape=shape, scale=scale, log=T) # loglik null model

does array(c(1,1,1,1) correspond to the threshold of omega values of missense/nonsense/splicing? Does changing the values from c(1,1,1,1) to c(1,1.12,2.07,0.88) mean that the threshold is changed?

Thank you very much in advance

Best wishes, Yoshikage

im3sanger commented 1 year ago

Hi Yoshikage,

Thank you for your interest in dndscv and your kind words.

I have never tried to modify the neutral reference value using the current implementation of dndscv. Theoretically, it should be possible to use a different neutral value. I think that your intuition is right and that line 488 is a key one. However, I think that doing this properly will need more changes than the ones you mention. At the very least, you will need to change lines 488, 495 and 502. But I suspect that other changes will also be necesary to get correct estimates (and you may also want to correct the "w" or dN/dS estimates per gene).

However, I would not recommend trying to make these changes to look for drivers in POLE hypermutator tumours. This is because the global dN/dS values that you will be using as a neutral reference are averages across the entire exome. For a given gene, the neutral value could be very different depending on the exact sequence of the gene. As you know, the POLE signature (SBS10) is dominated by a few peaks with a strong extended sequence context. Each of these peaks will have different biases in dN/dS (you can check this by running dndscv on your POLE hypermutator tumours using only C>A or only C>T mutations and looking at dndsout$globaldnds). So the expected neutral background rate of each gene could be very different from the exome mean. Using the exome mean as the neutral reference may reduce false positives to some extent, but I would still expect this method to yield many false positives as the background mutation model for each gene is not correctly calibrated...

Best wishes, Inigo

ykginoue commented 1 year ago

Dear Inigo

Thank you very much for the answer.
Sorry in advance, if my English is awkward.

The below statement is how I understood your statement. Even if the neutral reference value is changed from 1 to a higher value (with additional correction of dN/dS estimates per gene), the variation of neutral background rates of genes is too large. So, even if the dnds value of a gene is significantly higher than the modified neutral reference value (obtained from exome data), it is suspicious that the gene is "positively selected", because the neutral background rate of the gene may be higher than the modified neutral reference value. In addition, neutral background rate of a gene is very difficult to estimate because the sequence context of mutations in POLE hypermutator tumors is biased beyond the trinucleotide context. Is this correct?

All the best, Yoshikage

im3sanger commented 1 year ago

Hi Yoshikage,

Your interpretation is correct, yes. Making the change that you suggested could help reduce the number of false positives, but I would still expect false positives because, in reality, the neutral reference value will vary very significantly across genes. So the model remains poorly calibrated. An additional problem is that POLE tumours are hypermutators, so you will probably have large numbers of mutations. Large datasets with a poorly calibrated model can lead to many false positives. Smaller dataset may be less sensitive to these problems, however.

I hope that this makes sense...

Best, Inigo

ykginoue commented 1 year ago

Hi Inigo,

It all makes sense! Thank you for all the explanations.

Best wishes, Yoshikage

im3sanger commented 1 year ago

Thanks Yoshikage. I am sorry I could not be more useful this time!

Best, Inigo