njbernstein closed 4 years ago
Hello,
Thank you for your interest in dndscv.
mutrates is a vector with the estimated substitution rates per available site for each of the 192 possible trinucleotide changes. They are fixed averages across genes. However, when calculating dN/dS ratios for a given gene, we need to adjust these rates according to the estimated background mutation rate of the gene, which is done using the "t" parameter described in the Suppl material of the paper.
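As a rough illustration of the scaling described above (a minimal sketch, not the dndscv source; the function name, the three-class toy vector standing in for the 192 trinucleotide changes, and all numbers are hypothetical), the expected count for each substitution class in a gene can be thought of as sites × global rate × gene-specific "t":

```python
# Illustrative sketch only; not taken from the dndscv package.
# All names and numbers below are hypothetical.

def expected_counts(sites, rates, t):
    """Expected mutations per substitution class: sites * global rate * t."""
    return [n_sites * rate * t for n_sites, rate in zip(sites, rates)]

# Toy example with 3 substitution classes instead of 192.
rates = [0.25, 0.5, 1.0]  # global per-site rates (the role played by mutrates)
sites = [40, 20, 30]      # available sites of each class in one gene

baseline = expected_counts(sites, rates, t=1.0)  # gene at the average rate
adjusted = expected_counts(sites, rates, t=2.0)  # gene with twice the background rate
```

The relative rates between substitution classes stay fixed; only the overall level changes from gene to gene.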
In the dNdSloc model, "t" is calculated based on the observed number of "neutral" mutations (i.e. synonymous mutations when using the unconstrained model, or synonymous mutations and those mutation types set to w==1 in the constrained models). In the dNdScv model, t_opt takes into account both the observed number of "neutral" mutations in the gene (Poisson observations) and the local covariates (Gamma function from the negative binomial regression), as described in the Suppl material of the paper. The mrfold factor is simply a way to calculate the "t" parameter for each gene under both models. "mrfold * t" in the dndscv code is equivalent to the description of the "t" parameters in the Suppl material of the paper.
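To make the two estimates concrete, here is a minimal sketch (not the dndscv source; the function names, the four-class layout and all numbers are hypothetical). It assumes expected counts are computed from the global rates with t = 1, takes the local estimate as observed over expected "neutral" mutations, and for the covariate-aware estimate uses the standard Gamma-Poisson result that a Gamma(shape, scale) prior combined with a Poisson count gives a Gamma posterior, whose mode is used here. It also shows why the factor changes between hypotheses: the set of mutation types treated as neutral changes.

```python
# Illustrative sketch only; not taken from the dndscv package.
# All names and numbers below are hypothetical.

def t_local(obs, exp, neutral):
    """Observed / expected count over the classes treated as neutral."""
    n_obs = sum(obs[k] for k in neutral)
    n_exp = sum(exp[k] for k in neutral)
    return n_obs / n_exp

def t_with_prior(n_neutral, exp_neutral, shape, scale):
    """Mode of the Gamma posterior for t, given a Gamma(shape, scale) prior
    and a Poisson count n_neutral with mean t * exp_neutral."""
    mode = (n_neutral + shape - 1) / (exp_neutral + 1 / scale)
    return max(0.0, mode)  # the mode sits at 0 when n_neutral + shape < 1

# Observed and expected (at t = 1) counts for one gene.
obs = {"syn": 4, "mis": 20, "non": 2, "spl": 0}
exp = {"syn": 2.0, "mis": 6.0, "non": 1.0, "spl": 1.0}

# The neutral set depends on which w parameters are fixed to 1.
t_unconstrained = t_local(obs, exp, ["syn"])                   # only synonymous
t_mis_fixed = t_local(obs, exp, ["syn", "mis"])                # w_mis fixed to 1
t_all_neutral = t_local(obs, exp, ["syn", "mis", "non", "spl"])

# Prior mean is shape * scale = 1.0; the local estimate from syn alone is 2.0.
t_shrunk = t_with_prior(n_neutral=4, exp_neutral=2.0, shape=2.0, scale=0.5)
```

In this toy gene the estimate moves from 2.0 to 3.0 to 2.6 as more mutation types are treated as neutral, and the prior-informed value (1.25) falls between the prior mean and the local estimate.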
I realise that this is not easy to explain here, but I hope this makes some sense.
Best, Inigo
Hi there,
I was reading your paper and code, and don't quite understand the following bit:
This mrfold correction I don't think is explained in the paper. Why does mutrates need to be corrected? t is mutrates, correct? And why is mrfold different for each hypothesis? Is this why you need to test w_syn == 1 for each hypothesis? Otherwise, it seems redundant.