Can you explain mrfold?

Hello,

Thank you for your interest in dndscv.

mutrates is a vector with the estimated substitution rates per available site for each of the 192 possible trinucleotide changes. These rates are fixed averages across genes. However, when calculating dN/dS ratios for a given gene, we need to scale them by the estimated background mutation rate of that gene, which is done using the "t" parameter described in the Supplementary Material of the paper.

In the dNdSloc model, "t" is calculated from the observed number of "neutral" mutations in the gene (i.e. synonymous mutations when using the unconstrained model, or synonymous mutations plus those mutation types set to w==1 in the constrained models). In the dNdScv model, t_opt takes into account both the observed number of "neutral" mutations in the gene (Poisson observations) and the local covariates (the Gamma distribution from the negative binomial regression), as described in the Supplementary Material of the paper. The mrfold factor is simply a way to calculate the "t" parameter for each gene under both models. "mrfold * t" in the dndscv code is equivalent to the "t" parameters described in the Supplementary Material of the paper.
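To make the two estimates above more concrete, here is a minimal Python sketch. It is not the dndscv code (which is written in R): the function names `t_loc` and `t_cv` and all the numbers are hypothetical, and it assumes "t" is a per-gene multiplier on the expected number of neutral mutations. For the dNdScv-style case it uses the standard closed-form posterior mode for Poisson observations with a Gamma prior; whether dndscv computes exactly this expression is an assumption.

```python
def t_loc(n_neutral, exp_neutral):
    """dNdSloc-style estimate (sketch): Poisson maximum likelihood,
    i.e. observed neutral mutations / expected neutral mutations."""
    return max(1e-10, n_neutral / exp_neutral)


def t_cv(n_neutral, exp_neutral, shape, scale):
    """dNdScv-style estimate (sketch): mode of the posterior obtained from
    Poisson(n_neutral | t * exp_neutral) and a Gamma(shape, scale) prior on t.
    The posterior is Gamma(shape + n_neutral, rate = 1/scale + exp_neutral)."""
    t = (n_neutral + shape - 1) / (exp_neutral + 1.0 / scale)
    # When shape + n_neutral <= 1 the posterior mode sits at zero,
    # so clamp to a small positive value.
    return max(1e-10, t)


# Hypothetical gene: 4 neutral mutations observed, 2.0 expected under the
# global rates, and a prior (from the regression) with mean shape * scale = 1.
t_local = t_loc(4, 2.0)                        # driven only by the gene's own counts
t_shrunk = t_cv(4, 2.0, shape=2.0, scale=0.5)  # pulled towards the prior mean

# Hypothetical global rates for the 192 trinucleotide changes,
# scaled to this gene's estimated background rate.
mutrates = [1e-6] * 192
gene_rates = [t_shrunk * r for r in mutrates]
```

With few observed neutral mutations, the covariate-informed estimate is pulled towards the prior mean, whereas the local estimate depends only on the counts in the gene itself.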

I realise that this is not easy to explain here, but I hope this makes some sense.

Best, Inigo

im3sanger / dndscv

Can you explain mrfold? #53