veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Inferring mutation rate from synonymous substitution rate in 4fold degenerate codons #1646

Closed Vizueta closed 9 months ago

Vizueta commented 9 months ago

Hello,

This might be an odd request, unrelated to the common usage of HyPhy, but I wanted to check with the authors about an analysis that we have done. We have a set of >100 sequenced species, and we estimated mutation rates from 4-fold degenerate sites with r8s, but these estimates are root-to-tip and only representative of each species. We wanted to estimate the rate for each branch in the phylogeny, and therefore we thought about using HyPhy for this.

The method was the following: we extracted the entire codon containing each 4-fold degenerate site and kept only codons whose first and second positions are constant across the alignment (therefore, the only varying sites are the third positions, which are 4-fold degenerate and represent neutral sites). We used this concatenated alignment and the species tree in HyPhy to estimate the synonymous substitutions per site (the non-synonymous rate was constantly 1e-10 in all branches). For this, I used aBSREL, as it provides dN and dS separately for each branch. I also tried "FitMG94.bf", but it failed with this message: "Error:Failed to dereference 'fitter.filter.default.site_map' Failed to dereference 'fitter.filter.default.site_map'"
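The codon filter described above can be sketched in Python as follows. This is a minimal illustration, not the script actually used: the function name `keep_fourfold_codons`, the dictionary input format, and the use of the standard genetic code are all assumptions, and the alignment is assumed to be in frame.

```python
# Two-base prefixes whose codons are 4-fold degenerate under the
# standard genetic code (assumption: standard code applies to the data).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def keep_fourfold_codons(alignment):
    """Keep codon columns with an invariant, 4-fold degenerate prefix.

    `alignment` is a hypothetical input format: a dict mapping species
    name to an in-frame, aligned coding sequence.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if length % 3 or any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be in frame and of equal length")

    kept = {name: [] for name in names}
    for start in range(0, length, 3):
        codons = [alignment[name][start:start + 3].upper() for name in names]
        prefixes = {codon[:2] for codon in codons}
        # Columns with gaps or ambiguous bases in the third position are dropped.
        third_ok = all(codon[2] in "ACGT" for codon in codons)
        # First and second positions must be identical in every species
        # and must form a 4-fold degenerate prefix.
        if len(prefixes) == 1 and prefixes <= FOURFOLD_PREFIXES and third_ok:
            for name, codon in zip(names, codons):
                kept[name].append(codon)
    return {name: "".join(codons) for name, codons in kept.items()}


toy = {"sp1": "GCTATGCCA", "sp2": "GCCATGCCG", "sp3": "GCAATGCTG"}
filtered = keep_fourfold_codons(toy)
```

In the toy alignment only the first codon column survives: the second has a non-degenerate prefix (ATG) and the third varies in its second position.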

Then, we divided the estimated dS by the age of each branch and multiplied by 3 (as the estimates are per codon) to obtain neutral substitutions per site per Mya (the mutation rate, as our species are univoltine). We find that the estimated mutation rate is quite variable across the tree, with short branches having very high estimates. The r8s estimates varied from 2.011 × 10^-9 to 3.769 × 10^-9, and with this new analysis we get short branches with ~10 × 10^-9.
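The conversion just described amounts to one line of arithmetic per branch. The sketch below only restates that calculation; the function name, the argument names, and the example numbers are hypothetical and are not taken from the actual analysis. The result is expressed per whatever time unit the branch age is given in.

```python
def per_site_rate(ds_per_codon, branch_age, sites_per_codon=3):
    """Convert a per-branch dS estimate into a rate per site per time unit.

    Mirrors the calculation in the text: dS divided by the branch age,
    multiplied by 3 because the estimate is taken to be per codon.
    """
    if branch_age <= 0:
        raise ValueError("branch age must be positive")
    return ds_per_codon * sites_per_codon / branch_age


# Hypothetical values, for illustration only.
example_rate = per_site_rate(ds_per_codon=0.006, branch_age=2.0)
```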

I would appreciate your opinion on this analysis and whether you see any reasons not to rely on these estimates.

Thanks in advance!

spond commented 9 months ago

Dear @Vizueta,

If you add ENV="USE_MEMORY_SAVING_DATA_STRUCTURES=1e12;" to the command line for FitMG94.bf you will remove the fitter.filter.default.site_map-related error.

The approach you describe seems sensible to me, but I would not remove any data from the alignment; I would just estimate synonymous-only rates with FitMG94.bf using the local model option. Ideally, these estimates should be similar to what you get by filtering the data down to 4-fold degenerate sites. I always prefer keeping as much data as possible.
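As a rough sketch, a FitMG94.bf run with both suggestions applied could be assembled from Python like this. The `ENV=...` argument is the one quoted above; the `--alignment`, `--tree`, and `--type local` arguments are assumed from the FitMG94.bf documentation and should be checked against the installed version. The file names and the `fitmg94_command` helper are hypothetical.

```python
import shlex


def fitmg94_command(alignment_path, tree_path, hyphy="hyphy", script="FitMG94.bf"):
    """Build the argument list for a local-model FitMG94.bf run.

    Assumption: `--alignment`, `--tree` and `--type local` are accepted
    by the installed FitMG94.bf; only the ENV argument comes from this thread.
    """
    return [
        hyphy,
        script,
        "--alignment", alignment_path,
        "--tree", tree_path,
        "--type", "local",  # branch-specific (local) rate estimates
        # Works around the fitter.filter.default.site_map error.
        "ENV=USE_MEMORY_SAVING_DATA_STRUCTURES=1e12;",
    ]


# Hypothetical file names.
cmd = fitmg94_command("codons.fasta", "species.nwk")
print(shlex.join(cmd))  # shell-quoted form, for inspection before running
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths and flags have been verified.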

Best, Sergei

Vizueta commented 9 months ago

Dear Sergei,

Thank you very much; it is reassuring that our approach looks good to you. I have been able to run FitMG94.bf by adding the option you mentioned, and the estimates are the same as those from aBSREL, but it runs faster.

All the best, Joel