veg / hyphy-analyses

HyPhy standalone analyses

What tree to use for FitMG94? #37

Closed: liamfriar closed this issue 1 year ago

liamfriar commented 1 year ago

Hi, thank you for building and maintaining this tool!

Question: If I am running FitMG94 on an orthogroup across 50 species, should I input the species tree or the gene tree along with the sequence alignment?

Background: I am trying to determine purifying selection vs. drift vs. positive selection of a couple dozen genes of interest in a phylogeny of ~50 species. As a preliminary analysis, I want to determine dN, dS, and dN/dS for those genes between species pairs or along branches of the phylogeny. I think FitMG94 is the tool for this.

spond commented 1 year ago

Dear @liamfriar,

Take a look at https://github.com/veg/hyphy/issues/1579 for a general recommendation on gene tree (GT) vs. species tree (ST); the short answer is GT.

Generally, for gene-level selection detection, we recommend BUSTED (e.g., see https://www.biorxiv.org/content/10.1101/2022.12.02.518889v3 for the most recent discussion).

For branch-level detection of selection, we recommend aBSREL.

Both are standard analyses in HyPhy (hyphy busted ... hyphy absrel ...).

You could definitely use FitMG94 to obtain dN and dS estimates (and per-lineage estimates as well with the local option), but this will be done without accounting for site-to-site rate variation or other known confounders. Unless your alignments have very low divergence or are very small for specific genes, this is suboptimal.
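For orientation, here is a minimal sketch of how the three runs mentioned above might be assembled from Python. It only builds and prints the command lines; nothing is executed. The file names are hypothetical, and the `--alignment`, `--tree`, and `--type local` options are assumptions based on the usual HyPhy command-line conventions, so check each analysis's own help output before relying on them.

```python
import shlex

def hyphy_command(analysis, alignment, tree, extra=()):
    """Return the argument list for one HyPhy run (nothing is executed here)."""
    # Assumption: the analysis accepts --alignment and --tree options.
    return ["hyphy", analysis, "--alignment", alignment, "--tree", tree, *extra]

alignment = "orthogroup.fasta"  # hypothetical codon alignment
gene_tree = "orthogroup.nwk"    # hypothetical gene tree (GT) for the same genes

commands = [
    # Gene-level test for selection.
    hyphy_command("busted", alignment, gene_tree),
    # Branch-level test for selection.
    hyphy_command("absrel", alignment, gene_tree),
    # Exploratory dN and dS per branch; "--type local" is an assumed spelling
    # of the "local option" mentioned above.
    hyphy_command("FitMG94.bf", alignment, gene_tree, extra=("--type", "local")),
]

for cmd in commands:
    print(shlex.join(cmd))
```

To actually launch a run, each list could be passed to `subprocess.run(cmd, check=True)`, assuming `hyphy` is on the PATH and `FitMG94.bf` is in the working directory.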

Were you following a specific published analysis?

Best, Sergei

liamfriar commented 1 year ago

Hi Sergei,

Thank you for such a quick and helpful response!

Short answer: I was not following a specific published analysis. I am planning to read through the HyPhy publications, but I am more familiar with dN/dS and selection analyses that use PAML and related packages. I would LOVE to know if there are certain papers you think might be helpful, as some of the terminology (e.g., gene-level vs. branch-level) is still confusing to me.

Longer answer: I was using FitMG94 because it seemed the most straightforward way to get dN and dS values before trying to understand what seem like more complicated analyses such as BUSTED. It looked to me like BUSTED would be most appropriate for positive selection tests and RELAX for drift, but I was admittedly not clear on what aBSREL, FEL, and FUBAR do. I will have to read the publications!

spond commented 1 year ago

Dear @liamfriar,

This might be the best starting point from a practical perspective to get you oriented. Happy to answer more specific questions as well.

You can definitely use FitMG94 as an exploratory tool as well. There are many ways to skin the proverbial cat. One issue with dN and dS, especially if you use a tree to estimate them and then compute dN and dS between pairs of species by tracing paths across the tree, is that these pairwise estimates will be non-trivially correlated due to shared ancestry: different pairs reuse the same internal branches. How much that matters depends, of course, on how you want to interpret the results.
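To make the shared-ancestry point concrete, here is a small sketch of path tracing on a toy tree. The topology and the per-branch dN and dS values are made up for illustration and are not output from FitMG94. Pairwise values are taken as sums over the branches connecting two leaves.

```python
from itertools import combinations

# child -> (parent, dN, dS); hypothetical branch-level estimates
branches = {
    "A": ("n1", 0.01, 0.10),
    "B": ("n1", 0.02, 0.12),
    "n1": ("root", 0.03, 0.20),
    "C": ("root", 0.04, 0.30),
}

def branches_to_root(leaf):
    """Set of branches (named by their child node) from a leaf up to the root."""
    path = set()
    node = leaf
    while node in branches:
        path.add(node)
        node = branches[node][0]
    return path

def pair_path(a, b):
    """Branches connecting two leaves: those on exactly one of the root paths."""
    return branches_to_root(a) ^ branches_to_root(b)

def pair_estimates(a, b):
    """Pairwise (dN, dS) as sums of branch estimates along the connecting path."""
    path = pair_path(a, b)
    dn = sum(branches[x][1] for x in path)
    ds = sum(branches[x][2] for x in path)
    return dn, ds

for a, b in combinations(["A", "B", "C"], 2):
    dn, ds = pair_estimates(a, b)
    print(f"{a}-{b}: dN={dn:.2f} dS={ds:.2f}")

# Branches used by both the A-C and the B-C comparison.
shared = pair_path("A", "C") & pair_path("B", "C")
print(sorted(shared))  # ['C', 'n1']
```

The A-C and B-C comparisons both include the `n1` and `C` branches, so any estimation error on those branches shows up in both pairwise values. That is why the pairs cannot be treated as independent observations.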

Most PAML methods will also compute dN/dS from site- or branch-site-level models and then do some testing.

Best, Sergei

liamfriar commented 1 year ago

Thank you so much, @spond. These resources really helped. I do have a couple of other questions, but I will post them separately or on appropriate existing threads so that they are more helpful to other people.