veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Interpreting results of aBSREL for within & between species sequences #1644

Closed: Emilyaoc closed this issue 7 months ago

Emilyaoc commented 10 months ago

Hello,

I have a question about using aBSREL on datasets that contain sequence data from both between and within species (vertebrate species). I feel comfortable using aBSREL on datasets containing one sequence per species, but I am unsure whether it's OK to use aBSREL when there is more than one sequence per species, either (A) because of gene duplication or (B) because multiple individuals were sampled per species (in a single-ortholog scenario).

In case B, the non-synonymous differences could be the result of polymorphisms that have not yet been fixed rather than substitutions, so I wonder whether this violates some of the underlying model assumptions of aBSREL. In case A, the non-synonymous differences are more likely to be substitutions and the gene copy is orthologous, but is it a problem to have multiple different types of homologous relationships (orthologs and paralogs) muddled into the same analysis?

If you have a dataset containing multiple sequences per species and are primarily interested in differences in selection on the gene across species, would it be better to select one representative sequence per species, or to run aBSREL on a dataset with every sequence included?

Thank you for any thoughts you have time to offer that may help me understand this a bit better.

Emily

spond commented 10 months ago

Dear @Emilyaoc,

The ortholog / paralog issue is problematic because you can bring together sequences that did not evolve through point mutation alone. I would endeavor to exclude paralogs, unless you have a good reason to do otherwise.
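Excluding paralogs before running aBSREL can be a simple filtering step once orthology has been decided. The sketch below assumes the orthology calls were already made elsewhere (for example by an orthology-inference or gene-tree reconciliation step) and recorded in each FASTA header under a hypothetical `species|sequence_id|status` convention; the function names are illustrative only, and codon alignment is not handled here.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def keep_orthologs(records):
    """Keep only records whose (hypothetical) header status field is 'ortholog'."""
    # Assumed header convention: "species|sequence_id|status"
    return [(h, s) for h, s in records if h.split("|")[-1].lower() == "ortholog"]


def write_fasta(records):
    """Serialize records back to unwrapped FASTA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

Only sequences explicitly marked as orthologs are kept, so an unannotated sequence is dropped rather than silently included.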

With multiple individuals per species, I would include all of them but only focus on branches that are between species. You will get a bit more power and resolution that way. Interpreting dN/dS within species is not trivial, as you point out. You can definitely estimate it, and even compare it to expectations (e.g., you would expect intra-species branches to have higher dN/dS estimates in general, since they still carry slightly deleterious polymorphisms that purifying selection has not yet removed).
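One way to "focus on branches that are between species" is to label those branches in the tree and restrict testing to the labelled set. The sketch below assumes tips are named `<species>_<individual>` (a hypothetical convention), treats a branch as between-species when the subtree of its parent node contains more than one species (one possible operationalization, not the only one), and appends a `{Foreground}` tag following HyPhy's curly-brace branch-label convention as we understand it. The parser is minimal: no quoted labels, comments, or whitespace inside the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: str = ""  # branch length kept as text so it round-trips unchanged
    children: list = field(default_factory=list)


def parse_newick(text):
    """Minimal recursive Newick parser returning the root Node."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                node.children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        node.name = text[start:pos]
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node.length = text[start:pos]
        return node

    return parse_node()


def species_of(tip_name):
    # Hypothetical naming convention: "<species>_<individual>", e.g. "Mus_musculus_3"
    return tip_name.rsplit("_", 1)[0]


def tip_species(node):
    """Set of species represented among the tips below this node."""
    if not node.children:
        return {species_of(node.name)}
    return set().union(*(tip_species(c) for c in node.children))


def to_labelled_newick(node, parent_species=None):
    """Serialize, tagging branches whose parent subtree spans >1 species."""
    s = ""
    if node.children:
        here = tip_species(node)
        s = "(" + ",".join(to_labelled_newick(c, here) for c in node.children) + ")"
    s += node.name
    if parent_species is not None and len(parent_species) > 1:
        s += "{Foreground}"  # between-species branch
    if node.length:
        s += ":" + node.length
    return s


def label_between_species(newick):
    return to_labelled_newick(parse_newick(newick)) + ";"
```

For `((A_1:0.1,A_2:0.1):0.2,(B_1:0.1,B_2:0.2):0.3,C_1:0.5);` this tags the stem branches of species A and B and the tip branch of C_1, while the tip branches of individuals within A and within B stay unlabelled. The labelled tree could then be passed to aBSREL with testing restricted to the `Foreground` set (recent HyPhy versions expose a `--branches` option for this; check `hyphy absrel --help` for your version).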

Best, Sergei

github-actions[bot] commented 8 months ago

Stale issue message