veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Calculate dN and dS from BUSTED #554

Closed carloscongrains closed 7 years ago

carloscongrains commented 7 years ago

Hi all,

I ran BUSTED on four thousand clusters of orthologous coding sequences from individuals of a Neotropical fruit fly genus. In the phylogenies, I found moderate to high levels of incomplete lineage sorting (ILS), which may have been produced by introgression. I want to test whether there is a relationship between evolutionary rates and the level of introgression. So, I would like to ask you two questions:

  1. BUSTED produced three weighted values of omega, but I would like to calculate the dN and dS rates separately. I have already read the post https://github.com/veg/hyphy/issues/509, but I only found the script for aBSREL there. Besides, I couldn't find the .fit file. In my run, the output of BUSTED consisted of five files: three main files (.BUSTED.json, busted.LF.bf and .null.bf) and two log files (messages and error).
  2. Since I found ILS in the phylogeny, I used the gene tree to test for positive selection. I would like to know your opinion about the impact of incomplete lineage sorting and/or introgression on this analysis. Do you think there is anything I can do to reduce this effect?

I would appreciate any help.

Best,

Carlos

sjspielman commented 7 years ago

Hi Carlos,

  1. BUSTED does not in fact infer dN and dS separately; it only infers omega distributions for foreground vs. background lineages. In other words, dS is implicitly assumed to be 1. I suspect that what you're referring to in post #509 was a typo and/or referred to an earlier version of BUSTED. @spond may have more information on this point. That said, I would caution against relying on the specific point estimates of omega, dN, and/or dS from a branch-wide method like BUSTED, as these values will not precisely represent the actual evolutionary rates.

  2. In cases of ILS, you will really only be able to test for selection with codon models on genes which have fully diverged (and yes, definitely use gene trees!). Codon models like those in HyPhy are known to give misleading results when differences between sequences are not fixed. You might find population-genetics-level methods, which account for polymorphisms, more useful. Alternatively, if you have a sense of which gene(s) have indeed completely diverged/speciated, you can just go ahead and test those genes directly without concern.
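
To make the "three weighted values of omega" from point 1 concrete, here is a minimal sketch of how a set of omega classes and their weights collapses into a single weighted mean. The numbers are hypothetical and typed in by hand; nothing here reads a real .BUSTED.json file, and the function name `mean_omega` is made up for this illustration. The result is only a summary of the omega distribution, not a separate estimate of dN or dS.

```python
from typing import Sequence, Tuple


def mean_omega(classes: Sequence[Tuple[float, float]]) -> float:
    """Return the weighted mean of omega over (omega, weight) rate classes."""
    total_weight = sum(weight for _, weight in classes)
    # The weights are proportions of sites/branches, so they should sum to 1.
    if abs(total_weight - 1.0) > 1e-6:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(omega * weight for omega, weight in classes)


# Hypothetical three-class distribution: (omega, weight).
# These values are invented for illustration, not taken from any BUSTED run.
example = [(0.05, 0.80), (0.60, 0.15), (4.00, 0.05)]

print(round(mean_omega(example), 4))
```

Note that a small class with omega > 1 can coexist with a mean omega well below 1, which is one reason a single point estimate is a poor stand-in for the model-based test.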

Best, Stephanie

carloscongrains commented 7 years ago

Hi Stephanie, Thank you, I really appreciate your help and comments. Best, Carlos

spond commented 7 years ago

Dear @carloscongrains,

You can always extract dS and dN from trees fitted under any codon model, but unless you want to use them for something like tree scaling (divergence times, etc.), it is always better to use the model-defined dN/dS ratios for selection testing instead of post hoc processing of dN and dS.

Best, Sergei

carloscongrains commented 7 years ago

Dear Sergei,

Thank you for your answer. It clarified my doubt.

Best, Carlos