Closed pkfsantos closed 1 month ago
Dear @pkfsantos,
aBSREL and RELAX (and most other selection methods) do not make specific assumptions re: orthology v paralogy. Considerations specific to including multiple gene copies per organism would be as follows:
Where you need to think carefully about your hypotheses is in how you run these models: how you define the branch sets and how you interpret the inferences.
Orthogroups are a rather fluid concept :). Can you give me a few more specifics? For example, could you include a few example trees and elaborate on what evolutionary hypotheses you are interested in testing?
Best, Sergei
Dear Sergei,
Thank you for your response. I can share more details of the analysis.
I am testing the hypothesis that bee species which have lost prepupal diapause show convergent signals of selection. In this context, "lost prepupal diapause" refers to species that either undergo diapause in the adult stage or do not undergo diapause at all. The species that do not experience prepupal diapause are the test branches in my analysis.
Here are the command lines I used:
For aBSREL:
$HOME/hyphy-2.5.55/hyphy LIBPATH=/path/hyphy-2.5.55/res absrel --tree {input.tree} \
--multiple-hits Double+Triple --srv Yes --branches Test --alignment {input.alignment} --output {output}
For RELAX:
$HOME/hyphy-2.5.55/hyphy LIBPATH=/path/hyphy-2.5.55/res relax --tree {input.tree} \
--multiple-hits Double+Triple --srv Yes --test Test --alignment {input.alignment} --models All --output {output}
Below are examples of the gene trees I used in these commands. They vary in the number of species included, ranging from 20 to 27. Although there are 27 species in total, for each orthogroup I excluded any species with paralogous sequences, together with all of its sequences for that orthogroup.
Trees:
OG0000758 (20 spp)- (Apis_mellifera{Test},(Bombus_campestris{Test},Bombus_vancouverensis{Test})98:0.1248289060,(((((((Osmia_bicornis:0.3083666780,Colletes_gigas:0.2810296065)71:0.0365119381,Ceratina_calcarata{Test})64:0.0256354274,Dufourea_novaeangliae:0.2744228762)56:0.0212980832,Andrena_dorsata{Test})29:0.0000021463,((((Augochlora_pura{Test},Megalopta_genalis{Test})94:0.0506933522,((Lasioglossum_leucozonium{Test},Halictus_rubicundus{Test})77:0.0110665378,Lasioglossum_morio{Test})98:0.0677231186)96:0.0619332421,Nomia_melanderi:0.0943288220)97:0.1655076919,(((Tetragonula_carbonaria{Test},Frieseomelitta_varia{Test})96:0.1042681291,Eufriesea_mexicana:0.1305487421)67:0.0380758794,Megachile_rotundata:0.4134365993)73:0.0318823231)79:0.3747813131)11:0.0000020202,Macropis_europaea:0.5991660057)47:0.0681293209,Peponapis_pruinosa:0.2863838610)96:0.1423775094);
OG0004459 (27 spp)- (Megachile_rotundata:0.3296167985,((Macropis_europaea:0.6722070970,(((((((Augochlorella_aurata{Test},Augochlora_pura{Test})100:0.0728959509,Megalopta_genalis{Test})100:0.2107037633,((Agapostemon_virescens{Test},Sphecodes_monilicornis{Test})74:0.0082589795,(Halictus_rubicundus{Test},(Lasioglossum_leucozonium{Test},Lasioglossum_morio{Test})100:0.0268164910)100:0.0251728570)100:0.0954600370)100:0.1831784910,Nomia_melanderi:0.4073433383)100:0.1515534240,Dufourea_novaeangliae:0.4419274950)100:0.2545452597,Colletes_gigas:0.4092848196)100:0.0647613480,Andrena_dorsata{Test})98:0.0329326202)100:0.2645936364,((((Peponapis_pruinosa:0.6184577402,Ceratina_calcarata{Test})61:0.0810584608,Tetrapedia_diversipes:0.3697968136)85:0.0339595109,(((Bombus_campestris{Test},Bombus_vancouverensis{Test})100:0.1768048806,(Tetragonula_carbonaria{Test},(Frieseomelitta_varia{Test},Melipona_quadrifasciata{Test})95:0.0218915305)100:0.1833626573)100:0.0920878035,((Euglossa_dilemma{Test},Eufriesea_mexicana:0.0707405197)100:0.2460430339,Apis_mellifera{Test})98:0.0553461104)100:0.2630127122)65:0.0405230248,Habropoda_laboriosa:0.3532001790)100:0.1712057359)100:0.3089256852,Osmia_bicornis:0.2965500768);
OG0003524 (27 spp) (Lasioglossum_morio{Test},((Sphecodes_monilicornis{Test},(((((((((Tetrapedia_diversipes:0.2688643652,Ceratina_calcarata{Test})76:0.0806217503,((((((((Osmia_bicornis:0.3547675633,Colletes_gigas:0.4312002746)87:0.1734586725,Peponapis_pruinosa:0.3207922286)88:0.0656998797,Apis_mellifera{Test})92:0.0727635516,Bombus_vancouverensis{Test})96:0.1695106008,Frieseomelitta_varia{Test})94:0.1657699678,(Tetragonula_carbonaria{Test},Melipona_quadrifasciata{Test})42:0.0000029268)94:0.1238911943,Bombus_campestris{Test})96:0.1035711120,(Eufriesea_mexicana:0.1301948644,Euglossa_dilemma{Test})100:0.2319604865)71:0.0670826138)79:0.0409503949,Habropoda_laboriosa:0.3169983723)95:0.1079844275,Megachile_rotundata:0.3937959160)67:0.0518535143,(Andrena_dorsata{Test},Macropis_europaea:0.3302231674)86:0.0872625623)98:0.1231582917,Dufourea_novaeangliae:0.2958274586)87:0.0543614345,Nomia_melanderi:0.2353898934)100:0.1835103701,((Augochlora_pura{Test},Megalopta_genalis{Test})94:0.0126039313,Augochlorella_aurata{Test})100:0.1650692566)100:0.1037560067,Agapostemon_virescens{Test})75:0.0151908692)79:0.0237630570,Halictus_rubicundus{Test})97:0.0147355006,Lasioglossum_leucozonium{Test});
OG0000697 (22 spp) (Habropoda_laboriosa:0.3333781780,((((((((Sphecodes_monilicornis{Test},(Halictus_rubicundus{Test},Lasioglossum_leucozonium{Test})100:0.0472830663)99:0.0113294227,Agapostemon_virescens{Test})100:0.1115246061,((Augochlora_pura{Test},Augochlorella_aurata{Test})100:0.0347544573,Megalopta_genalis{Test})100:0.2136747060)100:0.2727579051,Nomia_melanderi:0.3297610118)100:0.3403229647,Colletes_gigas:0.4621638136)88:0.0451388019,Andrena_dorsata{Test})93:0.1712115457,(Osmia_bicornis:0.2575983879,Megachile_rotundata:0.3065534547)100:0.3238268730)92:0.1314671553,Ceratina_calcarata{Test})91:0.0466296593,(((Apis_mellifera{Test},Eufriesea_mexicana:0.3699447303)74:0.0255258128,((Frieseomelitta_varia{Test},Tetragonula_carbonaria{Test})100:0.1341296737,(Bombus_campestris{Test},Bombus_vancouverensis{Test})100:0.2598948297)100:0.0931200954)98:0.0630305504,(Peponapis_pruinosa:0.4016097663,Tetrapedia_diversipes:0.3560483073)100:0.0651485715)96:0.0511089335);
Best,
Priscila
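For readers who want to reproduce the `{Test}` annotation used in the trees above, here is a minimal sketch in Python. The function name `label_tips` and the regular expression are illustrative assumptions, not part of HyPhy. The sketch assumes tip names contain only letters, digits, underscores and dots, and it labels terminal branches only, leaving internal branches and existing branch lengths untouched.

```python
import re

def label_tips(newick: str, test_species: set, tag: str = "Test") -> str:
    """Append {tag} to every tip whose name is in test_species.

    Hypothetical helper: a tip name is taken to be a run of name
    characters that directly follows '(' or ',' and is directly
    followed by ':', ',' or ')'. Internal node labels follow ')'
    and are therefore never matched.
    """
    def repl(match):
        name = match.group(1)
        # Braces are doubled inside the f-string to emit literal { and }.
        return f"{name}{{{tag}}}" if name in test_species else name

    return re.sub(r"(?<=[(,])([A-Za-z0-9_.]+)(?=[:,)])", repl, newick)

if __name__ == "__main__":
    tree = "(A:0.1,(B:0.2,C:0.3)90:0.05);"
    print(label_tips(tree, {"A", "C"}))
```

Tips that already carry a `{...}` label are skipped, because the name is then followed by `{` rather than by `:`, `,` or `)`, so the function can be applied to a partially annotated tree without adding duplicate labels.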
Dear Priscila,
We have a specialized analysis, BUSTED-PH (and the protein-based RER), designed to test for convergent evolution. Take a look at https://github.com/veg/hyphy-analyses/tree/master/BUSTED-PH and https://github.com/veg/hyphy-analyses/tree/master/RER
If you end up including paralogs, put them all in the {test} or {background} group as appropriate. You could also exclude species with paralogs, as you did in the examples.
It looks like you should have plenty of power to test for this, given that you have a good number of branches in the test and reference sets.
I am tagging a graduate student in our group, @agselberg, who's been working on these types of analyses. She should be able to help you out (I am going to be AFK for ~2 weeks).
Best, Sergei
Thank you, Sergei.
@agselberg, just to continue the conversation: I used aBSREL instead of BUSTED because I was interested in identifying which specific branches were showing signals of positive selection. In the end, only a few branches (a maximum of 3 out of 27 species) showed convergent signals of positive selection.
I am also combining the results from aBSREL and RELAX with RERconverge, which I believe is a similar approach to what was suggested in the last message.
My main concern (which was actually raised by a reviewer) was about the potential issue of violating HyPhy’s assumptions when excluding species with paralogs, but I now understand that this is not the case.
Please feel free to share any additional comments on the approaches used.
Thank you, Priscila
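Listing the branches that aBSREL flags, and checking which of them are test species, could be sketched as below. This is a sketch under stated assumptions: it assumes the aBSREL JSON output stores per-branch results under `"branch attributes"` → `"0"` and that each tested branch has a `"Corrected P-value"` field. Please verify these key names against your own output files. The function names are hypothetical.

```python
import json

def significant_branches(absrel: dict, alpha: float = 0.05) -> list:
    """Return sorted names of branches with corrected p-value <= alpha.

    Assumed layout (check against your own JSON):
    absrel["branch attributes"]["0"][branch_name]["Corrected P-value"]
    """
    branches = absrel["branch attributes"]["0"]
    hits = []
    for name, attrs in branches.items():
        p = attrs.get("Corrected P-value")
        # Branches that were not tested may lack a p-value; skip them.
        if p is not None and p <= alpha:
            hits.append(name)
    return sorted(hits)

def load_absrel(path: str) -> dict:
    """Read one aBSREL result file (the path is supplied by the caller)."""
    with open(path) as handle:
        return json.load(handle)

if __name__ == "__main__":
    # Synthetic stand-in for a real result file, for illustration only.
    example = {
        "branch attributes": {
            "0": {
                "Apis_mellifera": {"Corrected P-value": 0.01},
                "Osmia_bicornis": {"Corrected P-value": 0.80},
                "Node3": {"Corrected P-value": None},
            }
        }
    }
    print(significant_branches(example))
```

Counting, per orthogroup, how many test species appear in this list is one way to summarize "a maximum of 3 out of 27 species". As noted elsewhere in this thread, a gene-level test of convergence is a different question, and BUSTED-PH is the recommended tool for it.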
Priscila,
I agree with Sergei about the paralogs; you should be fine because no specific assumptions are made.
I wanted to emphasize: if you are testing for signals of positive selection on specific, individual branches, aBSREL should be used. But if you are testing whether a gene shows convergent signals of positive selection (similar to the RELAX/RER methods), BUSTED-PH is recommended.
Either method (or both) is fine to run, but the choice will affect how you interpret your results.
Best, Avery.
Perfect! Thank you Avery for the further clarification. Priscila
I used orthogroups identified in OrthoFinder as input for HyPhy analyses. I am working with 27 species, and there are 2,592 orthogroups identified as single-copy in all species. To include more orthogroups in the HyPhy analysis, I included those that were single-copy in at least 70% of the species and excluded the species with paralogous sequences.
Does that violate HyPhy assumptions of single-copy orthology? I have read in another post that aBSREL can handle paralogous sequences, so I understand that including or not including the paralogous sequences does not violate the assumptions. What about RELAX?
I ask from an evolutionary perspective. If some orthogroups are evolving faster in certain species that have experienced gene duplication and loss, could these orthogroups be considered as faster evolving overall? Even in species that do not currently have duplications, could this affect the analysis or violate HyPhy's assumptions of single-copy orthology?
I appreciate any comments on that.
Priscila
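The filtering described in this post (keep orthogroups that are single-copy in at least 70% of species, then drop the species that are not single-copy) can be sketched as follows. This is an illustration, not the exact procedure used. The function name `select_orthogroups` and the in-memory `counts` layout are assumptions. In practice the per-species gene counts would come from a table such as OrthoFinder's gene-count output.

```python
def select_orthogroups(counts: dict, n_species: int, min_frac: float = 0.7) -> dict:
    """Pick orthogroups that are single-copy in >= min_frac of all species.

    counts maps orthogroup id -> {species: number of gene copies}.
    Returns orthogroup id -> sorted list of species to retain, i.e.
    only the species with exactly one copy. Species with zero copies
    or with paralogs (more than one copy) are dropped.
    """
    keep = {}
    for og, per_species in counts.items():
        single = sorted(sp for sp, n in per_species.items() if n == 1)
        # Compare fractions directly to avoid rounding a float threshold.
        if len(single) / n_species >= min_frac:
            keep[og] = single
    return keep

if __name__ == "__main__":
    # Toy counts for four hypothetical species a-d.
    counts = {
        "OG1": {"a": 1, "b": 1, "c": 1, "d": 2},  # 3/4 single-copy: kept
        "OG2": {"a": 1, "b": 3, "c": 0, "d": 1},  # 2/4 single-copy: dropped
    }
    print(select_orthogroups(counts, n_species=4))
```

With 27 species and a 70% cutoff, an orthogroup needs at least 19 single-copy species to be retained under this rule.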