What is the difference between selecting one ancestral branch and selecting the ancestral branch plus all descendant branches?

jzhan6 commented 2 years ago

Hi,

Thank you for bringing so many great methods together in one package in HyPhy. It is an amazing tool for evolutionary analysis. I am new to evolutionary analyses and have a question about branch selection for BUSTED. What is the difference between selecting 1) one ancestral branch only, and 2) the ancestral branch and all descendant branches? An example is shown below; the selected branch(es) are highlighted in blue.

Image 1: only the ancestral branch selected
Image 2: ancestral branch + descendant branches selected
If I want to test for positive selection in the lineage that includes the samples from NSAM to LOOSEJAW, should I use only the ancestral branch as the test branch (image 1), or the ancestral + descendant branches (image 2)? What do these two selections mean in a BUSTED analysis, and how do they differ?

spond commented 2 years ago

Dear @jzhan6,

The two scenarios you describe are quite distinct. The top one (single branch) only looks for positive selection on that ONE branch. The bottom one looks for positive selection anywhere in the entire clade. Based on what you describe as your analytical goal, you should select the entire clade (image 2).
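To make the two selections concrete, here is a minimal Python sketch of how each one could be written as a labeled Newick tree. It assumes the `{tag}` convention that HyPhy reads after a node name, and that a branch is identified by the node at its lower end. The toy topology, the internal node name `Anc`, the outgroup names, and the `Foreground` tag are all illustrative assumptions, not the tree from the images.

```python
# A node is a (name, children) tuple; tips have an empty child list.

def descendants(node):
    """Names of every node strictly below `node`."""
    _, children = node
    names = []
    for child in children:
        names.append(child[0])
        names.extend(descendants(child))
    return names


def to_newick(node, labeled, tag="Foreground"):
    """Serialize the tree, appending {tag} to each node whose branch is selected."""
    name, children = node
    label = name + ("{" + tag + "}" if name in labeled else "")
    if not children:
        return label
    inner = ",".join(to_newick(child, labeled, tag) for child in children)
    return "(" + inner + ")" + label


# Hypothetical toy tree: "Anc" is the MRCA of the clade of interest.
clade = ("Anc", [("NSAM", []), ("LOOSEJAW", [])])
root = ("", [("OUT1", []), ("OUT2", []), clade])

# Selection 1: only the branch leading to the clade.
ancestral_only_nwk = to_newick(root, {"Anc"}) + ";"

# Selection 2: that branch plus every branch below it.
whole_clade_nwk = to_newick(root, {"Anc", *descendants(clade)}) + ";"

print(ancestral_only_nwk)
print(whole_clade_nwk)
```

In the first string only `Anc` carries the tag, so the test concerns that single branch. In the second, every branch in the clade carries it, so selection on any one of them can drive the result.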

Best, Sergei

jzhan6 commented 2 years ago

Hi Sergei,

Thank you so much for your prompt reply. I think I did not make my goal clear. What I want is to find genes under positive selection that is shared by all samples in the clade from NSAM to LOOSEJAW. For example, suppose there is a new phenotype common to all samples in that clade. I suspect that positive selection on the ancestral branch may have led to this phenotype, which all descendants then inherited, and I want to find genes possibly related to its origin. From your explanation, it seems that selecting only the ancestral branch tests for positive selection on that one branch, which may help find positively selected genes behind the new shared phenotype. The bottom option looks for positive selection anywhere in the entire clade. Does that mean it may flag a gene that is positively selected on only one branch (e.g., only the tip branch leading to LOOSEJAW), rather than genes whose selection led to the phenotype common to all samples in the clade?

So, to find positive selection that may have contributed to a new phenotype shared by a whole clade (one that newly arose in that clade), I should choose only the ancestral branch. Am I correct?

spond commented 2 years ago

Dear @jzhan6,

Thanks for the clarification. This is a fairly common situation, where you are trying to see if a change in some phenotype/feature is associated with selection on a gene/genes. We even have a special method for these types of settings: https://github.com/veg/hyphy-analyses/tree/master/BUSTED-PH

If you have a single evolutionary event (one emergence of a trait, and then a clade where all the species have it), you would indeed label the ancestral branch. You could also test the entire clade, if you suspect that additional evolution of the trait took place. Further, if you suspect that the trait emerged at the MRCA of the clade and then was maintained, you could test the entire clade (minus the MRCA branch) for evidence of NEGATIVE selection (https://github.com/veg/hyphy-analyses/tree/master/BUSTED-conservation) in addition to testing the ancestral branch for positive selection.
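The three test sets described above can be sketched as plain sets of branch names. This is only an illustration under stated assumptions: the child-to-parent map, the node names `Anc`, `N1`, `SP2`, `OUT1`, `OUT2`, and `Root` are hypothetical, and each branch is named by the node at its lower end.

```python
# Hypothetical toy topology as child -> parent; "Anc" is the MRCA of the clade.
parent = {
    "Anc": "Root", "OUT1": "Root", "OUT2": "Root",
    "N1": "Anc", "LOOSEJAW": "Anc",
    "NSAM": "N1", "SP2": "N1",
}


def clade_branches(mrca, parent):
    """The MRCA's own branch plus every branch below it."""
    members = {mrca}
    changed = True
    while changed:  # keep sweeping until no new descendants are found
        changed = False
        for child, par in parent.items():
            if par in members and child not in members:
                members.add(child)
                changed = True
    return members


ancestral_only = {"Anc"}                           # single emergence of the trait
entire_clade = clade_branches("Anc", parent)       # trait may have kept evolving
clade_minus_mrca = entire_clade - ancestral_only   # maintenance after emergence

designs = {
    "positive selection, ancestral branch": ancestral_only,
    "positive selection, entire clade": entire_clade,
    "negative selection, clade minus MRCA branch": clade_minus_mrca,
}
for name, branches in designs.items():
    print(f"{name}: {sorted(branches)}")
```

Keeping the MRCA branch out of the third set means the two complementary tests never share a branch.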

If the same trait emerged multiple times, you could use a parsimony or conjunctive (all descendants) labeling (see https://github.com/veg/hyphy-analyses/tree/master/LabelTrees) to distinguish branches with and without phenotype.
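As an illustration of the conjunctive ("all descendants") rule only, here is a minimal Python sketch: an internal branch receives the label when every tip below it has the trait. This is not the LabelTrees implementation, the parsimony alternative is not shown, and the tree and tip names are hypothetical.

```python
# A node is a (name, children) tuple; tips have an empty child list.

def label_all_descendants(tree, tips_with_trait):
    """Return the nodes to label: tips with the trait, plus internal
    nodes whose descendant tips ALL have the trait."""
    labeled = set()

    def visit(node):
        name, children = node
        if not children:
            has_trait = name in tips_with_trait
        else:
            # Visit every child first so all subtrees get labeled,
            # then require that all of them qualify.
            results = [visit(child) for child in children]
            has_trait = all(results)
        if has_trait:
            labeled.add(name)
        return has_trait

    visit(tree)
    return labeled


# Hypothetical tree: the trait is present in A, B, and C but not in D.
tree = ("root", [
    ("N1", [("A", []), ("B", [])]),
    ("N2", [("C", []), ("D", [])]),
])

labeled = label_all_descendants(tree, {"A", "B", "C"})
print(sorted(labeled))
```

Here `N1` is labeled because both of its tips have the trait, while `N2` and `root` are not because `D` lacks it. A parsimony labeling could instead label some internal branches above mixed clades, depending on the inferred ancestral states.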

Best, Sergei

jzhan6 commented 2 years ago

Thanks for your detailed explanation. It is great news that I can also test for negative selection. Overlap between the positive-selection and negative-selection results would be a great indicator of true positives.

veg/hyphy issue #1496