veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

aBSREL: acceptable distribution of foreground branches #1542

Closed by Emilyaoc 1 year ago

Emilyaoc commented 1 year ago

Hello! Suppose you have an a priori prediction that selection will vary according to a binary trait, but the trait has arisen independently many times across the tree. Is it OK to specify foreground branches at multiple disparate positions across the tree when running aBSREL? I am pasting a tree file below to demonstrate what I mean, where the foreground branches are specified as {CB} and the background as {PB}.

Similarly, does the same apply to RELAX when there is a test (CB) and a reference (PB) category? Is it OK for the distribution of the test branches to be similarly patchy?

((((((((((Spp27{PB}:0.1061445121,(Spp2{CB}:0.0270584277,Spp3{CB}:0.0261794982):0.0344910033):0.0035020037,(((Spp5{CB}:0.0035841027,Spp30{PB}:0.0057611215):0.0219621149,(Spp24{PB}:0.0321787049,Spp1{CB}:0.0178073773):0.0100827858):0.0352484102,((Spp4{CB}:0.0152403061,Spp28{PB}:0.0155103987):0.0038109277,Spp29{PB}:0.0079769122):0.0779856776):0.0087964965):0.001588438,(Spp26{PB}:0.0286713022,Spp25{PB}:0.0527021885):0.1245980631):0.0184857182,(Spp6{CB}:0.028444044,Spp31{PB}:0.0225514116):0.0596485678):0.0296406922,(((Spp40{PB}:0.1201390145,(Spp39{PB}:0.0548268922,(Spp16{CB}:0.0361122117,Spp17{CB}:0.0358397551):0.0246823189):0.0136822679):0.0397902583,((Spp34{PB}:0.0426910453,(Spp12{CB}:0.0185448918,Spp11{CB}:0.0237418132):0.0244439469):0.090134941,((Spp45{PB}:0.0045815149,Spp21{CB}:0.0046461625):0.0337996368,Spp46{PB}:0.0312817224):0.0816084984):0.0090666726):0.0057031342,((Spp19{CB}:0.0468786749,Spp43{PB}:0.0382526367):0.1378100849,(((Spp14{CB}:0.0111939159,Spp36{PB}:0.0137225212):0.0591854198,Spp37{PB}:0.0738589427):0.0666284331,(Spp35{PB}:0.0104790193,Spp13{CB}:0.0095368591):0.0733485724):0.0576597698):0.0074808915):0.0420763111):0.0090785591,Spp33{PB}:0.1539224942):0.0084542564,(Spp9{CB}:0.0176071397,Spp10{CB}:0.0141676561):0.1158838754):0.0150910971,((Spp47{PB}:0.0366681243,Spp22{CB}:0.0386648034):0.063511177,(Spp32{PB}:0.0189333796,(Spp7{CB}:0.0142000867,Spp8{CB}:0.0107546269):0.0084506914):0.0844803839):0.0411672233):0.2360367563,(((Spp20{CB}:0.0025737016,Spp44{PB}:0.0028672275):0.2623635034,(Spp23{CB}:0.0270701033,Spp48{PB}:0.0196880331):0.2388883853):0.0568027823,((Spp41{PB}:0.0124954283,Spp18{CB}:0.0171455802):0.0306188918,Spp42{PB}:0.0605509651):0.1576290146):0.0447594415):0.6098442627,Spp38{PB}:0.1317968626,Spp15{CB}:0.3308590102)

spond commented 1 year ago

Dear @Emilyaoc,

There really is no foreground per se in aBSREL. You may designate any subset of branches you wish to test for evidence of selection (it can be any type of partition), but aBSREL will still test each branch independently of other branches.

I think BUSTED-PH is a more appropriate test for your setting, because it explicitly tests for differences in selection.

Patchy distributions are perfectly fine for RELAX. One thing you may want to consider is labeling internal branches as well, using one of the approaches described in https://github.com/veg/hyphy-analyses/tree/master/LabelTrees

Best, Sergei

Emilyaoc commented 1 year ago

Hi Sergei, Thank you for your speedy reply. I understood from reading the info on aBSREL methods on the HyPhy homepage (https://stevenweaver.github.io/hyphy-site/methods/selection-methods/#absrel) that you can test an a priori hypothesis that particular branches are under selection by specifying foreground branches. But from your answer it sounds like aBSREL is really more suited to an exploratory approach where all branches are tested? And that BUSTED-PH should be used if you have a clear prediction that a certain group of branches (at certain sites) will be subject to stronger positive selection than others. Have I understood correctly? Emily

spond commented 1 year ago

Dear @Emilyaoc,

I think you are generally correct. If you have predefined sets of branches (based on information external to sequences themselves, e.g. trait or phenotype), then

1). You can use BUSTED-PH to test whether or not there is positive selection associated with the 'Foreground' (e.g. trait present) branches, and whether or not selection regimes are different between Foreground and Background.

2). You can use Contrast-FEL to see which (if any) individual sites have different selective regimes between background and foreground.

3). You can use RELAX to test if selection is relaxed (or intensified) between two sets of branches (this is a relative test, i.e. there's no explicit test for positive selection).

aBSREL always tests individual branches. By specifying a subset of branches to test, you reduce the multiple testing correction in aBSREL and boost power. But aBSREL does not combine signal from all the branches in the foreground, whereas BUSTED-PH, RELAX, and Contrast-FEL do.
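
To make the multiple-testing point concrete, here is a minimal sketch in Python. It assumes the branch-level p-values are corrected with the Holm-Bonferroni procedure; the branch counts and raw p-values are hypothetical and chosen only for illustration, and this is not HyPhy code.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is scaled by (m - k + 1)
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values: one branch with signal, the rest clearly null.
raw_all_branches = [0.002] + [0.5] * 92   # every branch tested (93 tests)
raw_subset = [0.002] + [0.5] * 22         # pre-specified subset (23 tests)

p_all = holm_adjust(raw_all_branches)[0]  # 0.002 * 93
p_sub = holm_adjust(raw_subset)[0]        # 0.002 * 23

print(f"all branches: {p_all:.3f}, subset only: {p_sub:.3f}")
```

With these made-up numbers, the same raw p-value survives a 0.05 threshold only when the smaller set of branches is tested. Each branch is still evaluated on its own; the subset changes the correction, not the test.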

Best, Sergei

Emilyaoc commented 1 year ago

Dear Sergei, That really clarifies things for me - thank you. The thing I'm still a little unsure about is how to decide when to label both the internal nodes and the leaves versus when to label only the leaves. I was inclined to label only the leaves for all analyses, as my phenotypic trait of interest is only categorised with certainty at the leaves (though there are some species pairs that share the trait CB or PB where I know the ancestral node would have the same trait value). From your suggestion for the RELAX test I get the impression that maybe I should also be labeling internal nodes? Or perhaps only those where I know what the trait value should be? I'm not sure I understand well enough how to choose between labeling leaves only and including internal nodes. Thank you for your advice. Emily

spond commented 1 year ago

Dear @Emilyaoc,

There is no "best" way to label the tree. The main issue is that you don't want to assign incorrect "types" to internal branches and bias inference. There are several options that are reasonable.

1) [Super conservative] Label only the leaf branches (Yes/No), and treat all internal branches as unlabeled. The main inefficiency here is loss of power, because a smaller subset of branches is considered.

2) [Somewhat liberal] Label internal branches using conjunction, i.e. label an internal branch if and only if all of its descendants have the same phenotype. Leave other internal branches unlabeled. The danger here is that you may include some internal branches in the wrong class, thereby biasing the inference. The benefit is an improvement in power, assuming the labeling is not too wrong.

3) [Banzai!] Label all internal branches using some criterion, e.g. parsimony or a binary trait evolutionary model. You get the maximal size of branch sets, but also increase the rate of labeling errors.

4) [Willing to wait a long time] Include the phenotype in the model itself. This is not implemented in HyPhy at the moment (but could be). It gives rise to a covarion-type model, i.e. internal branches exist in an "uncertain" state, but data at the leaves are fully used. From the standpoint of implementation, this means the model state space is at least 2x in size (models are not 61x61 but 122x122 or more), so models take much longer to fit and might have convergence problems.
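
Option 2 (conjunction) can be sketched in a few lines of Python. This is only an illustration of the rule, not the code used by label-tree.bf; the tiny Newick parser, the function names `parse`, `label_all_descendants` and `to_newick`, and the five-taxon example tree are all assumptions made for this sketch.

```python
import re


def parse(newick):
    """Parse a Newick string with optional {label} tags into nested dicts."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1
            children.append(node())
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip the closing ")"
        text = ""
        if pos < len(tokens) and tokens[pos] not in {"(", ")", ",", ";"}:
            text = tokens[pos]
            pos += 1
        # text looks like  name{label}:length  with every part optional
        m = re.match(r"([^:{]*)(?:\{(\w+)\})?(?::(.*))?$", text)
        return {"name": m.group(1), "label": m.group(2),
                "length": m.group(3), "children": children}

    return node()


def label_all_descendants(node):
    """Label an internal node only if all of its children share one label."""
    if not node["children"]:
        return node["label"]
    child_labels = {label_all_descendants(c) for c in node["children"]}
    if node["label"] is None and len(child_labels) == 1:
        node["label"] = child_labels.pop()  # stays None if children are unlabeled
    return node["label"]


def to_newick(node):
    """Serialize the nested dicts back to a Newick string (no trailing ';')."""
    s = ""
    if node["children"]:
        s = "(" + ",".join(to_newick(c) for c in node["children"]) + ")"
    s += node["name"]
    if node["label"]:
        s += "{" + node["label"] + "}"
    if node["length"] is not None:
        s += ":" + node["length"]
    return s


# Hypothetical toy tree: only the (A, B) pair agrees on the phenotype.
tree = parse("((A{CB}:0.1,B{CB}:0.2):0.05,(C{CB}:0.1,D{PB}:0.1):0.3,E{PB}:0.2)")
label_all_descendants(tree)
labeled = to_newick(tree)
print(labeled)
```

In the toy tree, the branch leading to the (A, B) pair inherits {CB} because both leaves agree, while the mixed (C, D) pair and the root stay unlabeled.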

Best, Sergei

Emilyaoc commented 1 year ago

Dear Sergei, Thank you for that clear advice. I think for my data it should be option 2, as I know with fairly high certainty that when species pairs have the same phenotype, the internal node that precedes them should have the same trait value. Is there an option in label-tree.bf for the conjunction option? For '--internal-nodes' I think I only see the options 'None', 'Parsimony' or 'Some descendants'? Many thanks Emily

spond commented 1 year ago

Dear @Emilyaoc,

There are four options there:

    {"None","Only assign labels to selected nodes"}
    {"All descendants","Only label an internal node if all its descendants are labeled"}
    {"Some descendants","Only label an internal node if some of its descendants are labeled"}
    {"Parsimony","Use maximum parsimony to label internal nodes"}

"All descendants" is the default.

Best, Sergei

Emilyaoc commented 1 year ago

Dear Sergei, Ahh, yes. Sorry, I had missed that (now I see it says this in the readme file)! Is there a way to apply two different labels (i.e. foreground and background) using the label-tree.bf tool? I think I can only see how to use it for applying one label, but perhaps/probably I just missed something again? Thank you Emily

spond commented 1 year ago

Dear @Emilyaoc,

You did not miss anything; there is no direct option to do this with label-tree.bf.

What you can do is label the tree sequentially, because label-tree.bf will respect existing labels. Something like

    $hyphy label-tree.bf --tree unlabeled.nwk --regexp expression1 --label Foreground --output labeled-pass1.nwk
    $hyphy label-tree.bf --tree labeled-pass1.nwk --regexp expression2 --label Background --output fully-labeled.nwk
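
The idea behind the two passes (the second pass leaves already-labeled nodes alone) can be illustrated with a small Python sketch. This is not how label-tree.bf is implemented; the function `label_leaves`, the toy tree, and the regular expressions are hypothetical, and the sketch touches leaf names only.

```python
import re

# A leaf name directly follows "(" or ","; it may already carry a {label} tag.
LEAF = re.compile(r"(?<=[(,])([^(),:{}]+)(\{[^}]*\})?")


def label_leaves(newick, pattern, label):
    """Append {label} to leaves whose name matches `pattern`.

    Leaves that already carry a label are left untouched, which is what
    makes sequential labeling passes safe.
    """
    def tag(match):
        name, existing = match.group(1), match.group(2)
        if existing or not re.search(pattern, name):
            return match.group(0)  # keep the leaf exactly as it was
        return f"{name}{{{label}}}"

    return LEAF.sub(tag, newick)


# Hypothetical toy tree where the phenotype is encoded in the taxon name.
unlabeled = "((A_cb:0.1,B_pb:0.2):0.05,C_cb:0.3)"
pass1 = label_leaves(unlabeled, r"_cb$", "Foreground")
# The second pattern matches every leaf, but labeled leaves are skipped.
pass2 = label_leaves(pass1, r".", "Background")
print(pass2)
```

Because the second pass skips anything already tagged, its regular expression can be broad without overwriting the Foreground labels from the first pass.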

Best, Sergei

Emilyaoc commented 1 year ago

Excellent, thank you!
