veg / hyphy-analyses

HyPhy standalone analyses
MIT License

Question on appropriate use of BUSTED-PH #35

Open Emilyaoc opened 1 year ago

Emilyaoc commented 1 year ago

Hello,

I'd like to ask for some help with how best to use BUSTED-PH. I am trying to test whether selection on genes differs between species according to a binary trait (CB vs PB). Example tree 1, pasted below, shows what I mean: the foreground branches are labelled {CB} and the background branches {PB}. There are many independent CB / PB pairs across the tree. I understand that one option is to run BUSTED-PH with all the CB tips as the test group and all the PB tips as the comparison group, in which case the command would be: ‘hyphy BUSTED-PH.bf --alignment alignment.fas --tree gene_tree.txt --branches CB --comparison PB --srv No’

I was wondering whether an alternative that could be more sensitive to lineage-specific differences would be to run BUSTED-PH separately for each CB / PB pair. For example, example tree 2 below labels only Spp1 (CB) and Spp24 (PB), giving a test between those two species. I would then repeat this for each of the paired comparisons. I don’t mean all possible combinations of CB vs PB, only the actual pairs across the tree (e.g. Spp1 vs Spp24, Spp6 vs Spp31, Spp32 vs Spp7 & Spp8). I realize this would create a multiple testing problem, but perhaps I could correct for it afterwards? This approach would let me look at each comparison separately, which I’d like to do. What are your thoughts on it? Would it be acceptable to perform independent BUSTED-PH tests for the different PB and CB pairs, or is there a reason this approach would be flawed?

Thank you for your help.

Emily

Example tree 1: ((((((((((Spp27{PB}:0.1061445121,(Spp2{CB}:0.0270584277,Spp3{CB}:0.0261794982):0.0344910033):0.0035020037,(((Spp5{CB}:0.0035841027,Spp30{PB}:0.0057611215):0.0219621149,(Spp24{PB}:0.0321787049,Spp1{CB}:0.0178073773):0.0100827858):0.0352484102,((Spp4{CB}:0.0152403061,Spp28{PB}:0.0155103987):0.0038109277,Spp29{PB}:0.0079769122):0.0779856776):0.0087964965):0.001588438,(Spp26{PB}:0.0286713022,Spp25{PB}:0.0527021885):0.1245980631):0.0184857182,(Spp6{CB}:0.028444044,Spp31{PB}:0.0225514116):0.0596485678):0.0296406922,(((Spp40{PB}:0.1201390145,(Spp39{PB}:0.0548268922,(Spp16{CB}:0.0361122117,Spp17{CB}:0.0358397551):0.0246823189):0.0136822679):0.0397902583,((Spp34{PB}:0.0426910453,(Spp12{CB}:0.0185448918,Spp11{CB}:0.0237418132):0.0244439469):0.090134941,((Spp45{PB}:0.0045815149,Spp21{CB}:0.0046461625):0.0337996368,Spp46{PB}:0.0312817224):0.0816084984):0.0090666726):0.0057031342,((Spp19{CB}:0.0468786749,Spp43{PB}:0.0382526367):0.1378100849,(((Spp14{CB}:0.0111939159,Spp36{PB}:0.0137225212):0.0591854198,Spp37{PB}:0.0738589427):0.0666284331,(Spp35{PB}:0.0104790193,Spp13{CB}:0.0095368591):0.0733485724):0.0576597698):0.0074808915):0.0420763111):0.0090785591,Spp33{PB}:0.1539224942):0.0084542564,(Spp9{CB}:0.0176071397,Spp10{CB}:0.0141676561):0.1158838754):0.0150910971,((Spp47{PB}:0.0366681243,Spp22{CB}:0.0386648034):0.063511177,(Spp32{PB}:0.0189333796,(Spp7{CB}:0.0142000867,Spp8{CB}:0.0107546269):0.0084506914):0.0844803839):0.0411672233):0.2360367563,(((Spp20{CB}:0.0025737016,Spp44{PB}:0.0028672275):0.2623635034,(Spp23{CB}:0.0270701033,Spp48{PB}:0.0196880331):0.2388883853):0.0568027823,((Spp41{PB}:0.0124954283,Spp18{CB}:0.0171455802):0.0306188918,Spp42{PB}:0.0605509651):0.1576290146):0.0447594415):0.6098442627,Spp38{PB}:0.1317968626,Spp15{CB}:0.3308590102)

Example tree 2: ((((((((((Spp27:0.1061445121,(Spp2:0.0270584277,Spp3:0.0261794982):0.0344910033):0.0035020037,(((Spp5:0.0035841027,Spp30:0.0057611215):0.0219621149,(Spp24{PB}:0.0321787049,Spp1{CB}:0.0178073773):0.0100827858):0.0352484102,((Spp4:0.0152403061,Spp28:0.0155103987):0.0038109277,Spp29:0.0079769122):0.0779856776):0.0087964965):0.001588438,(Spp26:0.0286713022,Spp25:0.0527021885):0.1245980631):0.0184857182,(Spp6:0.028444044,Spp31:0.0225514116):0.0596485678):0.0296406922,(((Spp40:0.1201390145,(Spp39:0.0548268922,(Spp16:0.0361122117,Spp17:0.0358397551):0.0246823189):0.0136822679):0.0397902583,((Spp34:0.0426910453,(Spp12:0.0185448918,Spp11:0.0237418132):0.0244439469):0.090134941,((Spp45:0.0045815149,Spp21:0.0046461625):0.0337996368,Spp46:0.0312817224):0.0816084984):0.0090666726):0.0057031342,((Spp19:0.0468786749,Spp43:0.0382526367):0.1378100849,(((Spp14:0.0111939159,Spp36:0.0137225212):0.0591854198,Spp37:0.0738589427):0.0666284331,(Spp35:0.0104790193,Spp13:0.0095368591):0.0733485724):0.0576597698):0.0074808915):0.0420763111):0.0090785591,Spp33:0.1539224942):0.0084542564,(Spp9:0.0176071397,Spp10:0.0141676561):0.1158838754):0.0150910971,((Spp47:0.0366681243,Spp22:0.0386648034):0.063511177,(Spp32:0.0189333796,(Spp7:0.0142000867,Spp8:0.0107546269):0.0084506914):0.0844803839):0.0411672233):0.2360367563,(((Spp20:0.0025737016,Spp44:0.0028672275):0.2623635034,(Spp23:0.0270701033,Spp48:0.0196880331):0.2388883853):0.0568027823,((Spp41:0.0124954283,Spp18:0.0171455802):0.0306188918,Spp42:0.0605509651):0.1576290146):0.0447594415):0.6098442627,Spp38:0.1317968626,Spp15:0.3308590102)
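
For anyone wanting to build the pair-specific trees without editing the Newick string by hand, here is a minimal sketch in Python. It assumes the labels are always written as `{CB}` or `{PB}` directly after a tip name, as in example tree 1. The function name `label_pair_only` and the shortened toy tree are hypothetical and only for illustration; this is not part of HyPhy.

```python
import re

def label_pair_only(labelled_newick, keep):
    """Drop every {CB}/{PB} tag except those on the tips named in `keep`."""
    def repl(match):
        name, tag = match.group(1), match.group(2)
        # Keep the tag only for the tips in the chosen pair.
        return f"{name}{{{tag}}}" if name in keep else name
    return re.sub(r"([A-Za-z0-9_]+)\{(CB|PB)\}", repl, labelled_newick)

# Toy tree with made-up branch lengths, standing in for example tree 1.
toy_tree = "((Spp24{PB}:0.03,Spp1{CB}:0.02):0.01,(Spp5{CB}:0.004,Spp30{PB}:0.006):0.02);"

pair_tree = label_pair_only(toy_tree, {"Spp1", "Spp24"})
print(pair_tree)
# ((Spp24{PB}:0.03,Spp1{CB}:0.02):0.01,(Spp5:0.004,Spp30:0.006):0.02);
```

Each output tree could then be passed to the same command as above, with `--branches CB --comparison PB` unchanged, since only the labelled tips differ between runs.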

spond commented 1 year ago

Dear @Emilyaoc,

Other than multiple comparisons and loss of power due to a reduced sample size (# of branches), there are no fundamental statistical issues that I see. But the double-whammy of multiple testing corrections and few branches per test is likely to result in a big set of null results. An alternative, more positive, possibility is that by looking at smaller branch sets, you will be able to better reflect their specific selective regimes, which could be "smoothed" out to the tree average when you do the complete analysis.
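
To make the multiple-testing cost concrete, here is a minimal sketch of a Holm-Bonferroni adjustment applied to one p-value per pairwise run. Holm is only one possible choice of correction (it is not prescribed anywhere in this thread), the function name `holm_adjust` is hypothetical, and the p-values are invented placeholders, not results from any analysis.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Placeholder p-values, one per hypothetical pairwise comparison.
pairs = ["Spp1_vs_Spp24", "Spp6_vs_Spp31", "Spp32_vs_Spp7_Spp8"]
raw_p = [0.004, 0.03, 0.20]

adjusted_p = holm_adjust(raw_p)
for pair, raw, adj in zip(pairs, raw_p, adjusted_p):
    print(f"{pair}: raw p = {raw}, Holm-adjusted p = {adj:.3f}")
```

With many pairs, the smallest raw p-value is multiplied by the total number of comparisons, which is where the loss of power comes from.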

I would say you should first run the test on Tree 1 to see whether the joint analysis detects anything and, if it does, then explore the individual comparisons.

Best, Sergei