veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Alternative algorithm to BS-REL? #931

Closed: kmeusemann closed this issue 5 years ago

kmeusemann commented 5 years ago

Dear all,

We want to do something like what is described in Ebel et al. 2017 https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007023#sec013 (see "Estimating adaptation"), where the aim is to "estimate the proportion of positively selected codons in each gene on each branch". Since BS-REL is no longer available, we are looking for an alternative; ideally, we would like to look at selection on specific sites (we have codon alignments). We looked at the available methods and are quite unsure about running aBSREL instead, because its description says: "aBSREL will test, for each branch (or branch of interest) in the phylogeny, whether a proportion of sites have evolved under positive selection." So what else should we use? MEME?

Many thanks, Karen

spond commented 5 years ago

Dear @kmeusemann,

aBSREL is a more efficient version of BS-REL. BS-REL simply assumed 3 ω rate classes on each branch, whereas aBSREL selects the number of rate classes for each branch based on the amount of signal in the data, with many branches receiving only a single ω. You could inspect the JSON file produced by aBSREL and record the proportion of sites estimated to have ω > 1 along each branch.
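As a rough sketch of that JSON inspection, the Python function below sums, for each branch, the weights of the ω rate classes whose ω exceeds 1. It assumes (from our reading of aBSREL output, not from the HyPhy documentation) that per-branch rate classes are stored under `"branch attributes"` → `"0"` → branch name → `"Rate Distributions"` as a list of `[omega, proportion]` pairs. The function names are hypothetical, so check the key names against your own file first.

```python
import json


def load_json(path):
    """Read an aBSREL results file (hypothetical helper)."""
    with open(path) as fh:
        return json.load(fh)


def positive_selection_proportions(absrel_json, omega_threshold=1.0):
    """Map each branch to the summed weight of its rate classes with omega > threshold.

    ASSUMPTION: rate classes live at
    absrel_json["branch attributes"]["0"][branch]["Rate Distributions"]
    as [omega, proportion] pairs; verify this against your own aBSREL output.
    """
    branches = absrel_json["branch attributes"]["0"]
    proportions = {}
    for name, attrs in branches.items():
        rate_classes = attrs.get("Rate Distributions", [])
        # Only classes with omega above the threshold count as "positively selected".
        proportions[name] = sum(
            weight for omega, weight in rate_classes if omega > omega_threshold
        )
    return proportions
```

For example, `positive_selection_proportions(load_json("gene.ABSREL.json"))` (file name hypothetical) returns a dict mapping branch names to proportions. These are point estimates only and carry no measure of their uncertainty.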

In other words, you can use the output of aBSREL in place of BS-REL. However, individual branch ω estimates could be quite noisy (i.e., have wide confidence intervals), so I would not recommend using point estimates directly without at least getting some sense of how reliable they are. For example, longer branches and longer alignments would be expected to produce more precise ω estimates. aBSREL could be extended to also report confidence intervals on ω estimates.

If you want to look at specific sites under selection, then MEME is the way to go, but with MEME you will not be able to pinpoint the branches where selection is occurring.
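To list candidate sites from MEME output, a minimal sketch might filter the per-site table by p-value. It assumes the MEME JSON stores that table under `"MLE"`, with `"headers"` as `[name, description]` pairs and `"content"` → `"0"` holding one row per codon site in alignment order, and that the p-value column is named `"p-value"`. These key names, the function name, and the 0.1 cutoff are illustrative assumptions to verify against your own output.

```python
def meme_sites_below(meme_json, alpha=0.1, pvalue_column="p-value"):
    """Return 1-based codon site indices whose MEME p-value is <= alpha.

    ASSUMPTION: meme_json["MLE"]["headers"] is a list of [name, description]
    pairs and meme_json["MLE"]["content"]["0"] has one row per site, in order.
    """
    table = meme_json["MLE"]
    names = [header[0] for header in table["headers"]]
    col = names.index(pvalue_column)  # raises ValueError if the column is absent
    rows = table["content"]["0"]
    # enumerate is 0-based; report sites 1-based as in the alignment.
    return [i + 1 for i, row in enumerate(rows) if row[col] <= alpha]
```

Pass it the dict returned by `json.load` on a MEME results file.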

Could you perhaps describe the biological hypothesis you are trying to test and I might be able to suggest a procedure to approach it?

Best, Sergei

kmeusemann commented 5 years ago

Many thanks; I answered you via email. Basically, we want to look at specific branches to see whether they indeed have positively selected sites (and, if so, which ones), like the branch-site test in codeML... Best, Karen

spond commented 5 years ago

Dear Karen,

It is our opinion that you do not have the resolution to look at specific sites along a single branch. It is like trying to draw conclusions from a sample size of 1: you can make them, but they will be exceptionally unreliable and not statistically sound.

MEME will output, for each selected site, which branches are likely contributing to the signal. Alternatively, you can run BUSTED, selecting a single branch as your test set, and the method will output some exploratory metrics about which sites have evidence of selection.
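Selecting a single branch as the test set usually means labelling it in the tree file. The sketch below appends a `{Foreground}` tag to the terminal branch leading to one named taxon in a Newick string. It assumes HyPhy picks up branch labels written in curly braces right after a node name (the convention we understand HyPhy's tree-labelling tools to use); the function name is hypothetical, and internal branches are not handled.

```python
import re


def tag_branch(newick, taxon, tag="Foreground"):
    """Append {tag} to the terminal branch leading to `taxon` (hypothetical helper).

    ASSUMPTION: HyPhy reads curly-brace labels placed after a node name and
    before any ":length", e.g. "(A{Foreground}:0.1,B:0.2);". Leaves only.
    """
    # A leaf name follows "(" or "," and is followed by ":", ",", ")" or ";".
    pattern = re.compile(r"(?<=[(,])" + re.escape(taxon) + r"(?=[:,);])")
    # A function replacement avoids backslash processing of taxon names.
    tagged, count = pattern.subn(lambda m: m.group(0) + "{" + tag + "}", newick)
    if count != 1:
        raise ValueError(f"expected exactly one leaf named {taxon!r}, found {count}")
    return tagged
```

The tagged tree can then be given to BUSTED with that label chosen as the test set (we believe recent HyPhy versions expose this as `--branches Foreground`; check `hyphy busted --help`). As stated in this thread, treat any per-site output from such a run as exploratory.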

Neither of these is something I would recommend as statistically proper inference; treat them only as exploratory tools.

codeML is no different in that regard: the sample-size issue is fundamentally unavoidable here. You need other sources of information in addition to sequence variation.

Best, Sergei

kmeusemann commented 5 years ago

Yes, I see, or rather, I am already aware of this. (The idea was rather to trace them back in the alignment and check, e.g., whether there is something "weird" such as frameshifts.) But of course I agree that sample size is crucial here. Many thanks! Karen