veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Effect of tree size on the power to detect diversifying selection at a single branch? #1635

Closed: Suirotras closed this issue 1 year ago

Suirotras commented 1 year ago

Hi,

I have a question about how reducing the size of the tree affects the detection of positive selection on a single branch with aBSREL.

I have a large tree (and alignment) covering 241 mammalian species. I am only interested in (episodic) diversifying selection on a single branch of this tree. Because running aBSREL on such a large tree is computationally expensive, I was planning to reduce the tree and alignment to just 100 mammalian species. I plan to remove mainly the shortest branches while keeping as much of the tree's diversity as possible.

My question is: would this change have a large effect on the power to detect diversifying selection on my branch of interest? Or does this depend on many different aspects of the sequences in the alignment?

Here is what I plan to do. The large, unpruned tree (241 species):
[image: unpruned 241-species tree]

The pruned tree (100 species):
[image: pruned 100-species tree]

Many thanks for your help and your work on HyPhy!

Sincerely, Jari

spond commented 1 year ago

Dear @Suirotras,

If you are interested in a single branch, I would say you should keep that branch and its local neighborhood (nearest taxa) intact, and prune more distant clades.
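One way to make "nearest taxa" concrete is to rank every other tip by patristic distance (summed branch lengths along the path) to the focal tip. This is only a sketch under stated assumptions: the tree is rooted and stored as a plain `children` dict plus a `length` dict giving the length of the branch above each node, and the function names and toy tree are hypothetical illustrations, not HyPhy functions or data.

```python
def root_paths(children, length, root):
    """Map each node to its root-to-node path and its distance from the root."""
    paths, depth = {root: [root]}, {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for kid in children.get(node, []):
            paths[kid] = paths[node] + [kid]
            depth[kid] = depth[node] + length[kid]
            stack.append(kid)
    return paths, depth


def nearest_taxa(children, length, root, focal, k):
    """The k tips closest to `focal` by patristic (path-length) distance."""
    paths, depth = root_paths(children, length, root)

    def patristic(a, b):
        # The most recent common ancestor is the last node shared by both paths.
        mrca = root
        for x, y in zip(paths[a], paths[b]):
            if x != y:
                break
            mrca = x
        return depth[a] + depth[b] - 2 * depth[mrca]

    tips = [n for n in paths if not children.get(n) and n != focal]
    ranked = sorted(tips, key=lambda n: patristic(focal, n))
    return ranked[:k]


# Toy tree ((A:0.1,B:0.2)AB:0.05,(C:0.3,D:0.1)CD:0.2); focal tip = A.
children = {"root": ["AB", "CD"], "AB": ["A", "B"], "CD": ["C", "D"]}
length = {"AB": 0.05, "CD": 0.2, "A": 0.1, "B": 0.2, "C": 0.3, "D": 0.1}
print(nearest_taxa(children, length, "root", focal="A", k=2))  # ['B', 'D']
```

Tips near the top of this ranking are candidates to keep intact; tips further down belong to the "more distant clades" that can be pruned.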

It depends on taxonomic sampling, but one of the strongest predictors of aBSREL performance is branch length. If you look at the relevant figure in the original aBSREL manuscript, you will notice that the power to detect selection climbs as the branch length increases, up to a certain "saturation" point (~1 substitution/site).

[image: figure from the aBSREL manuscript, power to detect selection versus branch length]

So, when you prune the tree around a single focal branch, be careful not to "merge" this branch with the branches you delete.

For example, suppose the focal branch of the "complete" tree looks like this (the data are from tests/data/yokoyama.rh1.cds.mod.1-990.nex in the HyPhy distribution):

[image: focal branch in the complete tree]

Then a good way to subsample would be the following ("bubbles" here mean: replace the clade with one representative sequence):

[image: subsampled tree, with distant clades collapsed into "bubbles"]
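The "bubble" subsampling amounts to a choice of which tips to keep: every tip outside the collapsed clades, plus one representative per collapsed clade; the resulting tip set would then be used to subset both the tree and the alignment. This is a minimal sketch with hypothetical names on a made-up rooted tree stored as a `children` dict. The thread does not say how to choose the representative, so `pick=min` (the alphabetically first name) is an arbitrary placeholder. The function refuses to collapse a clade that contains the focal taxon.

```python
def tips_under(children, node):
    """All tips below `node` (a tip returns just itself)."""
    kids = children.get(node, [])
    if not kids:
        return [node]
    tips = []
    for kid in kids:
        tips.extend(tips_under(children, kid))
    return tips


def bubble_keep_set(children, root, clades, focal, pick=min):
    """Tips to keep after replacing each clade in `clades` by one representative.

    `pick` chooses the representative from a clade's tips; `min` (the
    alphabetically first name) is an arbitrary placeholder choice.
    """
    keep = set(tips_under(children, root))
    for clade in clades:
        members = tips_under(children, clade)
        if focal in members:
            raise ValueError(f"clade {clade!r} contains the focal taxon {focal!r}")
        keep.difference_update(members)
        keep.add(pick(members))
    return keep


# Hypothetical tree: the focal tip and its sister are kept as-is,
# and two distant clades are each reduced to a single representative.
children = {
    "root": ["near", "far1", "far2"],
    "near": ["focal", "sister"],
    "far1": ["a1", "a2", "a3"],
    "far2": ["b1", "b2"],
}
keep = bubble_keep_set(children, "root", ["far1", "far2"], focal="focal")
print(sorted(keep))  # ['a1', 'b1', 'focal', 'sister']
```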

You do NOT want to delete branches around the focal branch (as in the following picture), because you might subsume their evolutionary content into the "combined" branch, and thus conflate selection on the focal branch with selection that might be happening up or down the tree.

[image: poor subsampling, with branches adjacent to the focal branch deleted]
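The "merging" happens because removing a tip can leave its parent node with a single child; that node is then spliced out and its branch length is added onto the surviving child's branch. The sketch below makes this visible on a made-up rooted tree: a hypothetical `prune` function records which original branches end up concatenated into each surviving branch, so one can check whether the focal branch absorbed a neighbor. It is an illustration under those assumptions, not how HyPhy itself prunes trees.

```python
def prune(children, length, root, keep):
    """Restrict a rooted tree to the tips in `keep`.

    children: node -> list of child nodes (tips have no entry)
    length:   node -> length of the branch above that node
    Returns (new_children, new_length, merged), where merged[node] lists the
    original nodes whose branches were concatenated into the branch above node.
    """
    new_children, new_length, merged = {}, {}, {}

    def visit(node):
        # Returns the node that represents this subtree after pruning, or None.
        kids = children.get(node, [])
        if not kids:
            if node not in keep:
                return None
            new_length[node] = length[node]
            merged[node] = [node]
            return node
        survivors = [s for s in (visit(k) for k in kids) if s is not None]
        if not survivors:
            return None
        if len(survivors) == 1:
            # Node now has one child: splice it out and add its branch
            # length onto the surviving child's branch.
            only = survivors[0]
            new_length[only] += length.get(node, 0.0)
            merged[only].append(node)
            return only
        new_children[node] = survivors
        new_length[node] = length.get(node, 0.0)
        merged[node] = [node]
        return node

    visit(root)
    return new_children, new_length, merged


def focal_branch_intact(merged, focal):
    """True if the focal branch survived pruning without absorbing others."""
    return merged.get(focal) == [focal]


# Toy tree ((A:0.1,B:0.2)AB:0.05,(C:0.3,D:0.1)CD:0.2); focal branch = A.
children = {"root": ["AB", "CD"], "AB": ["A", "B"], "CD": ["C", "D"]}
length = {"AB": 0.05, "CD": 0.2, "A": 0.1, "B": 0.2, "C": 0.3, "D": 0.1}

# Dropping a distant tip (C) leaves A alone; D absorbs the CD branch.
_, _, merged = prune(children, length, "root", keep={"A", "B", "D"})
print(focal_branch_intact(merged, "A"), merged["D"])  # True ['D', 'CD']

# Dropping A's sister (B) merges branch AB into the focal branch.
_, new_length, merged = prune(children, length, "root", keep={"A", "C", "D"})
print(focal_branch_intact(merged, "A"), merged["A"], round(new_length["A"], 2))
# False ['A', 'AB'] 0.15
```

In the second case, the branch that aBSREL would test as "A" is really A plus AB, so any selection on AB would be attributed to the focal branch.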

All of the above applies (largely unchanged) to internal test branches, except that for those you want to prune the tree in a way that keeps the whole clade radiating from the internal node intact.
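For an internal focal branch (here, the branch above a hypothetical internal node `F`), a proposed tip set could be screened with two checks: every tip in the clade below `F` is kept, as recommended above, and `F`'s parent keeps at least one other lineage, since otherwise the parent is spliced out and the focal branch merges with the branch above it. This is a sketch under those stated assumptions, using a rooted `children` dict, a made-up tree, and hypothetical names; it is not part of HyPhy.

```python
def tips_under(children, node):
    """Set of tips below `node` (a tip returns just itself)."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    return set().union(*(tips_under(children, kid) for kid in kids))


def check_internal_focal(children, focal, keep):
    """List problems a proposed tip set `keep` causes for the branch above `focal`."""
    problems = []
    missing = tips_under(children, focal) - keep
    if missing:
        problems.append(f"clade below {focal!r} is not intact: drops {sorted(missing)}")
    parent = next((p for p, kids in children.items() if focal in kids), None)
    if parent is None:
        problems.append(f"{focal!r} is the root; there is no branch above it")
        return problems
    # Rooted representation assumed: if no other lineage under `parent` keeps
    # a tip, `parent` is spliced out and the focal branch merges with the
    # branch above it. (In an unrooted tree, the two branches at a bifurcating
    # root already form a single branch; this sketch does not handle that case.)
    siblings = [
        k for k in children[parent]
        if k != focal and tips_under(children, k) & keep
    ]
    if not siblings:
        problems.append(f"no other kept lineage under {parent!r}; the focal branch merges upward")
    return problems


children = {
    "root": ["P", "out"],
    "P": ["F", "s"],
    "F": ["f1", "f2", "f3"],
    "s": ["s1", "s2"],
    "out": ["o1", "o2"],
}
all_tips = tips_under(children, "root")
print(check_internal_focal(children, "F", all_tips))  # []
print(check_internal_focal(children, "F", all_tips - {"f3"}))
print(check_internal_focal(children, "F", all_tips - {"s1", "s2"}))
```

The second call flags a clade that is no longer intact; the third flags that removing the sister clade `s` would merge the focal branch with the branch above `P`.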

Best, Sergei

Suirotras commented 1 year ago

Hi @spond,

This is very useful. Thanks for the detailed answer!

Best, Jari