veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

codons contributing to significant aBSREL #386

Closed phickner closed 8 years ago

phickner commented 8 years ago

What is the best method to determine which codons are contributing to positive selection (w+) at a branch/node as determined by aBSREL? There is usually a very small percentage of sites with omega>1. It would be nice to see where these sites are located. I read a little bit on MEME, but I don't know if this would be an appropriate post-hoc analysis to find +omega in a specific branch. As always, your help is greatly appreciated.

spond commented 8 years ago

Dear @phickner,

There is no real power to find these sites (this is the evolutionary uncertainty principle). See slides from my recent talk at http://bit.ly/pond-hutch-2015 for some intuition. That said, one could post-process the fits generated by aBSREL to generate approximate logL ratios like we do for BUSTED. It's cheap to do, computationally, but I have no idea how reliable these estimates might be. My best guess is that you could use them only for exploratory analyses. Do you want me to write you a simple script like that?

Remember that you can also have a very diffuse signal driving aBSREL: there is some selection, just not certain where, i.e. it is possible that no single codon comes back as significant.

Sergei
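
A minimal sketch of the post-processing idea above, assuming per-site log-likelihoods have already been extracted from two fits of the same alignment: one unconstrained, and one in which ω > 1 is disallowed on the branch of interest. The function name, the input lists, and the numbers are hypothetical illustrations and are not part of HyPhy. In line with the caveat above, the resulting ratios are suited only to exploratory ranking of codons.

```python
import math

def site_evidence_ratios(site_lnl_alt, site_lnl_null):
    """Per-site evidence ratio: site likelihood under the unconstrained fit
    divided by site likelihood under the constrained (no omega > 1) fit."""
    if len(site_lnl_alt) != len(site_lnl_null):
        raise ValueError("both fits must cover the same sites")
    # Inputs are log-likelihoods, so the ratio is the exponent of a difference.
    return [math.exp(a - n) for a, n in zip(site_lnl_alt, site_lnl_null)]

# Hypothetical per-site log-likelihoods for four codons (made-up numbers).
lnl_alt = [-10.2, -8.7, -12.1, -9.4]
lnl_null = [-10.3, -8.7, -14.6, -9.3]

ers = site_evidence_ratios(lnl_alt, lnl_null)

# Rank codons by how much better the unconstrained fit explains them.
ranked = sorted(range(len(ers)), key=lambda i: ers[i], reverse=True)
top_site = ranked[0] + 1  # 1-based codon index
```

A ratio well above 1 means the site is explained noticeably better when ω > 1 is allowed. With a diffuse signal, no site may stand out even when the branch-level test is significant.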

phickner commented 8 years ago

Hi Sergei,

I ran an a priori analysis using BUSTED to test each node that was significant following aBSREL analysis thinking that there could be some insights based on the model evidence ratios, but I was not sure. That's why I thought I should contact you for more information. So any help would be appreciated. Also, when looking at the JSON output for aBSREL there is a phylogeny showing omega values. If the color on the significant node corresponds to a low omega value, is this indicative of negative selection at this node rather than positive? I attached two JSON files to illustrate what I mean.

Thanks for your help. Paul

Paul V. Hickner, Ph.D. Postdoctoral Research Associate Department of Biological Sciences 319 Galvin Life Sciences University of Notre Dame Notre Dame, IN 46556 (574) 631-7860

spond commented 8 years ago

Dear @phickner,

The JSON files were stripped out by GitHub's e-mail handler. You can use gist.github.com to share them if you'd like.

Sergei

spond commented 8 years ago

Dear @phickner,

Dark colors do correspond to low ω values. In fact, you would expect most of the sites along most of the branches to be under negative selection. Note that aBSREL does not explicitly test for negative selection, but then again it would not be a very interesting test, because the prior expectation is that the vast majority of branches will show some evidence of negative selection.

Your idea to run BUSTED on a fixed subset of branches (identified by aBSREL) is a reasonable exploratory tool. It commits the cardinal statistical sin of hypothesis testing: you use the results of one test on the data to define another test on the same data, but if all you want is an idea of where selection signal comes from, this should work.

It is also not particularly difficult to generate the same set of evidence ratios directly from aBSREL output (the likelihood function fit). If you are interested, I can put together a Gist for doing so.

Sergei

phickner commented 8 years ago

Hi Sergei, I will most likely exclude the colors from my phylogeny figures in order to avoid any confusion and just report the omega, corrected p value, and proportion of sites. And yes, I would like to take a look at the evidence ratios if that is not too much trouble. I really appreciate the help. Thanks, Paul

spond commented 8 years ago

Dear @phickner,

I worked hard at making pretty color graphs :( Oh well. If you think others might find them confusing, how do you think the rendering can be improved?

I'll make a post-processor file for you.

Sergei

phickner commented 8 years ago

My PI looked at my figures and asked me why the omega value at my "significant" nodes corresponded to a very low omega if I am saying there is a signature of positive selection based on the aBSREL analysis. Using the handy results viewer (JSON files) I was able to visualize what is going on with w1 and w2, but the colors on the phylogeny prompted several questions that I was not fully prepared to answer. If I think of an alternative way to illustrate the results I will let you know.

By the way, I think the HyPhy suite of programs is great. It is really helpful for those of us who are not computer scientists. Paul

spond commented 8 years ago

Dear @phickner,

I think it is almost essential to show multiple colors on a branch to indicate that the signal of positive selection often comes from a very small subset of sites. This helps differentiate between different modes of selection: large ω on a small proportion of sites in a background of low ω (strong localized selection); or an ω closer to 1 on a larger proportion of sites (diffuse weaker selection).

Sergei
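
To illustrate why a single summary ω can hide both modes, here is a small sketch that computes the proportion-weighted mean ω for a branch described by (ω, proportion) classes. The two example distributions are made-up numbers chosen to mimic the "strong localized" and "diffuse weaker" cases described above. They are not taken from any real aBSREL fit, and the function names are hypothetical.

```python
def mean_omega(classes):
    """Proportion-weighted mean omega for a list of (omega, proportion) pairs."""
    total = sum(p for _, p in classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(w * p for w, p in classes)

def has_positive_class(classes):
    """True if any class with non-zero weight has omega > 1."""
    return any(w > 1 and p > 0 for w, p in classes)

# Hypothetical branch-level distributions (omega, proportion of sites).
localized = [(0.05, 0.98), (20.0, 0.02)]  # strong selection on few sites
diffuse = [(0.2, 0.60), (1.5, 0.40)]      # weaker selection on many sites

localized_mean = mean_omega(localized)
diffuse_mean = mean_omega(diffuse)
```

In both hypothetical cases the weighted mean is below 1 even though a class with ω > 1 is present, which is why a single color or a single averaged value per branch can look like negative selection on a branch that tests positive.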

phickner commented 8 years ago

It makes sense when you explain it like that. So the colors correspond to an average of all omega (w1 and w2) over all sites for that particular node?

spond commented 8 years ago

Dear @phickner,

Colors correspond to actual ω values. For example, in the picture below, for Node4, you will see that the coloring depicts the distribution (values and proportions) of ω parameters. About 20% of the length is deep red (high ω), while the rest is purple (ω = 0). If you mouse over a tree branch, you will see a pop-up with the distribution shown.

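[image: aBSREL] https://cloud.githubusercontent.com/assets/1018513/13094217/333f0356-d4be-11e5-9a1c-7ef128392c2e.png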

Sergei
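
The coloring scheme described above can be sketched as splitting each branch into consecutive segments, one per ω class, with segment length proportional to the class weight. This is only an illustration of the idea, not the results viewer's actual code. The branch length, the high ω value, and the function name are assumptions. Only the roughly 80%/20% split for Node4 comes from the description above.

```python
def branch_segments(branch_length, classes):
    """Split a branch into (start, end, omega) segments, one per omega class,
    each with length proportional to that class's proportion of sites."""
    segments = []
    start = 0.0
    # Draw classes in order of increasing omega (low omega first).
    for omega, proportion in sorted(classes):
        end = start + branch_length * proportion
        segments.append((start, end, omega))
        start = end
    return segments

# Hypothetical Node4: ~80% of sites at omega = 0, ~20% at a high omega.
node4 = [(10.0, 0.2), (0.0, 0.8)]
segs = branch_segments(0.5, node4)  # 0.5 is an assumed branch length
```

When the high-ω class has a very small proportion, its segment becomes correspondingly short, so it can be hard to see on a printed figure without enlarging the branch or annotating it separately.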

phickner commented 8 years ago

It was difficult to see the w2 colors because the proportion was so low. I will see if I can make the phylogeny colors more obvious because I think that would help simplify the figures, as opposed to adding the bar charts. Thanks for the clarification.
