veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

If ω > 1000, 5000, even 10000, how to explain? #393

Closed JaneZXJ closed 8 years ago

JaneZXJ commented 8 years ago

Hi Sergei, in my GAB and BSR results, the positively selected lineages have extremely high ω ratios. Is this a case of overfitting due to small sample size, as in your BSR paper? My datasets include 11 seqs [http://www.datamonkey.org/spool/upload.590080565620739.1_bsr.php], 19 seqs [http://www.datamonkey.org/spool/upload.785445843820169.1_bsr.php] or 24 seqs [http://www.datamonkey.org/spool/upload.391348098717127.1_bsr.php].

I'm not sure about that, could you give me more explanation? Thanks so much!

My appreciation for your help. Jane

spond commented 8 years ago

Dear @JaneZXJ,

Values like this should be interpreted as infinite. There is usually a very wide confidence interval on such estimates, so you should simply treat them as large values, greater than 1 (if the LRT is significant).

Sergei

murrellb commented 8 years ago

To chime in here:

Imagine you're flipping a coin, and you want to estimate the ratio of heads to tails. Let's say you flip that coin 6 times and see only heads. You can reject the null hypothesis that the heads/tails ratio is 1 with a bit of confidence, but the point estimate of the ratio would be infinity, which is just a pathology of parameter point estimates (but not p-values!) for small samples in the maximum-likelihood framework.
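
The arithmetic behind the coin example can be checked with a few lines of Python. This is only a sketch of the analogy, not anything HyPhy computes; the function names are made up for illustration, and the p-value uses an exact two-sided binomial test as one reasonable choice of test.

```python
import math

def heads_tails_mle(heads, tails):
    """Maximum-likelihood estimate of the heads/tails ratio (hypothetical helper)."""
    if heads + tails == 0:
        raise ValueError("need at least one flip")
    # The MLE of P(heads) is heads / n, so the MLE of the ratio is heads / tails,
    # which is infinite as soon as no tails are observed.
    return math.inf if tails == 0 else heads / tails

def two_sided_binomial_p(heads, n, p0=0.5):
    """Exact two-sided binomial test against P(heads) = p0.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null hypothesis.
    """
    pmf = [math.comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    observed = pmf[heads]
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))

ratio = heads_tails_mle(6, 0)          # point estimate: infinite
p_value = two_sided_binomial_p(6, 6)   # 2 * (1/2)**6
print(ratio, p_value)
```

Six heads out of six gives a p-value of 2/64, so the null is rejected at the 5% level even though the point estimate itself is unbounded.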

In the analogy, non-synonymous substitutions are heads, and synonymous substitutions are tails, so all that these sorts of problematic point estimates require is that there are no synonymous substitutions inferred at that site or on that branch or on that bit of the tree (depending on the model you're using).
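
To make the link to Sergei's point about wide confidence intervals concrete, here is a toy sketch. It assumes a deliberately oversimplified model that is not HyPhy's codon model: each inferred substitution is nonsynonymous with probability ω / (1 + ω), and the interval is an approximate 95% likelihood-ratio interval based on the chi-square cutoff with 1 degree of freedom. All names are hypothetical.

```python
import math

CHI2_95_1DF = 3.841458820694124  # 95% quantile of chi-square with 1 d.f.

def log_lik(omega, nonsyn, syn):
    """Toy log-likelihood: a substitution is nonsynonymous with prob. omega / (1 + omega)."""
    p = omega / (1.0 + omega)
    ll = nonsyn * math.log(p)
    if syn > 0:
        ll += syn * math.log(1.0 - p)
    return ll

def lower_bound_no_syn(nonsyn, cutoff=CHI2_95_1DF):
    """Lower end of the likelihood-ratio interval when no synonymous changes are seen.

    With syn == 0 the log-likelihood rises towards 0 as omega grows, so the
    interval is [lower, infinity). The lower end solves
    2 * (0 - log_lik(omega)) = cutoff.
    """
    c = math.exp(-cutoff / (2.0 * nonsyn))
    return c / (1.0 - c)

lower = lower_bound_no_syn(6)
print("approximate 95% interval:", (lower, math.inf))

# The likelihood surface is almost flat for large omega, so an optimizer can
# stop at 1000, 5000 or 10000 with practically the same fit.
for omega in (1000.0, 5000.0, 10000.0):
    print(omega, log_lik(omega, 6, 0))
```

With 6 nonsynonymous and 0 synonymous changes, the lower end of this toy interval is about 2.65 and the upper end is unbounded, which is why the reported value is best read as "large, greater than 1" rather than as a meaningful number.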

Cheers, Ben


JaneZXJ commented 8 years ago

Thanks, Ben. It's a good analogy for understanding this issue. So it is a problem of small sample size in ML estimation. Is there a test for whether the sample size is suitable for the model being fitted? Or is it really not a problem for episodic selection analysis here? Thanks again!

Cheers, Jane
