weissmanlab / BB_bottleneck


Non-convergence on close frequencies #2

Open · jsan4christ opened this issue 3 years ago

jsan4christ commented 3 years ago

Hi,

I wanted to find out whether this is a normal characteristic of the beta-binomial model, and how you interpret such values. When the donor and recipient allele frequencies are nearly equal (e.g. donor 0.04, recipient 0.045), the approximate mode will not converge (the bottleneck estimate comes out as inf or -inf), and the exact mode converges at whatever maximum Nb_max is set to.
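For illustration (this is not my actual data, and I'm assuming the tab-separated input format from the README, one variant per line with donor frequency then recipient frequency), a single line like this is enough to trigger the behavior:

    0.040	0.045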

Please advise. I'm happy to send a reproducible example.

tsmit58 commented 3 years ago

Hi,

Thanks for letting us know. Can you send a reproducible example to tyler.smith@emory.edu? I suspect it's a bug, but I'm not sure. I'll look into it.

tsmit58 commented 3 years ago

Hello again,

There were indeed some bugs with the confidence intervals in the case of the optimal bottleneck value being on the edge of the range. These have been corrected. I do not think that the optimal value of Nb diverges for any of your data; I think it is just large. For most of your data, setting Nb_max to 20,000 seems to suffice. For the file with prefix K002198, the bottleneck was even larger. I have added a new argument, Nb_increment, which allows you to skip many Nb values and just look at every 10th (or 100th, etc.) value of Nb. With the new approximate code, setting Nb_min = 1, Nb_max = 1,000,000, and Nb_increment = 50,000 yields an Nb estimate of 550,000 for the file starting with K002198.
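For example, an invocation along these lines should exercise the new argument (the input filename below is a placeholder; the remaining flags are the ones documented in the README for the approximate script):

    Rscript Bottleneck_size_estimation_approx.r --file K002198_freqs.txt \
        --plot_bool FALSE --var_calling_threshold 0.03 \
        --Nb_min 1 --Nb_max 1000000 --Nb_increment 50000 \
        --confidence_level 0.95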

Please give the new approximate codes a try, and (if these work) then try the exact codes. Using Nb_increment judiciously should speed things up quite a bit. If you're still having issues, please respond here or shoot me another email.

Thanks,

Tyler

jsan4christ commented 3 years ago

Excellent,

Let me run it again and get back to you.

With kind regards.
