Closed by @Sendrowski 4 months ago.
Thanks for the report @Sendrowski - the figure seems to have gone missing, would you mind posting again please?
Hm that’s interesting. Did it work this time?
No - maybe a problem with GitHub?
Possibly — I managed to view it on a different device. Here's the link to the image, in case that helps.
Sorry, still not showing up for me (I got a 404 on that link)
And here is an iCloud link. Hope it works :)
Any thoughts here @JereKoskela ?
I ran a C++ script simulating Beta(2-a, a) random variables for a very close to 2, and relative errors of 5 or 6 per cent in an empirical mean from 10k samples are common because the true mean is so small. I think that explains what is going on below alpha = 1.995 or so.
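A minimal stdlib sketch of the effect described here (the function name and defaults are mine, not from the original C++ script): the true mean of Beta(2 - alpha, alpha) is (2 - alpha)/2, which shrinks toward 0 as alpha approaches 2, so the relative Monte Carlo error of a sample mean grows even though the absolute error stays tiny.

```python
import random
import statistics

def beta_mean_rel_error(alpha, n_samples=10_000, seed=1):
    """Signed relative error of the empirical mean of Beta(2 - alpha, alpha)
    against the true mean (2 - alpha) / 2, which -> 0 as alpha -> 2."""
    rng = random.Random(seed)
    draws = [rng.betavariate(2 - alpha, alpha) for _ in range(n_samples)]
    true_mean = (2 - alpha) / 2
    return (statistics.fmean(draws) - true_mean) / true_mean

# Relative error tends to grow as alpha approaches 2, purely from sampling noise.
for a in (1.9, 1.99, 1.999):
    print(a, beta_mean_rel_error(a))
```

With 10k replicates, relative errors of several per cent near alpha = 1.995 are unremarkable, which is the point being made above.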
As for the spike very close to 2, the timescale of the Schweinsberg Beta-coalescent model is unpleasant: it goes to infinity at alpha = 1 and to zero at alpha = 2, so I wouldn't be surprised if numerical issues creep in near those boundaries. At the alpha = 2 boundary in particular, you're essentially sampling 0/0. The practical thing to do is just use the Hudson model, since the probability of seeing a multiple merger in the Beta-coalescent is effectively zero in that regime.
I agree that a relative error of this magnitude might not seem so surprising (100k replicates in my code example), but note that the simulated values are consistently larger than the theoretical ones (I am not taking absolute values). For alpha=1.9, I obtain a relative error of ~2.7% (1e6 replicates, very stable estimate), and at that value the time scaling, about 0.14 relative to the Kingman coalescent, is not so extreme. This plot probably does a better job (iCloud link in case you still can’t see my images):
You're right @Sendrowski, I missed the lack of a modulus sign, and evidently also one zero on the number of replicates. I'll do some more digging. It looks like a very odd bug; the code just calculates the timescale and multiplies it by an Exp(1) random variable. And somehow the resulting mean does not equal the timescale.
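A sketch of the sanity check implied here (names are mine, not msprime internals): if each waiting time is the timescale multiplied by an Exp(1) random variable, the empirical mean over many replicates should be an unbiased estimate of the timescale itself, with deviations on both sides of zero.

```python
import random
import statistics

def simulate_times(timescale, n=100_000, seed=2):
    """Waiting times of the form timescale * Exp(1); their mean
    should converge to the timescale, with no systematic bias."""
    rng = random.Random(seed)
    return [timescale * rng.expovariate(1.0) for _ in range(n)]

times = simulate_times(0.14)  # ~ the alpha = 1.9 scaling mentioned above
print(statistics.fmean(times))  # should hover around 0.14
```

A consistent one-sided deviation from the timescale, as reported above, therefore points at the timescale calculation itself rather than the exponential sampling.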
Yes, very odd indeed... The bias also appears to drop somewhat near alpha=1.99 before shooting up again. I hope it won't be too hard to track down. I didn't debug the C code, so perhaps the time scaling is simply computed differently there. Or perhaps it is caused by the incomplete beta function, although I could not observe that when translating the calculation into native Python.
Ok, this turned out to be an insufficiently sensitive polynomial approximation for evaluating (essentially) x^2 / x^2 for very small x. It was using a polynomial approximation for x < 1e-9, which I've now hiked to x < 1e-5. @Sendrowski, your Python script now yields unbiased estimators with errors on both sides of zero.
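A toy illustration of the numerics involved (the actual msprime expression is more involved than a literal x^2 / x^2): in double precision, the naive ratio is exactly 1.0 for any x whose square is still representable, and only breaks down once x^2 underflows to zero, far below thresholds like 1e-9.

```python
def naive_ratio(x):
    """x**2 / x**2 in double precision: exactly 1.0 whenever x**2
    is representable; squaring underflows to 0.0 only for extremely
    small x, at which point the ratio becomes 0/0."""
    return x ** 2 / x ** 2

print(naive_ratio(1e-9))    # 1.0 -- no approximation needed here
print(naive_ratio(1e-150))  # still exactly 1.0
try:
    naive_ratio(1e-200)     # x**2 underflows to 0.0 -> 0/0
except ZeroDivisionError:
    print("underflow: 0/0")
```

This is consistent with the fix above: where a polynomial approximation kicks in is a tuning choice, and a threshold that is too aggressive replaces an exactly computable ratio with an approximation error.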
@jeromekelleher, any comments or are you happy to merge the linked PR? I've rerun the tests in verification.py for the Beta-coalescent and all still look good.
Thanks for sorting this out so quickly @JereKoskela! Change LGTM.
@Sendrowski, are you happy with the changes? Shall we push out a bugfix release?
Also looks good to me! Thank you for the quick fix @JereKoskela, and I look forward to meeting you in Warwick in April. As for the release, it’s totally up to you how soon you would like to push it out.
Hey again. Unfortunately, the bias due to numerical imprecision for values of alpha very close to 2 persists. Maybe one should prohibit values greater than 1.99 or so, to make sure this doesn't cause problems.
Sorry, still can't see the image @Sendrowski. I think maybe you're uploading a tiff instead of a png or something?
Hm it should be a png, but the graph I enclosed actually looks identical to the one I originally posted.
Yeah, the timescale will always break close to alpha = 1 or 2. There is no way to avoid that. I suppose we could disallow values very close to the boundaries as a result. I'd like to make sure that 1.01 and 1.99 work even if they come from floating point calculations in some other script, so maybe 1.009 and 1.991?
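A hypothetical sketch of the proposed validation (the bounds and the function are illustrative; msprime's actual check may differ): the slack below 1.01 and above 1.99 lets values arriving from floating point arithmetic in other scripts still pass.

```python
# Illustrative bounds from the discussion above, not msprime's actual limits.
ALPHA_MIN, ALPHA_MAX = 1.009, 1.991

def check_alpha(alpha):
    """Reject alpha too close to the Beta-coalescent boundaries, with
    slack so that e.g. 1.01 and 1.99 from upstream floating point
    calculations are still accepted."""
    if not (ALPHA_MIN < alpha < ALPHA_MAX):
        raise ValueError(
            f"alpha must lie in ({ALPHA_MIN}, {ALPHA_MAX}); got {alpha}")
    return alpha

print(check_alpha(1.99))  # inside the slack, accepted
```

The design choice here is to validate slightly inside the open interval (1, 2) rather than at the boundaries themselves, since the timescale degrades numerically before the boundaries are reached.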
Sounds good! Perhaps worth noting: I could not observe the same kind of numerical instability for values near 1, at least at similar distances from the boundary.
Thanks, that's useful to know. It's possible that we can get much closer to 1 than 2, but eventually everything will be infinite for alpha too close to 1. I'll see if I can find a reasonable lower bound.
Shall we reopen this issue, or make a new one?
I've reopened this one. I'm hoping to get to this within a few days.
Hey!
There appears to be an upward bias in the simulated coalescence times of the beta coalescent, relative to my expectation based on the time scaling provided in the documentation. The bias starts to be noticeable from alpha=1.8 and increases substantially for values closer to 2.

Relative error between observed and expected coalescent times plotted against alpha:
![image](https://github.com/tskit-dev/msprime/assets/11058046/6f572525-befe-412a-988f-03eaf56d976a)
Code to generate figure:
Am I perhaps missing something?