Closed eldonb closed 4 years ago
I think the confusion here is that "population size" can be specified through the parameter Ne
to the simulate
function. However, with the new sim_ancestry
interface, we've dropped Ne
and are using population_size
instead.
So, I think your idea is a good one @eldonb. @JereKoskela, thoughts?
I don't have strong views on N vs Ne. I tend to think of N as the census size, and Ne as the population size which one needs to plug in to match the observed diversity, but that's far from well-established. Do we need notation for population sizes and coalescence probabilities elsewhere in the documentation? I think the most important thing is to be consistent.
The current implementation of the truncation point is a little bit more naive than what @eldonb described: basically, the user inputs the compound parameter truncation_point = K/(K + m(alpha)) directly. The disadvantage is that it's less explicit, but the advantage is that the standard, untruncated Beta-coalescent can be represented exactly as truncation_point = 1. If we had the user specify K instead, then we would need something like K = inf
as a default parameter to get K/(K + m(alpha)) = 1. @jeromekelleher, do you see any pitfalls with something like that?
I think I've also made a small mistake in the formula for m(alpha) in the haploid case. I'll fix that, and take the opportunity to make any other tweaks we decide upon here at the same time.
Do we need notation for population sizes and coalescence probabilities elsewhere in the documentation?
I guess we probably don't. We don't really need to get into the specifics of the model so much for non-multiple-mergers coalescents, as the details of the models are well known. It's probably better to use N rather than Ne if census population size is what we mean (and vice-versa, I guess).
If we had the user specify K instead, then we would need something like K = inf as a default parameter to get K/(K + m(alpha)) = 1.
I'm sure we could make it work. Either inf
or DBL_MAX
are used in quite a few places as default values, so this should be fine I think.
A question @jeromekelleher: I think I've made all the necessary changes in implementation, but wanted to run the multiple merger statistical tests in verification.py before opening a PR. Running python3 verification.py -c BetaSFS
results in AttributeError: module 'msprime' has no attribute 'simulator_factory'
I see that simulator_factory
was changed to _parse_simulate
in #1190, but making the same switch in verification.py via ctrl+r results in AttributeError: module 'msprime' has no attribute '_parse_simulate'
as well. Any idea what the problem might be?
Erp, my bad @JereKoskela - I forgot to update the stats tests when doing recent refactorings. I'll fix ASAP.
In the documentation for multiple-merger coalescents, I would state that time is measured in units of N^{alpha - 1} generations, rather than Ne^{alpha - 1}, where N is the census population size ( unless your notation for census size is Ne ?). The standard notation for "effective size" is Ne, which in this case would be (proportional to) N^{alpha - 1}.
The truncation point in the incomplete Beta-coalescent will depend on the actual underlying population model, but assuming a simple version can be taken as K/( K + m(alpha)), where m(alpha) = 1 + (2^{1 - alpha}) / (alpha - 1) in the haploid case (for 1 < alpha < 2), where K > 0 is the proportionality constant of the cut-off (each individual can produce at most KN 'juveniles' in the population model). The point is as alpha -> 1 the truncation point decreases.
In the diploid case (working with the Xi-Beta(2-alpha, alpha, K) coalescent) one can take m(alpha) = 2 + (2^{alpha}3^{1 - alpha})/(alpha - 1) (this can vary depending on the underlying population model but we don't need to go into that here).