popsim-consortium / stdpopsim

A library of standard population genetic models
GNU General Public License v3.0
126 stars 87 forks source link

Decide how we should handle `burn in` in the SLiM engine #420

Closed grahamgower closed 4 years ago

grahamgower commented 4 years ago

The SLiM engine currently supports using recapitation, or not. When recapitation is disabled, we wait until all individuals have a common ancestor after the start of the simulation, and then we wait an additional 10*N generations. N is taken to be the maximum population size of the populations that exist at the start of the simulation (all current models only have one). We currently have separate CLI options to disable recaptitation (--slim-no-recapitation) and to disable burn in (--slim-no-burnin).

In reviewing #409, @andrewkern suggested that the burn in time should be configurable, at least from the python API; and that we probably don't need both CLI options.

I propose that:

  1. We change the --slim-no-burnin parameter to be --slim-burnin X, where X specifies the length of the burn in period, in units of N generations. This can be set to zero to disable burn in.
  2. We add an option to disable checking for coalescence, separately to the burn in. This might be desirable in the future for non-neutral simulations that use a distribution of fitness effects (DFE). In such cases, we probably want to: disable checking for coalescence (because it will be really slow), specify a fixed burn in time with the DFE, and then optionally do recapitation to fix up genealogies without a single root.

We don't have to do (2) now. Another option is to get rid of checking for coalescence entirely.

jradrion commented 4 years ago

I second the idea of using --slim-burnin X, where --slim-burnin 0 means disable the burn, with the default being 10N gens.

I don't have a strong opinion on point 2, but your plan sounds perfectly reasonable to me.

petrelharp commented 4 years ago

Recapitation and burn-in are separate things - one is SLiM simulation time, one is msprime - so, I don't see how we can really eliminate an option, here, unless we want to have it always do recapitation? (Note, though if we check for coalescence we don't need to recapitate unless we have ancestral samples.)

As for coalescence checking: I don't think we need it, really. A default burn-in of 10N plus default recapitation should be fine, almost always, and these are configurable. Coalescence checking takes a maybe significant amount of time and adds a layer of conceptual complexity.

grahamgower commented 4 years ago

Always doing recapitation seems like an appealing choice actually. In this case, we also remove the within-slim check for coalescence. The user can then only specify a within-slim burn-in period. Is there any reason to have 10N generations of burn in by default in this case? Is 0 burn-in time + recapitation considered harmful for some reason?

petrelharp commented 4 years ago

Is 0 burn-in time + recapitation considered harmful for some reason?

Well, if recapitation is just as good, why are we using SLiM in the first place? It kinda defeats the purpose.

andrewkern commented 4 years ago

agree and i think we might want to show in the paper how doing recap might lead you astray, just to demonstrate the point

grahamgower commented 4 years ago

Ok... I'm still a little confused. So I'll just start submitting PRs to change the defaults and you can correct any misconceptions I have then. :)