popsim-consortium / demes-spec

Demes demographic model specification
https://popsim-consortium.github.io/demes-spec-docs/
MIT License
4 stars 4 forks source link

Interpretation of multiple "identical" pulses #155

Open molpopgen opened 2 years ago

molpopgen commented 2 years ago

I came across this example today when testing something:

time_units: generations
demes:
 - name: A
   epochs:
    - start_size: 50
 - name: B
   epochs:
    - start_size: 50
pulses:
 - sources: [A]
   dest: B
   proportions: [0.9]
   time: 25
 - sources: [A]
   dest: B
   proportions: [0.9]
   time: 25

Currently, demes-python reads the model as valid, emitting a warning that the "outcome depends on the order". fwdpy11 rejects the model because the total rates sum to > 1.0 of deme B's ancestry.

How should this model be interpreted?

Possibly related: #91 and #114

cc @apragsdale b/c it is relevant to the fwdpy11 code.

apragsdale commented 2 years ago

For a discrete simulator like fwdpy11, I think it is the right thing to reject this model. If a user has tried to simulate with it, they either have messed something up when specifying the model, or they should be thinking about writing clearer models.

In continuous time simulations, they would be applied sequentially, so if you want fwdpy11 to allow this model, you'd want to "collapse" those pulse events, in which case you get A->B with proportion 0.99. But I do think the best thing for discrete models to do is error when they see a pathological model like this.

grahamgower commented 2 years ago

The spec says these should be applied in the order that they appear in the input. And for this reason, the model should not be interpreted to mean that the rates into B sum to greater than 1.0. So I think that check in fwdpy11 should be removed.

I agree that this particular example is unlikely to be constructed deliberately, and so it would be nice if it were rejected. But maybe the check should instead be for multiple pulses into the same dest deme at the same time. Technically that would be valid, but since it can be rewritten with a single pulse I think it would be ok to reject it.

molpopgen commented 2 years ago

@grahamgower -- I find the spec hard to interpret here. It says to apply sequentially and that ancestry proportions are just that, ancestry proportions. It doesn't clearly state that subsequent proportions should be applied to the remaining numbers of individuals. I think it would be easy to interpret multiple pulses as additive.

grahamgower commented 2 years ago

We discussed this at length previously, across multiple issues. At least #99 and #100. But probably in demes-python issues/PRs too.

The spec currently has: https://popsim-consortium.github.io/demes-spec-docs/main/specification.html#pulse

Pulse migration events specify the instantaneous replacement of a given fraction of individuals in a destination population by individuals with parents from a source population. The fraction must be between zero and one, and if more than one pulse occurs at the same time, those replacement events are applied sequentially in the order that they are specified in the model. The list of pulses must be sorted in time-descending order.

I'd be happy to review any patches clarifying this!

grahamgower commented 2 years ago

Oh, and to clarify why we want the current behaviour: it maximises consistency between discrete-time and continuous-time simulators. Imposing additional constraints in any one simulator (like, disallowing multiple pulses at the same time) is perfectly fine, but otherwise it would be really nice for models to be interpreted identically in both cases.

molpopgen commented 2 years ago

We discussed this at length previously, across multiple issues. At least #99 and #100. But probably in demes-python issues/PRs too.

The spec currently has: https://popsim-consortium.github.io/demes-spec-docs/main/specification.html#pulse

Pulse migration events specify the instantaneous replacement of a given fraction of individuals in a destination population by individuals with parents from a source population. The fraction must be between zero and one, and if more than one pulse occurs at the same time, those replacement events are applied sequentially in the order that they are specified in the model. The list of pulses must be sorted in time-descending order.

I'd be happy to review any patches clarifying this!

It seems that the discussions never made it into the spec. My point is this: the statement "applied sequentially" implicitly assumes something about how that would be done. In other words, there's something we believe to be a natural implementation that we are not being explicit about. The how is in this comment, but omitted from the spec.