tskit-dev / msprime

Simulate genealogical trees and genomic sequence data using population genetic models
GNU General Public License v3.0

Help text for migration rate #1463

Closed jeromekelleher closed 3 years ago

jeromekelleher commented 3 years ago

The HTML output for demography has the nifty feature that hovering over certain information will try to explain it to you a bit better. This is especially important for the migration matrix:

[Image: tooltip]

Somebody other than me should do this!

@petrelharp @grahamgower @apragsdale, what do you think?

grahamgower commented 3 years ago

I get this wrong every single time I need to think about it, so it definitely shouldn't be me either. :) But maybe the text should just match whatever is in the manual?

petrelharp commented 3 years ago

holy cow, that's amazing.

I don't think you can implement the "IMPLEMENT ME" thing, though, because the number depends on population sizes, which might be changing with time?

jeromekelleher commented 3 years ago

Well, hmm. For the demography, we output only information for the first epoch, as in, this is the state of the demography at the start of the simulation. We'll do the same thing per-epoch, in the Debugger.

Is that what you mean, or is this just not something we can say if there's a non-zero growth rate? In that case, there's surely something more helpful we can say?

petrelharp commented 3 years ago

I meant that a constant reverse-time migration rate from B to A, if A is a growing population and B is a population of fixed size, implies a decreasing forwards-time per capita migration rate from A to B. But, I suppose we could report what the value is at the start of the interval - better than nothing?
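
A minimal sketch of this argument, assuming msprime's convention that a population with growth rate g has size N0 * exp(-g * t) at t generations ago. The function name and all numbers are hypothetical and chosen only for illustration; nothing here is msprime API.

```python
import math


def forward_per_capita_rate(m_back, size_b, size_a_now, growth_rate_a, t):
    """Forwards-time fraction of A that moves to B, t generations ago.

    m_back is the constant reverse-time rate at which lineages in B move
    to A, i.e. the fraction of B whose parents lived in A.
    """
    # Size of A at t generations in the past (exponential growth model).
    size_a = size_a_now * math.exp(-growth_rate_a * t)
    # Migrants arriving in B per generation: fixed, since m_back and size_b are fixed.
    migrants = m_back * size_b
    return migrants / size_a


# A grows forwards in time (positive growth rate); B stays at 1000.
past = forward_per_capita_rate(0.01, 1000, 5000, 0.01, t=100)
present = forward_per_capita_rate(0.01, 1000, 5000, 0.01, t=0)
print(past, present)
```

With these inputs the value 100 generations ago is larger than the value at the present, so the forwards-time per capita rate falls as A grows, even though the reverse-time rate never changes.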

apragsdale commented 3 years ago

Forwards in time, the migration matrix has the same interpretation, right? That the (i, j)th entry is the probability that an offspring in deme i had its parent in deme j. At least, that's the construction in dadi (which then rescales by a constant 2N_ref to turn it into a rate in genetic units), as well as fwdpy11; not sure about slim. So whether looking forward or backward in time, the migration probability remains constant over an epoch, even if the migration rate (with respect to the proportion of lineages in the source deme moving per generation) changes. I find thinking of probabilities of drawing parents easier than rates with respect to population sizes, and it might be a clearer way of comparing forward and backward migration models here?
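
To make the "probability of drawing parents" reading concrete, here is a small sketch. It assumes a discrete-generation approximation in which each off-diagonal entry is used directly as a per-generation probability; the helper name and the matrix are hypothetical.

```python
def parent_deme_probabilities(M, i):
    """Distribution over the deme that the parent of an individual in deme i lived in."""
    row = list(M[i])
    off_diagonal = sum(p for j, p in enumerate(row) if j != i)
    # Whatever is not migration means the parent lived in the same deme.
    row[i] = 1.0 - off_diagonal
    return row


# Hypothetical 3-deme migration matrix (the diagonal is ignored).
M = [
    [0.0, 0.05, 0.01],
    [0.02, 0.0, 0.0],
    [0.0, 0.1, 0.0],
]
probs = parent_deme_probabilities(M, 0)
print(probs)
```

The result depends only on the matrix row, not on any population size, which is why this quantity stays constant over an epoch even when the demes grow or shrink.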

petrelharp commented 3 years ago

I think that quantity is what we are calling the "backwards time migration rate"; and if I understood correctly, Jerome was suggesting above also calculating "proportion of offspring in A that go to B", which we were calling the "forwards time migration rate". You're right, there's nothing inherently forwards- or backwards- about the two quantities, in that they're both calculable from either a forwards or a backwards simulation. I guess another distinction between them is that the "forwards" quantity is giving you what proportion of A is migrating, while the "backwards" quantity is giving you what proportion of B is formed of migrants, so they differ in focus (on A or B).

Alternatively, we could report the "number of migrants" (although this would have the same problem of changing through time if B is growing or shrinking).

apragsdale commented 3 years ago

Understood, and I agree. My thought was just that the goal of the documentation here is to be as clear as possible about what the migration "rate" matrix means in msprime (and most other population genetics simulation methods). And that instead of saying "rate with respect to A vs B", it's much less ambiguous to say "probability that parents come from a different deme", and probably less confusing for someone trying to figure out what exactly the migration matrix means.

No doubt that this migration model is problematic when population sizes change, because it might not be realistic to have the expected number of migrants from a source change with its population size. This has come up for me in developing moments, where users have asked for the migration matrix to be allowed to change in time so that you can make the "forwards rate" stay constant. I just wonder if all the discussion about forwards vs backwards rates would muddy the water for a reader of the docs when msprime only handles the one scenario.

nspope commented 3 years ago

Unsolicited opinion: I too find migration rates easiest to understand as 'proportion of population i that has parent in population j'. No need for forwards/backwards to enter into description, matches the parameterization, and direction of migration is clear.

Another unsolicited opinion: however, if you do want to report the 'number of immigrants' in a growing population why not take the average over the epoch? I think it would be M[i,j] / (growth_rate[i] * duration) * Ne[i] * (exp(growth_rate[i] * duration) - 1)
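
A hedged sketch of that average. It assumes msprime's convention that the size t generations ago is N0 * exp(-g * t), with t measured from the recent end of the epoch. The comment does not say which end of the epoch Ne[i] refers to, so the second function below rests on the assumption that it is the size at the older end. Function names and numbers are hypothetical.

```python
import math


def mean_immigrants_from_start_size(m_ij, size_start, growth_rate, duration):
    """Epoch average of m_ij * N(t), where N(t) = size_start * exp(-growth_rate * t)."""
    if growth_rate == 0:
        return m_ij * size_start
    x = growth_rate * duration
    # (1/T) * integral_0^T N0 * exp(-g t) dt = N0 * (1 - exp(-g T)) / (g T)
    return m_ij * size_start * (1 - math.exp(-x)) / x


def mean_immigrants_from_end_size(m_ij, size_end, growth_rate, duration):
    """The expression from the comment above, reading Ne[i] as the size at the older end."""
    x = growth_rate * duration
    return m_ij / x * size_end * (math.exp(x) - 1)


size_start, growth_rate, duration = 1000, 0.02, 50
# Size at the older end of the epoch under the same growth model.
size_end = size_start * math.exp(-growth_rate * duration)
a = mean_immigrants_from_start_size(0.01, size_start, growth_rate, duration)
b = mean_immigrants_from_end_size(0.01, size_end, growth_rate, duration)
print(a, b)
```

Under that reading the two forms agree. If Ne[i] is instead the size at the recent end of the epoch, the sign of the exponent flips, as in the first function.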

jeromekelleher commented 3 years ago

I hadn't actually planned on doing anything fancy here, I just wanted to put in something that wasn't wrong. And yes, @grahamgower is right, it should just be a copy of whatever we say in the manual. I'd vote for keeping it as simple as we can.

JereKoskela commented 3 years ago

I'm sure this is overkill for this issue, but would it be desirable to eventually let a user specify either version of the migration matrix, and do any necessary conversion under the hood? We could describe them as "probability that a lineage on island j has a parent on island i" for the conventional backward matrix, and "fraction of individuals on island i migrating to island j forward in time" for the forward matrix, and give a brief worked example of how the two are connected. The tooltip could then just point to that section of the docs.
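
A rough sketch of what such a worked example might look like, assuming population sizes that are constant at the moment of conversion and the indexing in which M[j][k] is the fraction of deme j whose parents lived in deme k. Both function names are hypothetical and are not part of msprime.

```python
def backward_to_forward(M, sizes):
    """F[k][j]: fraction of individuals in deme k that move to deme j, forwards in time."""
    n = len(sizes)
    F = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if j != k:
                # M[j][k] * sizes[j] individuals in j have parents from k.
                F[k][j] = M[j][k] * sizes[j] / sizes[k]
    return F


def forward_to_backward(F, sizes):
    """Inverse of backward_to_forward for the same population sizes."""
    n = len(sizes)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if j != k:
                M[j][k] = F[k][j] * sizes[k] / sizes[j]
    return M


sizes = [1000, 100]
# 10% of the small deme (index 1) has parents in the large deme (index 0).
M = [[0.0, 0.0], [0.1, 0.0]]
F = backward_to_forward(M, sizes)
M_round_trip = forward_to_backward(F, sizes)
print(F, M_round_trip)
```

Here 10 migrants per generation make up 10% of the small deme but only about 1% of the large one, which is the difference in focus between the two matrices.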

jeromekelleher commented 3 years ago

I'm happy to go with the consensus here; whatever we can do to reduce confusion on this point would be a big help I think, as it's a persistent source of pain. Maybe a worked section in the docs in the Demography section where we explain all this is a good start anyway.

jeromekelleher commented 3 years ago

#1503 updates it to look like this:

[Image: Screenshot from 2021-03-02 12-06-16]

This is basically the same text as we have in the manual.

If someone has a better suggestion, please reopen and we can update again.

jeromekelleher commented 3 years ago

closed in #1503