tskit-dev / msprime-1.0-paper

Publication describing msprime 1.0

Draft of "major changes" table #217

Closed: jeromekelleher closed this issue 2 years ago

jeromekelleher commented 2 years ago

One of the main things we need is a summary of the major updates to msprime. Here's a first pass:

[Screenshot from 2021-11-18 15-42-28: draft of the "major changes" table]

What do we think?

grahamgower commented 2 years ago

Looks great!

hyanwong commented 2 years ago

LGTM. A few comments:

  1. Maybe "Discrete and continuous genomic coordinates"? In the context where it appears, people might think it refers to demographic/geographical coordinates.
  2. "Recording full ARG" - might be worth mentioning recombination nodes and the ability to calculate the likelihood? E.g. "Use of recombination nodes to record the full ARG (allowing exact likelihood calculations)"
  3. "More efficient for large numbers of populations" - maybe give values and/or a comparator - e.g. 20x more efficient than msprime 0.3 for large populations?
  4. What's "restricted time interval" about?
  5. Although they are tskit features, should we mention metadata and individuals (i.e. the individuals table)? I think these are fairly major improvements to usability.
  6. Do we want to mention the ability to do multiple chromosomes using DTWF and the hack at https://tskit.dev/msprime/docs/latest/ancestry.html#multiple-chromosomes or is that too much of a hack at the moment?
  7. Is the ability to recapitate a new thing too (since 0.3)? If so, this also seems like a pretty major improvement to me.
andrewkern commented 2 years ago

One observation here: a lot of this information is hierarchically nested. Would a figure be a better choice? E.g.:

[Image: autodraw 11_18_2021]

jeromekelleher commented 2 years ago

one observation here-- a lot of this information is hierarchically nested. would a figure be a better choice? e.g.

Not sure I agree here: the interface is one "thing", and ancestry and mutation simulations (etc.) are independent of that, aren't they? I agree the line between demography and the interface is very hazy, though.

jeromekelleher commented 2 years ago

"More efficient for large numbers of populations" - maybe give values and/or a comparator - e.g. 20x more efficient than msprime 0.3 for large populations?

That's a bit of a how-long-is-a-piece-of-string one, @Yan: pre-1.0 performance was quadratic in the number of populations. Now it's much better. There's no single number, though, and you'd need to say something like "for example, in a 1D stepping-stone model with 100 populations, one haploid sample drawn from each end of the habitat, and no recombination, we are X times faster".

hyanwong commented 2 years ago

"More efficient for large numbers of populations" - maybe give values and/or a comparator - e.g. 20x more efficient than msprime 0.3 for large populations?

That's a bit of a how-long-is-a-piece-of-string one @yan - pre 1.0 performance was quadratic in the number of populations. Now it's much better. There's no one number, though and you'd need to say something like "for example, in a 1D stepping stone model with 100 populations, and one haploid sample drawn each from the ends of the habitat and no recombination, we are X times faster"

Right, don't worry if it's not easy to say: it just sounded a little hand-wavy, but perhaps there's no better way to put it succinctly.

hyanwong commented 2 years ago

Perhaps "improved scaling (no longer quadratic) in the number of populations"? Or is that worse?

jeromekelleher commented 2 years ago

Do we want to mention the ability to do multiple chromosomes using DTWF and the hack at https://tskit.dev/msprime/docs/latest/ancestry.html#multiple-chromosomes or is that too much of a hack at the moment?

Too hacky I think, not worth bringing up.

jeromekelleher commented 2 years ago

[Screenshot from 2021-11-19 11-28-19: revised "major changes" table]

Thanks for the input @hyanwong, this is very helpful. How does this look now?

grahamgower commented 2 years ago

Is it also worth mentioning that the interface now supports widespread use of population names as an alternative to using numerical ids?

jeromekelleher commented 2 years ago

Good point @grahamgower . How about

Improved interface with integrated metadata and referencing populations by name. Import from...

grahamgower commented 2 years ago

Yep, sounds good.

jeromekelleher commented 2 years ago

Updated

hyanwong commented 2 years ago

Thanks for the input @hyanwong, this is very helpful. How does this look now?

Great! Nice work @jeromekelleher

jeromekelleher commented 2 years ago

OK, let's merge this then. Thanks for the input!