Closed jeromekelleher closed 3 years ago
Partially done in recent changes. New short description is "Simulate genealogical trees and genomic sequence data using retrospective population genetic models."
Technically I guess the mutation models aren't particularly retrospective :shrug: Hard to get across the idea of simulating trees and mutations. I guess we could drop the "retrospective" without much loss of info.
@tskit-dev/all Any thoughts on what the quick tagline for msprime should be?
To me "genealogical" means pedigrees. I have to have an extra word in there, like "gene genealogies" to trip the appropriate switch in my head.
I guess if all the people looking at this are popgen folk, then maybe that's not such an issue? But e.g. animal scientists might be expecting a pedigree generator?
"retrospective" confuses me. I think "does that mean backwards?". We don't want the word "coalescent" in there somewhere? What about "genealogical trees of population history and resulting genomic sequence data"?
I'm betting that the motivation to mention retrospective and/or coalescent is to help visitors understand what makes this particular simulator special. I'm thinking there are four notable properties that make msprime awesome:
Do folks have an opinion on which of those four (or more) are the most important benefits to highlight? I'm guessing that the first 3 are the most important (and the coalescent grounding is good to mention secondarily)
Regarding WHAT gets simulated, is:
accurate as the two most important things getting simulated?
Interestingly, I never used the term "tree" or "genealogy" in the above.
I vote for "Simulate genealogical trees and genomic sequence data using coalescent population genetic models." But I know some of us differ in the precise meaning of some of those words - "retrospective" is fine with me also.
The goal here is to be descriptive and not wrong - we can't be precise and still be understandable, because we don't have a commonly understood term in English that means just what we want, AFAIK. So, while it's true that "genealogical tree" can also mean other things, it's not wrong (we simulate trees that are part of the genealogy), and people who come looking for a pedigree simulator will figure out what we're actually doing very quickly.
I'm with @petrelharp - we can't be both precise and understandable to non-experts. I'm still a bit queasy about "coalescent" because we are also simulating mutations, which have nothing to do with the coalescent. So, what if we drop it, like,
Simulate genealogical trees and genomic sequence data using population genetic models.
Good points above @castedo, but I think the goal here is to state what msprime does, rather than "what is msprime better at than other things". I think the job of convincing someone to use msprime vs other simulators is done elsewhere.
Suggesting a small modification so that it sounds less like a command: Simulating genealogical trees and genomic sequence data using population genetic models.
> I'm with @petrelharp - we can't be both precise and understandable to non-experts. I'm still a bit queasy about "coalescent" because we are also simulating mutations, which have nothing to do with the coalescent. So, what if we drop it, like,
> Simulate genealogical trees and genomic sequence data using population genetic models.
But then that description would pretty much apply to SLiM, too, right? It's too generic, and doesn't make clear what is different about msprime. The fact that it's backwards-in-time is important and needs to be in there. How about "backwards-in-time" rather than "coalescent" or "retrospective"?
Good discussion, thanks all!
@bhaller, yes, it would also apply to SLiM or any other simulator, more or less. I guess I'm thinking about someone random who drops in on the package/GitHub repo and wants to know what the package does. Follow up sentences can go into more detail about how it does these things and how it relates to other tools, but I'd like Generic GitHub User to have an idea of what the package is for by reading the first sentence.
I guess you could see it as a filtering process - first sentence gets rid of anyone who isn't interested in simulating DNA data or ancestral histories.
> > I'm still a bit queasy about "coalescent" because we are also simulating mutations, which have nothing to do with the coalescent. So, what if we drop it, like,
> > Simulate genealogical trees and genomic sequence data using population genetic models.
>
> But then that description would pretty much apply to SLiM, too, right? It's too generic, and doesn't make clear what is different about msprime. The fact that it's backwards-in-time is important and needs to be in there. How about "backwards-in-time" rather than "coalescent" or "retrospective"?
Could we say "coalescent-based population genetic models"? Most of the demographic models use coalescence theory, right? But it's just that they aren't "the coalescent"? Perhaps not the selection ones, I guess?
> Suggesting a small modification so that it sounds less like a command: Simulating genealogical trees and genomic sequence data using population genetic models.
I actually prefer "simulate" - it's more direct and easier to read. You could also read it as "You can use this to ... simulate etc etc.". When I see "Simulating XXX" I feel it should be followed by a reason, e.g.
Simulating genealogical trees and genomic sequence data using population genetic models, for the greater good.
😀
Makes sense to me what @jeromekelleher said about filtering and focusing on the "what" rather than the "why-different-or-better". So filtering down to the "what" and throwing in something very different here:
"Simulate descent of DNA sequence mutations from common population ancestors"
I'm throwing that out there just to mix in something very different. The small variants of the original tag line sound fine to me.
We're forgetting about mutations here - that's why I don't want to say "coalescent" or whatever. We have really powerful mutation generation abilities, these shouldn't be an afterthought.
> We're forgetting about mutations here - that's why I don't want to say "coalescent" or whatever. We have really powerful mutation generation abilities, these shouldn't be an afterthought.
Is that an argument for explicitly stating this? So, for example, someone with some SLiM tree sequences would realise that they can go to msprime to overlay mutations. E.g.
"Simulate genealogical trees and genomic sequence data via coalescent-based population genetic models and flexible mutation models"
or does that make it too long for this purpose?
We say "DNA sequence data", which implies there's got to be mutations in there somewhere.
+1 for "Simulate genealogical trees and genomic sequence data using population genetic models."
I considered "Simulate genealogical trees and their genomic sequence data using population genetic models." to highlight the relationship, but it is already quite wordy.
I'm going to go with "Simulate genealogical trees and genomic sequence data using population genetic models." I'll clarify things as appropriate in the various contexts, but I think this is about as good as we'll do in 10 words (which is as much as is useful - this is going to be displayed as the short description in lots of limited-space contexts).
Thanks for the input, all!
The introduction ("reimplementation of ms") needs to be modified; maybe also think of what else should be there, as a landing page?
Once we have the right description, make sure this is propagated to the README.md, README.rst etc so that it shows up in the PyPI page as well.