hyanwong closed this issue 2 months ago
I noticed this slight confusion in the SpARG preprint, for example, where they say:
> An ARG contains the complete genetic history of a sample of recombining genomes (Griffiths and Marjoram, 1996; Hudson, 1983; Lewanski et al., 2023). It is commonly displayed as 1) a sequence of trees with each tree representing the history of a continuous block of the genome or 2) a single directed acyclic graph with annotated edges corresponding to their genomic intervals (see Fig 5A for an example of each). While the two representations can be interchangeable, tree sequences often lack the recombination events that tie the trees together and are further simplified, removing this equivalency (Wong et al., 2023).
This equates "tree sequence" with "sequence of trees".
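A toy sketch may help show why the two terms differ. It is plain Python, not tskit's data model or API, and every name in it (`Edge`, `EDGES`, `local_tree`) is made up for illustration. Each edge is stored once, together with the genomic interval it covers, and the local trees are derived from that single edge list rather than stored as separate trees.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    left: float    # inclusive start of the genomic interval
    right: float   # exclusive end of the genomic interval
    parent: int
    child: int


# Hypothetical genealogy for samples 0, 1, 2 over the genome [0, 10).
# Node 3 is the parent of samples 0 and 1 along the whole genome;
# only the node above it changes at position 5.
EDGES = [
    Edge(0, 10, 3, 0),
    Edge(0, 10, 3, 1),
    Edge(0, 5, 4, 3),
    Edge(0, 5, 4, 2),
    Edge(5, 10, 5, 3),
    Edge(5, 10, 5, 2),
]


def local_tree(edges, position):
    """Return the child -> parent map of the tree covering `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


left_tree = local_tree(EDGES, 2)
right_tree = local_tree(EDGES, 7)
print(left_tree)
print(right_tree)
print(len(EDGES), "edges stored;",
      len(left_tree) + len(right_tree), "parent links across the two trees")
```

The two edges below node 3 span the whole genome and are stored once. A plain sequence of trees would repeat them in each tree and would no longer record that node 3 is the same ancestor in both.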
Yes, we need to review the website material and revise it to bring it into line with what we're saying in the ARG paper.
Do we want to move to using `msprime.sim_ancestry(additional_nodes=msprime.NodeType.COMMON_ANCESTOR | msprime.NodeType.RECOMBINANT)` in the ARG tutorial, or simply point out that `msprime.sim_ancestry(record_full_arg=True)` is a shorthand? In other words, are we trying to steer people away from `record_full_arg` now?
I don't really mind, to be honest. I don't think we'll bother formally deprecating `record_full_arg`, though.
We should change the ARG tutorial to say that msprime's `record_full_arg` is now simply a shorthand for `additional_nodes=msprime.NodeType.COMMON_ANCESTOR | msprime.NodeType.RECOMBINANT`.

Additionally, in a few places in the tutorials, we don't make it clear enough that a tree sequence can represent ARGs. E.g.
We should probably say