tskit-dev / tutorials

A set of tutorials for msprime and tskit.
Creative Commons Attribution 4.0 International

Revisit ARG descriptions and calling syntax #273

Closed hyanwong closed 2 months ago

hyanwong commented 3 months ago

We should change the ARG tutorial to say that msprime's record_full_arg is now simply a shorthand for additional_nodes = msprime.NodeType.COMMON_ANCESTOR | msprime.NodeType.RECOMBINANT.

Additionally, in a few places in the tutorials, we don't really make the argument clearly enough that a tree sequence can represent ARGs. E.g.

Note: The genetic genealogy is sometimes referred to as an ancestral recombination graph, or ARG, and there are close similarities between ARGs and tree sequences (see the ARG tutorial)

We should probably say:

Note: The genetic genealogy is sometimes referred to as an ancestral recombination graph, or ARG. One way to describe a tskit tree sequence is that it provides a way of storing various different sorts of ARGs (see the ARG tutorial)

hyanwong commented 3 months ago

I noticed this slight confusion in the SpARG preprint, for example, where they say:

An ARG contains the complete genetic history of a sample of recombining genomes (Griffiths and Marjoram, 1996; Hudson, 1983; Lewanski et al., 2023). It is commonly displayed as 1) a sequence of trees with each tree representing the history of a continuous block of the genome or 2) a single directed acyclic graph with annotated edges corresponding to their genomic intervals (see Fig 5A for an example of each). While the two representations can be interchangeable, tree sequences often lack the recombination events that tie the trees together and are further simplified, removing this equivalency (Wong et al., 2023).

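This equates "tree sequence" with "sequence of trees", even though a tskit tree sequence can also store the recombination events that tie the trees together.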

jeromekelleher commented 3 months ago

Yes, we need to review the website material and revise to bring into line with what we're saying in the ARG paper.

hyanwong commented 3 months ago

Do we want to move to using msprime.sim_ancestry(additional_nodes=msprime.NodeType.COMMON_ANCESTOR | msprime.NodeType.RECOMBINANT) in the ARG tutorial, or simply point out that msprime.sim_ancestry(record_full_arg=True) is a shorthand? In other words, are we trying to steer people away from record_full_arg now?

jeromekelleher commented 3 months ago

I don't really mind, to be honest. I don't think we'll bother formally deprecating record_full_arg, though.