tskit-dev / msprime

Simulate genealogical trees and genomic sequence data using population genetic models
GNU General Public License v3.0

Add individuals for record_full_arg #1459

Open hyanwong opened 3 years ago

hyanwong commented 3 years ago

It's always bothered me that the record_full_arg output has 2 nodes per recombination event that are only identifiable as a pair because they share the same time (unique in the non-WF case, but not necessarily in WF simulations). It just struck me that, since we are moving to adding individuals by default in sim_ancestry, we should probably group the pair of msprime recombination nodes into a new individual when record_full_arg=True. After all, by (biological) definition, the two recombination nodes need to be in the same individual, even if the normal life cycle stage is haploid.
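To make the ambiguity concrete, here is a minimal sketch that groups recombination-flagged nodes by time alone. The node columns are made-up illustrative data, not real simulation output. With a real tree sequence they would presumably come from ts.tables.nodes.flags and ts.tables.nodes.time. The flag constant is defined locally (same bit as msprime.NODE_IS_RE_EVENT) so the sketch runs without msprime installed, and group_re_nodes_by_time is a hypothetical helper name.

```python
from collections import defaultdict

# Same bit as msprime.NODE_IS_RE_EVENT (131072); defined locally so the
# sketch has no dependency on msprime.
NODE_IS_RE_EVENT = 1 << 17


def group_re_nodes_by_time(flags, times):
    """Group the IDs of recombination-flagged nodes by their node time."""
    groups = defaultdict(list)
    for node_id, (flag, time) in enumerate(zip(flags, times)):
        if flag & NODE_IS_RE_EVENT:
            groups[time].append(node_id)
    return dict(groups)


# Illustrative (made-up) node columns: two samples (flag 1) followed by
# two recombination events, each contributing two nodes.
RE = NODE_IS_RE_EVENT
flags = [1, 1, RE, RE, RE, RE]

# Continuous-time case: each event has its own time, so time identifies pairs.
continuous_times = [0.0, 0.0, 0.37, 0.37, 1.52, 1.52]
# Discrete-generation (WF) case: both events fall in generation 3.
wf_times = [0.0, 0.0, 3.0, 3.0, 3.0, 3.0]

continuous_groups = group_re_nodes_by_time(flags, continuous_times)
wf_groups = group_re_nodes_by_time(flags, wf_times)

print(continuous_groups)  # {0.37: [2, 3], 1.52: [4, 5]}
print(wf_groups)  # {3.0: [2, 3, 4, 5]}
```

In the WF case the four nodes collapse into a single group, so time alone no longer says which two belong to the same event.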

This would give a biologically meaningful grouping to the 2 nodes, and would also extend nicely to cases where there are multiple recombination events per individual (not possible in the current msprime implementation, but quite biologically meaningful).
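A rough sketch of what the grouping could look like, working on plain Python lists rather than real tskit tables. It is only an illustration of the idea, not how msprime does or would implement it. It assumes the two nodes of each recombination event have consecutive node IDs, and assign_re_individuals is a hypothetical helper name.

```python
NODE_IS_RE_EVENT = 1 << 17  # same bit as msprime.NODE_IS_RE_EVENT
NULL = -1  # same value as tskit.NULL


def assign_re_individuals(flags, individual, num_individuals):
    """
    Return (new_individual_column, new_num_individuals), where each pair of
    recombination nodes shares one newly allocated individual ID.

    Assumption: the two nodes of an event have consecutive node IDs.
    """
    new_individual = list(individual)
    re_nodes = [u for u, f in enumerate(flags) if f & NODE_IS_RE_EVENT]
    if len(re_nodes) % 2 != 0:
        raise ValueError("expected an even number of recombination nodes")
    next_id = num_individuals
    for left, right in zip(re_nodes[0::2], re_nodes[1::2]):
        if right != left + 1:
            raise ValueError("recombination nodes are not in adjacent pairs")
        new_individual[left] = next_id
        new_individual[right] = next_id
        next_id += 1
    return new_individual, next_id


# Illustrative (made-up) columns: four sample nodes in two diploid
# individuals, then RE pair, coalescence node, RE pair, coalescence node.
RE = NODE_IS_RE_EVENT
flags = [1, 1, 1, 1, RE, RE, 0, RE, RE, 0]
individual = [0, 0, 1, 1, NULL, NULL, NULL, NULL, NULL, NULL]

new_individual, total = assign_re_individuals(flags, individual, num_individuals=2)
print(new_individual)  # [0, 0, 1, 1, 2, 2, -1, 3, 3, -1]
print(total)  # 4
```

Only the recombination nodes gain an individual here. Other ancestral nodes keep the null value, since nothing tells us which individual they belong to. With real tables, the new rows could presumably be created with tables.individuals.add_row() and the column written back to tables.nodes.individual.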

jeromekelleher commented 3 years ago

We're only adding individuals for samples at the moment, not any of the ancestral individuals. This would be a significant undertaking, so we're not doing it for 1.0.

hyanwong commented 3 years ago

> We're only adding individuals for samples at the moment, not any of the ancestral individuals.

Yep, this would be a specific change for ancestral recombination nodes only (i.e. where we are sure the 2 ancestral nodes are in the same individual; otherwise we have no way of knowing whether nodes belong to a specific individual).

> This would be a significant undertaking, so we're not doing it for 1.0.

Understood. Definitely worth considering for the longer term, though, esp. if we are going to do an ARG tutorial some time.