Closed hyanwong closed 8 months ago
One implication of this concerns what we do when comparing the average genomic distance between two genomes, one of which has material not present in the other (e.g. there has been a deletion). There will be a difference between such a region (which may not have any MRCA, and which could return NaN) and a region where the MRCA is at time=infinity.
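To illustrate why the two cases stay distinguishable, here is a minimal sketch of a span-weighted average over per-region distances. The function name `span_weighted_mean`, the `(span, distance)` input format and the `skip_nan` option are all hypothetical and not part of the existing code. It relies only on standard floating-point behaviour: an infinite distance makes the average infinite, whereas a NaN distance makes it NaN unless those regions are explicitly skipped.

```python
import math

def span_weighted_mean(regions, skip_nan=False):
    """Average per-region distances, weighted by region span.

    `regions` is a list of (span, distance) pairs (hypothetical format).
    A distance of NaN stands for "no MRCA in this region"; a distance of
    inf stands for "MRCA at time=infinity".
    """
    total_span = 0.0
    weighted_sum = 0.0
    for span, distance in regions:
        if skip_nan and math.isnan(distance):
            # Leave regions without an MRCA out of both numerator and denominator
            continue
        total_span += span
        weighted_sum += span * distance
    if total_span == 0:
        return math.nan  # nothing comparable between the two genomes
    return weighted_sum / total_span
```

Whether regions without an MRCA should be skipped or should propagate NaN is exactly the kind of decision raised above; the sketch only shows that either choice can be expressed.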
@duncanMR and I decided we should allow nodes at time=infinity, but we should check that we don't allow a node at time=inf to be the parent of another node at time=inf.
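A minimal sketch of that check, assuming node times are available as a plain sequence of floats and edges as (parent, child) ID pairs. The function name and the input format are hypothetical, not the existing tables API.

```python
import math

def infinite_parent_child_edges(node_times, edges):
    """Return the (parent, child) pairs in which both nodes are at time=inf.

    `node_times[i]` is the time of node i; `edges` is an iterable of
    (parent_id, child_id) pairs. Both are assumed formats for illustration.
    """
    return [
        (parent, child)
        for parent, child in edges
        if math.isinf(node_times[parent]) and math.isinf(node_times[child])
    ]
```

A validation step could then raise an error whenever the returned list is non-empty, while still permitting a single infinite-time node above finite-time children.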
At the moment we allow a GIG to have a "grand MRCA node" at time=infinity, which allows us to find MRCA regions even if there are multiple roots. In the existing simulator code, this infinite-time MRCA is deliberately removed from the GIG on output, but is present when we turn the simulation-stored tables directly into a GIG.
This isn't allowed in tskit. Since the find_mrca function works on tables, we could simply ban infinite-time nodes from being frozen into a GIG and expect to run the function only on tables, not the frozen GIG. Or we could raise an error when we export such a GIG to a tree sequence instead. Or we could have an argument to the to_tree_sequence method that removes infinite-time nodes. Either way, we should probably take an active decision about whether we deviate from the tskit paradigm here.
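As a rough illustration of the third option, the sketch below drops infinite-time nodes and any edges that touch them, then renumbers the remaining nodes. The function name `drop_infinite_time_nodes` and the plain-list inputs are hypothetical; the real tables and the to_tree_sequence method may need to handle more than this (for example other tables that reference node IDs).

```python
import math

def drop_infinite_time_nodes(node_times, edges):
    """Remove nodes at time=inf, plus every edge that references one.

    `node_times[i]` is the time of node i. Each edge is a tuple whose first
    two entries are (parent_id, child_id); any further entries, such as
    interval coordinates, are carried over unchanged. Assumed formats only.
    """
    # Map each retained old node ID to its new, compacted ID
    id_map = {}
    kept_times = []
    for old_id, time in enumerate(node_times):
        if not math.isinf(time):
            id_map[old_id] = len(kept_times)
            kept_times.append(time)
    # Keep only edges whose parent and child both survive, with remapped IDs
    kept_edges = [
        (id_map[parent], id_map[child]) + tuple(rest)
        for parent, child, *rest in edges
        if parent in id_map and child in id_map
    ]
    return kept_times, kept_edges
```

The raise-an-error alternative would use the same scan but stop at the first infinite-time node instead of filtering it out.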