Open jeromekelleher opened 6 years ago
After discussion on #121, I think the best approach here is to add start
and stop
arguments in sequence coordinates (as above). The trees in the returned sequence would guarantee that the first tree has left == start and the last tree would have right == stop. To do this, we'd need to 'split' the underlying trees so that we guarantee this property (that is, the end points of the trees we return don't necessarily align with the underlying edges). The reason for doing this is that if we're looking for the trees in a particular genomic interval, we almost certainly only want sites also within that interval. If we don't split the trees, we'll have sites arbitrarily far away which would have to be filtered manually, making client code much uglier and error prone.
This isn't hard to do, but will require implementation at the C level.
Copied from https://github.com/tskit-dev/tskit/pull/121:
Note that some of the docs use the words [left,right) rather than [start, stop), e.g. https://tskit.readthedocs.io/en/latest/python-api.html#tskit.TreeSequence.edge_diffs . "left" and "right" have the advantage that it's clearer (to me anyway) that we are talking about a length of genome and not e.g. tree indexes, or node times. But it has the disadvantage that it seems weird to specify e.g. (left=20.5, right=10.5)
in order to get an iterator that goes right to left. And there's definitely the advantage that start
and stop
match the range operator.
Thinking about how to implement this, I think the right way to do it is to add a tsk_tree_set_interval(left, right)
method which can shrink the interval for a tree. This would mainly involve updating the list of sites. The tsk_tree_advance
method would need to be updated to take this into account, so that if we call next() on a tree that has had the right coordinate moved leftwards, the next tree state is the remainder of the original tree.
The rest of the functionality can be built on this then.
Adding this to the 1.0 milestone for now as something to review and plan ahead for.
To match the variants()
iterator, we should use left
and right
here.
Also we can implement this quiet efficiently since #2661 was merged
These should be seen as analogous to the
start
andstop
arguments to Python's builtinrange
, and be specified in tree sequence coordinates. We should initially ensure that start < stop, but in future we could allow start > stop, and enable reverse iteration in this way.It would be best to do this with a low-level
sparse_tree_seek()
method to position the tree at the correct location.