benjeffery closed this pull request 2 months ago.
All modified and coverable lines are covered by tests.
Project coverage is 86.20%. Comparing base (b1d7c4d) to head (6a3147d).
Current head 6a3147d differs from pull request most recent head 49c0fe5. Consider uploading reports for the commit 49c0fe5 to get more accurate results.
One thought experiment is: what if some app using tskit is already generating all site positions on 1 <= x < seq length + 1?
That wouldn't be a valid tree sequence as all positions have to be less than sequence length.
Oops, it would be if I'd written it correctly (without the +1). But my point should have been: we have no firm requirement that the minimum position actually used is zero.
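To make the constraint in this exchange concrete, here is a minimal sketch in plain Python rather than tskit itself. `positions_are_valid` is a hypothetical helper, not part of the tskit API; it only encodes the rule stated in the thread (every site position must be less than the sequence length) plus the requirement that positions are non-negative. Note that nothing in it requires the smallest position to be zero.

```python
def positions_are_valid(positions, sequence_length):
    """Return True if every site position x satisfies 0 <= x < sequence_length.

    Hypothetical helper for illustration only. It does not require any
    site to sit at position 0.
    """
    return all(0 <= x < sequence_length for x in positions)


L = 100
# The original thought experiment: sites on 1 <= x < L + 1, which includes x == L.
print(positions_are_valid(range(1, L + 1), L))  # False: x == L is out of range
# The corrected version without the +1: sites on 1 <= x < L.
print(positions_are_valid(range(1, L), L))      # True: no site at 0 is needed
```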
I'm not sure I get what you mean; you can have a tree sequence with no sites? Maybe you mean we have no firm specification for how the reference sequence in the tree sequence maps onto the position field. We don't even have a requirement that the ref seq length is equal to `sequence_length - 1`.
Imagine that someone only considers positions from [10, seqlen). Their "genome" starts at position 10, not 0. That is a valid tree sequence and they can choose their seqlen so that the max allowed site position matches whatever they have in mind, say 100. So they are modeling a gene segment from positions [10, 100] for some reason using a table collection with seqlen of 101.
This is a valid use of the API. What would this PR do to this use case?
I think my questions/confusion are related to some comments in the linked issue: seqlen must be > the max allowed position, but the data model is not necessarily zero-indexed.
I'll go away now...
This is just checking to see if the first position is zero, nothing else. The seqlen stuff was a digression in the thread.
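A minimal sketch of the check as described in the previous comment, applied to the [10, 100] use case raised above. This is not the PR's code: `check_first_position` and its `allow_zero` parameter are hypothetical names (the thread does not name the new flag). It assumes positions are sorted ascending, as site positions in a tree sequence are, so the first entry is also the minimum. Because VCF positions are 1-based, only a site at 0 is rejected; a "genome" starting elsewhere passes untouched.

```python
def check_first_position(positions, allow_zero=False):
    """Raise ValueError if the first (i.e. smallest) site position is 0.

    Hypothetical helper: assumes `positions` is sorted ascending.
    `allow_zero` stands in for the opt-in flag; its real name is not
    given in this thread.
    """
    if len(positions) > 0 and positions[0] == 0 and not allow_zero:
        raise ValueError(
            "site at position 0 cannot be written as a 1-based VCF POS; "
            "opt in explicitly to write it anyway"
        )


# The use case from the thread: sites on [10, 100] with sequence length 101.
check_first_position(list(range(10, 101)))  # passes: first position is 10

# Output with a site at position 0 needs the explicit opt-in.
try:
    check_first_position([0, 5, 42])
except ValueError as err:
    print("rejected:", err)
check_first_position([0, 5, 42], allow_zero=True)  # passes
```

Since only the first position is inspected, the check is O(1) and says nothing about sequence length or reference-sequence mapping, which is why the seqlen discussion does not affect it.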
I love it (as much as is possible for a weird VCF hack). Nice solution.
Just pinging this because this issue tripped me up again!
Fixes #2838
Note that this is a fairly breaking change that we should think about, given that msprime output will, by default, require passing the new flag to `write_vcf`.