Open huddlej opened 4 years ago
Ok, this took four attempts, but I think I've worked it out. The change here is simple but the reasoning involves annoying coordinate bookkeeping. Here is an example.
In the original augur mask implementation the following BED file,
SEQ 3 5
was converted to 1-indexed positions 3, 4, 5.
Under the standard BED format, where intervals are 0-indexed and half-open, these coordinates should instead be read as the 0-indexed positions 3, 4. These correspond to the 1-indexed positions 4, 5 that vcftools would expect.
To get the expected 1-indexed positions for vcftools from a BED file, we need to decrement the interval start by 1:
SEQ 2 5
This produces the 0-indexed positions 2, 3, 4 and the 1-indexed positions 3, 4, 5.
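The bookkeeping above can be checked with a short Python sketch. The helper names `bed_interval_to_zero_indexed` and `bed_interval_to_one_indexed` are hypothetical and are not part of augur; they only illustrate the standard BED convention of a 0-indexed, half-open interval and the shift to 1-indexed positions.

```python
def bed_interval_to_zero_indexed(start, end):
    """Return the 0-indexed positions covered by a BED interval.

    BED intervals are 0-indexed and half-open, so `end` itself is excluded.
    (Hypothetical helper for illustration, not augur's implementation.)
    """
    return list(range(start, end))


def bed_interval_to_one_indexed(start, end):
    """Return the same positions shifted to 1-indexed coordinates."""
    return [position + 1 for position in bed_interval_to_zero_indexed(start, end)]


# The interval from the first example: SEQ 3 5
original_zero = bed_interval_to_zero_indexed(3, 5)
original_one = bed_interval_to_one_indexed(3, 5)

# The interval with its start decremented by 1: SEQ 2 5
adjusted_zero = bed_interval_to_zero_indexed(2, 5)
adjusted_one = bed_interval_to_one_indexed(2, 5)

print(original_zero, original_one)
print(adjusted_zero, adjusted_one)
```

Decrementing only the start is enough here because the half-open end already excludes position 5 in 0-indexed terms, which becomes the last included position once everything is shifted to 1-indexed coordinates.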
@huddlej @emmahodcroft Is this still relevant, or can this old PR be closed out?
augur mask now reads BED files following the standard convention of a zero-indexed, half-open interval, so the last value in each interval is not included in the coordinates [1]. This commit updates the mask BED file for this build to increment each interval by one to compensate for this change in augur mask.
[1] https://github.com/nextstrain/augur/pull/512#issuecomment-608962457