sgkit-dev / sgkit

Scalable genetics toolkit
https://sgkit-dev.github.io/sgkit
Apache License 2.0

partition_into_regions: tabix index returns illegal region with multiple contigs #1203

Open jeromekelleher opened 4 months ago

jeromekelleher commented 4 months ago

With multiple contigs, `partition_into_regions` on a Tabix-indexed VCF sometimes returns regions with an end coordinate of 0, which is illegal. See this test case: https://github.com/jeromekelleher/bio2zarr/blob/880c3afee4465b4b94b921c815d436f3e4a78a46/tests/test_vcf_utils.py#L135

The fix is pretty easy (I think): https://github.com/jeromekelleher/bio2zarr/blob/880c3afee4465b4b94b921c815d436f3e4a78a46/bio2zarr/vcf_utils.py#L505
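
To illustrate why an end coordinate of 0 is a problem, here is a minimal sketch, assuming regions are 1-based and inclusive. The `Region` class and `drop_illegal_end` helper are hypothetical names made up for this example. They are not sgkit's or bio2zarr's API, and this is not necessarily what the linked fix does. Falling back to an open-ended region is just one possible way to handle the bad value.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for a 1-based, inclusive VCF region."""

    contig: str
    start: int = 1
    end: Optional[int] = None  # None means "to the end of the contig"

    def is_legal(self) -> bool:
        # Coordinates are assumed 1-based, so start must be at least 1
        # and a given end must not come before start.
        if self.start < 1:
            return False
        return self.end is None or self.end >= self.start

    def __str__(self) -> str:
        if self.end is None:
            return f"{self.contig}:{self.start}-"
        return f"{self.contig}:{self.start}-{self.end}"


def drop_illegal_end(region: Region) -> Region:
    """Assumed workaround: replace an illegal end with an open-ended region."""
    if region.end is not None and region.end < max(region.start, 1):
        return replace(region, end=None)
    return region


bad = Region("chr2", start=1, end=0)
print(bad.is_legal())
print(drop_illegal_end(bad))
```

A region such as `chr2:1-0` fails the check because its end precedes its start. The guard turns it into an open-ended region on the same contig rather than passing the illegal string on to a VCF reader.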