Open jeromekelleher opened 8 months ago
The index names for CSI-indexed VCFs must be derived from the index itself, because the sequence names in an indexed VCF refer to the sequences actually observed in the records, not those listed in the header. The correct logic (I hope) is here:
https://github.com/jeromekelleher/bio2zarr/blob/880c3afee4465b4b94b921c815d436f3e4a78a46/bio2zarr/vcf_utils.py#L400
Some tests that should be straightforward to port to sgkit are here: https://github.com/jeromekelleher/bio2zarr/blob/880c3afee4465b4b94b921c815d436f3e4a78a46/tests/test_vcf_utils.py#L21
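
For anyone who wants the gist without following the link, here is a rough sketch of the idea, not the bio2zarr code itself. It assumes the CSI file was written by tabix/bcftools, so that the `aux` block holds the tabix-style header (seven little-endian int32 fields, the last being the length of the names block) followed by the NUL-terminated sequence names. The function name `csi_sequence_names` and the file name in the usage note are made up for illustration.

```python
import gzip
import struct


def csi_sequence_names(stream):
    """Return the sequence names stored in a CSI index.

    `stream` is a binary file-like object over the *decompressed* index
    (CSI files are BGZF, which the gzip module can read).
    Assumes a tabix-style aux block; returns [] if there is none.
    """
    if stream.read(4) != b"CSI\x01":
        raise ValueError("not a CSI index")
    # Fixed header: min_shift, depth, l_aux (all little-endian int32).
    _min_shift, _depth, l_aux = struct.unpack("<3i", stream.read(12))
    aux = stream.read(l_aux)
    if len(aux) < 28:
        # No tabix header in aux, so no names are recorded here.
        return []
    # The seventh int32 of the tabix header is the names block length.
    (l_nm,) = struct.unpack("<i", aux[24:28])
    raw_names = aux[28 : 28 + l_nm]
    # Names are concatenated and NUL-terminated; drop the empty tail.
    return [name.decode() for name in raw_names.split(b"\x00") if name]
```

Usage would be something like `with gzip.open("sample.vcf.gz.csi", "rb") as f: names = csi_sequence_names(f)`, and the resulting list (in index order) is what should be used in place of the header contigs.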