samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

VCF_CSI_index_implementation #1684

Open gokalpcelik opened 10 months ago

gokalpcelik commented 10 months ago

VCF-CSI index reading functionality for TabixReader.class

Unlike htslib and bcftools, HTSJDK cannot read CSI-format VCF indexes. Modifications were necessary so that HTSJDK and downstream tools accept this index format for contigs larger than 2^29-1. The existing CSIIndex class in HTSJDK was implemented for the BAM-style CSI index, which differs from the VCF-style CSI index. htslib already contains all the necessary modifications in tbx.c. TabixReader.class already contains the code needed to read chunks from TBI-format index files; however, unlike TBI, a CSI index requires reordering the byte-reading steps and disabling the linear index. The regions-to-bins (reg2bins) method also needs a new version to accommodate larger contig sizes and bin values. All changes were made in the original TabixReader.class to avoid rewiring new index code back into the VCFReader and AbstractFeatureReader classes.
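For readers unfamiliar with the binning change, here is a minimal sketch of a generalized reg2bins. It is not the code from this PR: the class name `CsiBinsSketch` and the parameter names `minShift` and `depth` are hypothetical, and the logic follows the binning scheme as described in the CSI specification, where the fixed TBI layout corresponds to `minShift = 14` and `depth = 5`. Coordinates are assumed to be zero-based and half-open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of HTSJDK or of this PR.
final class CsiBinsSketch {
    private CsiBinsSketch() {}

    /**
     * Lists the bins that may overlap the zero-based, half-open region [beg, end).
     * minShift is the number of bits covered by the smallest bin and depth is the
     * number of levels below the root bin (both are stored in the CSI header).
     */
    static List<Integer> reg2bins(long beg, long end, int minShift, int depth) {
        List<Integer> bins = new ArrayList<>();
        int shift = minShift + depth * 3;        // bits covered by the single root bin
        long maxEnd = 1L << shift;               // largest coordinate the index can address
        if (beg < 0) beg = 0;
        if (end > maxEnd) end = maxEnd;
        if (beg >= end) return bins;             // empty or out-of-range region

        long last = end - 1;                     // last base covered by the region
        int levelOffset = 0;                     // id of the first bin on the current level
        for (int level = 0; level <= depth; level++) {
            int first = levelOffset + (int) (beg >> shift);
            int lastBin = levelOffset + (int) (last >> shift);
            for (int bin = first; bin <= lastBin; bin++) {
                bins.add(bin);
            }
            levelOffset += 1 << (level * 3);     // each level holds 8 times more bins
            shift -= 3;                          // and each bin covers 8 times fewer bases
        }
        return bins;
    }

    public static void main(String[] args) {
        // TBI-equivalent parameters: addresses coordinates up to 2^29.
        System.out.println(reg2bins(0, 1, 14, 5));
        // One extra level: addresses coordinates up to 2^32.
        System.out.println(reg2bins(0, 1, 14, 6));
    }
}
```

With `minShift = 14` and `depth = 5` the root bin spans 2^29 positions, which is where the TBI limit comes from. Raising `depth` widens the addressable range and also raises the largest possible bin number, which is why both the method and the bin value handling have to change.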

Completed tasks:

Current To-do:

Things to think about before submitting:

LarsStegemanGT commented 5 months ago

Hello, what is the status of this PR? This feature would be very helpful to us. Thanks!