samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Support IGV "browse mode" CRAM decoding #1597

Closed: cmnbroad closed this issue 2 years ago

cmnbroad commented 2 years ago

From the IGV request: Currently, to decode a CRAM you need to load the entire reference sequence (chromosome) for the region of interest. This is a big performance hit when loading the FASTA by URL, which is what most IGV users do. cram.js does not have this problem; it loads only the slice of the sequence needed for the MD5 check.
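Why a fragment is enough: the reference MD5 stored in a CRAM slice header covers only the reference bases spanned by that slice, so the digest computed from a fetched fragment matches the one computed from the whole chromosome. Below is a minimal stdlib sketch, assuming the digest is taken over the uppercased bases of the slice's span (check the CRAM specification for the exact normalization rules). `SliceMd5` and `md5Hex` are hypothetical names, not htsjdk API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical helper: MD5 over a region of reference bases (not part of htsjdk). */
final class SliceMd5 {
    private SliceMd5() {}

    /**
     * Returns the lowercase hex MD5 of bases[offset, offset + span), uppercasing
     * each base first (assumed normalization, mirroring CRAM's slice MD5).
     */
    static String md5Hex(byte[] bases, int offset, int span) {
        if (offset < 0 || span < 0 || offset + span > bases.length) {
            throw new IllegalArgumentException("region out of bounds");
        }
        byte[] region = Arrays.copyOfRange(bases, offset, offset + span);
        for (int i = 0; i < region.length; i++) {
            region[i] = (byte) Character.toUpperCase((char) region[i]);
        }
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(region);
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Calling `md5Hex(wholeChromosome, sliceStart, sliceSpan)` and `md5Hex(fragment, 0, sliceSpan)`, where `fragment` holds only the slice's bases, yields the same digest, which is the property browse-mode decoding relies on.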

cmnbroad commented 2 years ago

To implement this, we mainly need two things:

It's unclear whether we also need to add a way for the caller (i.e., IGV) to engage this mode. We may want to use it automatically when handed a URL source, although it's unclear how that would affect performance when decoding a full CRAM. We also need to think through any ramifications for multi-reference slices.
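One way to read the "engage it automatically for URL sources" idea is as a small policy check. The sketch below is hypothetical: `ReferenceFetchPolicy`, `useRegionMode`, the set of remote schemes, and the 0.5 threshold are all illustrative assumptions, not anything htsjdk or IGV defines. It fetches fragments only for remote sources, and falls back to whole-contig loading when the requested span covers most of the contig, as it would when decoding a full CRAM.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

/** Hypothetical policy: fetch reference fragments ("browse mode") or whole contigs? */
final class ReferenceFetchPolicy {
    // Assumed set of schemes treated as remote sources.
    private static final Set<String> REMOTE_SCHEMES = Set.of("http", "https", "ftp");
    // Assumed threshold: above this fraction of the contig, one whole-contig fetch
    // is taken to be cheaper than many fragment fetches.
    private static final double WHOLE_CONTIG_FRACTION = 0.5;

    private ReferenceFetchPolicy() {}

    static boolean useRegionMode(URI source, long requestedSpan, long contigLength) {
        String scheme = source.getScheme();
        boolean remote = scheme != null
                && REMOTE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        if (!remote) {
            return false; // local files: keep the existing whole-contig behavior
        }
        if (contigLength <= 0) {
            return true; // contig length unknown: prefer small fetches
        }
        return requestedSpan < WHOLE_CONTIG_FRACTION * contigLength;
    }
}
```

The threshold is a stand-in for whatever measurement would settle the full-CRAM performance question; this sketch does not address multi-reference slices.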

jrobinso commented 2 years ago

@cmnbroad For IGV's purposes I do not think you need "refget". IGV supplies the reference sequence through an interface when decoding CRAMs, and that interface can easily be changed to supply reference fragments. See https://github.com/igvteam/igv/blob/master/src/main/java/org/broad/igv/sam/cram/IGVReferenceSource.java.

This interface (htsjdk.samtools.cram.ref.CRAMReferenceSource) was implemented to allow IGV (or other clients) to supply the sequence, since IGV must load sequence for the region in view for its own sequence track.
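To illustrate what "supplying reference fragments" through such an interface could look like, here is a sketch built on a hypothetical `FragmentReferenceSource` interface. It is not htsjdk's `CRAMReferenceSource`; the method name and signature are assumptions. The `WindowCachingSource` wrapper (also hypothetical) fetches a padded window once and serves nearby slice requests from it, loosely mirroring how IGV already holds the sequence for the region in view.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical fragment-level reference interface (not the htsjdk API). */
interface FragmentReferenceSource {
    /** Returns bases [zeroBasedStart, zeroBasedStart + length) of the named contig. */
    byte[] getBases(String contig, int zeroBasedStart, int length);
}

/** Serves fragments from one cached window, refetching only on a miss. */
final class WindowCachingSource implements FragmentReferenceSource {
    private final FragmentReferenceSource fetcher; // e.g. a range-request reader (assumed)
    private final int padding;                     // extra bases fetched on each side
    private String cachedContig;
    private int cachedStart;
    private byte[] cachedBases = new byte[0];
    private int fetchCount;

    WindowCachingSource(FragmentReferenceSource fetcher, int padding) {
        this.fetcher = Objects.requireNonNull(fetcher);
        if (padding < 0) throw new IllegalArgumentException("padding must be >= 0");
        this.padding = padding;
    }

    @Override
    public byte[] getBases(String contig, int zeroBasedStart, int length) {
        int end = zeroBasedStart + length;
        boolean hit = contig.equals(cachedContig)
                && zeroBasedStart >= cachedStart
                && end <= cachedStart + cachedBases.length;
        if (!hit) {
            // Fetch a window padded on both sides so neighboring slices can reuse it.
            int windowStart = Math.max(0, zeroBasedStart - padding);
            int windowLength = end + padding - windowStart;
            cachedBases = fetcher.getBases(contig, windowStart, windowLength);
            cachedContig = contig;
            cachedStart = windowStart;
            fetchCount++;
            if (end > cachedStart + cachedBases.length) {
                throw new IllegalArgumentException("region extends past end of " + contig);
            }
        }
        int from = zeroBasedStart - cachedStart;
        return Arrays.copyOfRange(cachedBases, from, from + length);
    }

    /** Number of calls made to the underlying fetcher. */
    int fetchCount() {
        return fetchCount;
    }
}
```

A single-window cache fits browsing, where consecutive slices tend to fall in the same neighborhood; in IGV itself the fetcher would presumably be backed by the sequence it has already loaded for its own track.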