pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License
786 stars, 273 forks

seek not implemented on cram files #1034

Open kcleal opened 3 years ago

kcleal commented 3 years ago

Calling seek on a CRAM file raises NotImplementedError. Is seeking possible on CRAM files? The corresponding tell method does seem to work.
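A minimal sketch of how the behaviour could be checked, assuming only that the file object exposes tell() and seek(offset) as described above. The helper name seek_supported is hypothetical and not part of pysam; it is written against duck typing so it can be tried on any handle.

```python
def seek_supported(handle):
    """Return True if handle.seek() accepts the offset reported by handle.tell().

    Returns False when seek raises NotImplementedError, which is the
    behaviour reported in this issue for CRAM files.
    """
    # tell() is reported to work, so take the current offset first.
    offset = handle.tell()
    try:
        # Seeking to the current offset should be a no-op where seek is supported.
        handle.seek(offset)
    except NotImplementedError:
        return False
    return True
```

With pysam installed, this could be called as seek_supported(pysam.AlignmentFile("example.cram", "rc")), where example.cram is a placeholder path. Based on the report above, that call would be expected to return False for a CRAM file.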

kcleal commented 3 years ago

Not sure if this would work (sorry, I haven't managed to get pysam to build on my Mac yet, so I can't test it), but the seek function in libchtslib.pyx could possibly be updated to the following:

def seek(self, uint64_t offset):
    """move file pointer to position *offset*, see :meth:`pysam.HTSFile.tell`."""
    if not self.is_open:
        raise ValueError('I/O operation on closed file')
    if self.is_stream:
        raise IOError('seek not available in streams')

    cdef int64_t ret
    if self.htsfile.format.compression == bgzf:
        with nogil:
            ret = bgzf_seek(hts_get_bgzfp(self.htsfile), offset, SEEK_SET)
    elif self.htsfile.format.compression == no_compression:
        ret = 0 if (hseek(self.htsfile.fp.hfile, offset, SEEK_SET) >= 0) else -1
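    # Add this elif block? Untested. Note that cram_seek() takes the cram_fd
    # itself (self.htsfile.fp.cram), not the hFILE returned by cram_fd_get_fp().
    # Assumes cram_seek is declared in the accompanying .pxd file.
    elif self.htsfile.format.format == cram:
        with nogil:
            ret = cram_seek(self.htsfile.fp.cram, offset, SEEK_SET)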
    else:
        raise NotImplementedError("seek not implemented in files compressed by method {}".format(
            self.htsfile.format.compression))
    return ret