pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License
780 stars 274 forks source link

"[E::idx_find_and_load] Could not retrieve index file" when AlignmentFile #939

Closed cytham closed 1 year ago

cytham commented 4 years ago

[E::idx_find_and_load] Could not retrieve index file for 'test.bam'

Got this error when I do aln = pysam.AlignmentFile("test.bam", "rb") in pysam version 0.16.0.1 but not in version 0.15.3.

However, I think it might not affect the output. But please advise. Thanks

donovan-h-parks commented 4 years ago

Hi. I am also experiencing this issue. I can confirm that I indeed don't have an index file, but this is the expected operation for my program. Is there a way to suppress this warning message or explicitly indicate an index file is not expected?

wcxiaoping commented 4 years ago

Use samtools index command to build a index file can solve this problem.

tjakobi commented 4 years ago

I am seeing the same issue with the with pysam (0.16.0.1) and similar to @cytham the error is not coming up with 0.15.3.

jmarshall commented 4 years ago

This is a diagnostic that is new (in these circumstances) in HTSlib 1.10. In the context in which pysam is calling it it is just a warning and can be ignored — the pysam.AlignmentFile is constructed appropriately regardless.

It could be suppressed by using sam_index_load3(…, HTS_IDX_SAVE_REMOTE | HTS_IDX_SILENT_FAIL) instead of sam_index_load2() in _open(), but that would limit pysam to only working with HTSlib 1.10 or later. Further investigation of other ways to suppress it is needed.

natechols commented 4 years ago

@jmarshall This is preventing us from updating pysam in our production code, so I am willing to invest a little bit of time investigating fixes. Is there a tl;dr for building pysam locally so I can try building with different htslibs? (UPDATE: never mind, I think I'm running now...)

natechols commented 4 years ago

FWIW (following @jmarshall's suggestion):

diff --git a/pysam/libcalignmentfile.pyx b/pysam/libcalignmentfile.pyx
index b8e4230..0f91fbf 100644
--- a/pysam/libcalignmentfile.pyx
+++ b/pysam/libcalignmentfile.pyx
@@ -74,7 +74,7 @@ from cpython.version cimport PY_MAJOR_VERSION
 from pysam.libcutils cimport force_bytes, force_str, charptr_to_str
 from pysam.libcutils cimport encode_filename, from_string_and_size
 from pysam.libcalignedsegment cimport makeAlignedSegment, makePileupColumn
-from pysam.libchtslib cimport HTSFile, hisremote
+from pysam.libchtslib cimport HTSFile, hisremote, HTS_IDX_SAVE_REMOTE, HTS_IDX_SILENT_FAIL

 if PY_MAJOR_VERSION >= 3:
     from io import StringIO
@@ -1000,7 +1000,7 @@ cdef class AlignmentFile(HTSFile):

                 if cfilename or cindexname:
                     with nogil:
-                        self.index = sam_index_load2(self.htsfile, cfilename, cindexname)
+                        self.index = sam_index_load3(self.htsfile, cfilename, cindexname, HTS_IDX_SAVE_REMOTE | HTS_IDX_SILENT_FAIL)

                     if not self.index and (cindexname or require_index):
                         if errno:
@@ -1991,7 +1991,7 @@ cdef class IteratorRow:
                 if samfile.index_filename:
                     cindexname = bindex_filename = encode_filename(samfile.index_filename)
                 with nogil:
-                    self.index = sam_index_load2(self.htsfile, cfilename, cindexname)
+                    self.index = sam_index_load3(self.htsfile, cfilename, cindexname, HTS_IDX_SAVE_REMOTE | HTS_IDX_SILENT_FAIL)
             else:
                 self.index = NULL

diff --git a/pysam/libchtslib.pxd b/pysam/libchtslib.pxd
index 370e492..e559c0e 100644
--- a/pysam/libchtslib.pxd
+++ b/pysam/libchtslib.pxd
@@ -626,6 +626,9 @@ cdef extern from "htslib/hts.h" nogil:
     int8_t HTS_IDX_REST
     int8_t HTS_IDX_NONE

+    int8_t HTS_IDX_SAVE_REMOTE
+    int8_t HTS_IDX_SILENT_FAIL
+
     int8_t HTS_FMT_CSI
     int8_t HTS_FMT_BAI
     int8_t HTS_FMT_TBI
@@ -1091,6 +1094,14 @@ cdef extern from "htslib/sam.h" nogil:
     # @return  The index, or NULL if an error occurred.
     hts_idx_t *sam_index_load2(htsFile *fp, const char *fn, const char *fnidx)

+    # Load a specific BAM (.csi or .bai) or CRAM (.crai) index file
+    # @param fp     File handle of the data file whose index is being opened
+    # @param fn     BAM/CRAM/etc data file filename
+    # @param fnidx  Index filename, or NULL to search alongside @a fn
+    # @param flags  Flags passed to htslib
+    # @return  The index, or NULL if an error occurred.
+    hts_idx_t *sam_index_load3(htsFile *fp, const char *fn, const char *fnidx, int flags)
+
     # Generate and save an index file
     # @param fn        Input BAM/etc filename, to which .csi/etc will be added
     # @param min_shift Positive to generate CSI, or 0 to generate BAI

I was hoping there was an easy way to handle the library versions with ifdefs, but I don't see a way to make that work with Cython extensions.

jmarshall commented 4 years ago

Yes, that's pretty much the patch I had in mind, and I was hoping the same thing.

It can also be worked around by explicitly suppressing HTSlib's messages for the duration:

save = pysam.set_verbosity(0)
aln = pysam.AlignmentFile("test.bam", "rb")
pysam.set_verbosity(save)

and I suppose _open() could apply a similar workaround internally rather than use sam_index_load3().

If this HTSlib diagnostic output behaviour change has caused you difficulty in updating your production code, you should raise the issue with samtools/htslib.

natechols commented 4 years ago

Thanks, we can live with the Python workaround for now.