mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx
Other
449 stars 75 forks source link

fix bgzf test failure with biopython 1.80 #199

Closed emollier closed 1 year ago

emollier commented 1 year ago

Starting with version 1.80, biopython does more stringent checks on opening bgzf files. This is causing a series of test failures in pyfaidx 0.7.1:

tests/test_Fasta_bgzip.py::test_build_issue_126 XFAIL                    [ 11%]
tests/test_Fasta_bgzip.py::test_integer_slice FAILED                     [ 11%]
tests/test_Fasta_bgzip.py::test_integer_index FAILED                     [ 12%]
tests/test_Fasta_bgzip.py::test_fetch_whole_fasta FAILED                 [ 13%]
tests/test_Fasta_bgzip.py::test_line_len FAILED                          [ 14%]
tests/test_Fasta_bgzip.py::test_mutable_bgzf PASSED                      [ 14%]
tests/test_Fasta_bgzip.py::test_long_names FAILED                        [ 15%]
tests/test_Fasta_bgzip.py::test_fetch_whole_entry FAILED                 [ 16%]
tests/test_Fasta_bgzip.py::test_fetch_middle FAILED                      [ 17%]
tests/test_Fasta_bgzip.py::test_fetch_end FAILED                         [ 17%]
tests/test_Fasta_bgzip.py::test_fetch_border FAILED                      [ 18%]
tests/test_Fasta_bgzip.py::test_rev FAILED                               [ 19%]
tests/test_Fasta_bgzip.py::test_fetch_past_bounds FAILED                 [ 20%]
tests/test_Fasta_bgzip.py::test_fetch_negative FAILED                    [ 20%]
tests/test_Fasta_bgzip.py::test_fetch_reversed_coordinates FAILED        [ 21%]
tests/test_Fasta_bgzip.py::test_fetch_keyerror FAILED                    [ 22%]
tests/test_Fasta_bgzip.py::test_blank_string FAILED                      [ 22%]
tests/test_Fasta_bgzip.py::test_slice_from_beginning FAILED              [ 23%]
tests/test_Fasta_bgzip.py::test_slice_from_end FAILED                    [ 24%]
tests/test_Fasta_bgzip.py::test_issue_74_start FAILED                    [ 25%]
tests/test_Fasta_bgzip.py::test_issue_74_consistency FAILED              [ 25%]
tests/test_Fasta_bgzip.py::test_issue_74_end_faidx FAILED                [ 26%]
tests/test_Fasta_bgzip.py::test_issue_74_end_fasta FAILED                [ 27%]
tests/test_Fasta_bgzip.py::test_issue_79_fix FAILED                      [ 28%]
tests/test_Fasta_bgzip.py::test_issue_79_fix_negate FAILED               [ 28%]
tests/test_Fasta_bgzip.py::test_issue_79_fix_one_based_false FAILED      [ 29%]
tests/test_Fasta_bgzip.py::test_issue_79_fix_one_based_false_negate FAILED [ 30%]
tests/test_Fasta_bgzip.py::test_fetch_border_padded FAILED               [ 31%]
[…]
                try:
                    # mutable mode is not supported for bzgf anyways
>                   self.file = bgzf.BgzfReader(fileobj=self.file, mode="b")

/tmp/autopkgtest-lxc.i9z1xpfv/downtmp/build.bvQ/src/pyfaidx/__init__.py:412:
[…]
        if mode.lower() not in ("r", "tr", "rt", "rb", "br"):
>           raise ValueError(
                "Must use a read mode like 'r' (default), 'rt', or 'rb' for binary"
            )
E           ValueError: Must use a read mode like 'r' (default), 'rt', or 'rb' for binary

/usr/lib/python3/dist-packages/Bio/bgzf.py:597: ValueError
[…]
                try:
                    # mutable mode is not supported for bzgf anyways
                    self.file = bgzf.BgzfReader(fileobj=self.file, mode="b")
                except (ValueError, IOError):
>                   raise UnsupportedCompressionFormat(
                        "Compressed FASTA is only supported in BGZF format. Use "
                        "the samtools bgzip utility (instead of gzip) to "
                        "compress your FASTA."
E                       pyfaidx.UnsupportedCompressionFormat: Compressed FASTA is only supported in BGZF format. Use the samtools bgzip utility (instead of gzip) to compress your FASTA.

/tmp/autopkgtest-lxc.i9z1xpfv/downtmp/build.bvQ/src/pyfaidx/__init__.py:414: UnsupportedCompressionFormat

A full log is available on Debian CI infrastructure, as I first noticed this while attempting to upgrade biopython to version 1.80 in Debian experimental.

Looking at the main development branch, I suspect this breakage might still occur. This patch makes opening bgzf files explicitly in read-only mode, and thus does not trigger the error anymore, without causing regression in the test suite.

Note the comment in BgzfReader constructor mentions that one would:

typically use the top level bgzf.open(...) function which will call this class internally. Direct use is discouraged.

So maybe this commit is more of a workaround than the proper fix.

Signed-off-by: Étienne Mollier emollier@debian.org

marcelm commented 1 year ago

I ran into this as well for #200. I think yours is the correct fix because bgzf.open would need a file name while the usage here is to re-open an existing file-like object (hence fileobj=).

emollier commented 1 year ago

Cool, thanks for the confirmation! :)

mdshw5 commented 1 year ago

Thanks for taking the time to provide this fix @emollier. It looks good and I will go ahead and merge.

mdshw5 commented 1 year ago

I've released a new version (v0.7.2) that incorporates these changes. Thanks for helping out!