kblin / ncbi-acc-download

Download files from NCBI Entrez by accession
Apache License 2.0
111 stars 8 forks source link

test suite failure with biopython 1.79 #23

Closed emollier closed 3 years ago

emollier commented 3 years ago

Greetings,

While working on updating Biopython to version 1.79 in Debian Experimental, we noticed that the new version of Biopython broke the test suite of ncbi-acc-download. Excerpt from the full log from Debian continuous integration, there are variations around the following symptom, affecting the _download_wgsparts function of wgs.py:

__________________ test_download_wgs_parts_supercontig_retry ___________________

req = <requests_mock.mocker.Mocker object at 0x7fc14613fb20>

    def test_download_wgs_parts_supercontig_retry(req):
        cfg = Config(format="genbank")
        supercontig = open(full_path('supercontig.gbk'), 'rt')
        req.get(ENTREZ_URL, response_list=[
            {"text": u'Whoa, slow down', "status_code": 429, "headers": {"Retry-After": "0"}},
            {"body": open(full_path('supercontig_full.gbk'), 'rt')}
        ])

        outhandle = wgs.download_wgs_parts(supercontig, cfg)
        supercontig_full = open(full_path('supercontig_full.gbk'), 'rt')
>       assert outhandle.getvalue() == supercontig_full.read()
E       AssertionError: assert 'LOCUS       ...671808)\n//\n' == 'LOCUS       ...ccctaac\n//\n'
E         - LOCUS       NC_007194                 60 bp    DNA     linear   CON 03-APR-2018
E         ?                                  ^^^^^^^                            ^^ ^^^   ^^
E         + LOCUS       NC_007194            4918979 bp    DNA     linear   CON 11-NOV-2009
E         ?                                  ^^^^^^^                            ^^ ^^^   ^^
E           DEFINITION  Aspergillus fumigatus Af293 chromosome 1, whole genome shotgun
E                       sequence.
E           ACCESSION   NC_007194...
E         
E         ...Full output truncated (22 lines hidden), use '-vv' to show

test_wgs.py:145: AssertionError

After some digging, the issue turned out to be related to the deprecation of Bio.Seq.UnknownSeq in Biopython 1.79. Apparently, unknown sequences are now created as regular sequences of class Seq with data being None and non-zero length, and thus the test on record.seq being an instance of UnknownSeq always fails.

The problem was initially discussed on Debian bug tracker issue #996794, where I devised on a patch with which I tried to remain functionally equivalent to the existing code while extending support to both Biopython 1.78 and 1.79. I didn't prepare a merge request though, as I'm still feeling a bit uncomfortable about my approach which I deem heavy, and believe you probably would have a more elegant approach to fix the issue. Of course feel free to use the patch as is if you feel it is appropriate.

In hope this helps, Have a nice day :)

kblin commented 3 years ago

I'm not sure there's a nice way to support both 1.78 and 1.79 from the same code base.

kblin commented 3 years ago

I've had a discussion with the Biopython authors about the pain points of their 1.79 changes here, but then got sidetracked and just pinned to 1.78 for local use. I lost track of how many of my python packages got broken by that change.

I guess it's time for a new release that just bumps the minimum required version to 1.79 and builds on the new code, though even for that I really hate the code necessary.

kblin commented 3 years ago

Ok, this is also breaking PR #22

emollier commented 3 years ago

I just recalled to mention in case you were to use it: I wrote the patch in Debian context, and the writing to /dev/null to make sure the error is raised wouldn't be portable on non-Unix systems.

Otherwise thanks for the context, this was informative.

kblin commented 3 years ago

It looks like I have started to fix the issue on this machine three months ago and forgot about it.

kblin commented 3 years ago

Should be fixed by dcba66f4715c586cdac89ccf26480ede79307d6b, I'll release a new version of ncbi-acc-download once PR #22 is merged.

emollier commented 3 years ago

Aaah, great! Thanks for your work on this topic! :)

kblin commented 3 years ago

Fixed in release 0.2.8 that's now on pypi (and here).