kblin / ncbi-acc-download

Download files from NCBI Entrez by accession
Apache License 2.0

Issue with recursive download #13

Open Anto007 opened 5 years ago

Anto007 commented 5 years ago

I ran ncbi-acc-download --recursive GHGH00000000.1 and got the error message below. Any help here would be very much appreciated.

Traceback (most recent call last):
  File "/home/user/tools/Python-3.6.2/virtualenv3/bin/ncbi-acc-download", line 10, in <module>
    sys.exit(main())
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/ncbi_acc_download/main.py", line 54, in main
    download_to_file(dl_id, config, filename, append)
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/ncbi_acc_download/core.py", line 118, in download_to_file
    _validate_and_write(r, fh, dl_id, config)
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/ncbi_acc_download/core.py", line 162, in _validate_and_write
    downloaded = download_wgs_parts(handle, config)
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/ncbi_acc_download/wgs.py", line 107, in download_wgs_parts
    records = list(SeqIO.parse(handle, config.format))
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/Bio/SeqIO/__init__.py", line 655, in parse
    for r in i:
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/Bio/GenBank/Scanner.py", line 489, in parse_records
    record = self.parse(handle, do_features)
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/Bio/GenBank/Scanner.py", line 473, in parse
    if self.feed(handle, consumer, do_features):
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/Bio/GenBank/Scanner.py", line 445, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/Bio/GenBank/Scanner.py", line 171, in parse_features
    raise ValueError("Premature end of features table, marker '//' found")
ValueError: Premature end of features table, marker '//' found

kblin commented 5 years ago

This is a record type I've never seen before, and it looks like Biopython doesn't like it. I'll open a bug report with Biopython to get it fixed, and then I can make sure ncbi-acc-download supports it.

Anto007 commented 5 years ago

Thank you so much for your quick response. Any help with this would really make my day.

kblin commented 5 years ago

I've opened a Biopython bug https://github.com/biopython/biopython/issues/2268, let's see what they think about this.

Anto007 commented 5 years ago

Thanks; fingers crossed! Also, how do I get the FASTA file for this record in recursive mode? For example, I tried the command below and got an empty FASTA file:

ncbi-acc-download --recursive NZ_AQZU00000000.1 --format fasta

ncbi-acc-download --recursive NZ_AQZU00000000.1 does appear to give me the correct .gbk file

kblin commented 5 years ago

Hm, I think I've never tried this for FASTA files. I don't think it'll work out of the box.
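Since the recursive GenBank download does appear to work, one possible workaround (a sketch, not something confirmed in this thread) is to convert the downloaded .gbk file to FASTA with Biopython's SeqIO.convert. The input filename used in the note below is an assumption about what the tool writes; pass whatever path you actually got. The helper names fasta_path_for and convert_gbk_to_fasta are hypothetical.

```python
from pathlib import Path


def fasta_path_for(gbk_path):
    """Return the input path with its final suffix swapped for .fasta."""
    return Path(gbk_path).with_suffix(".fasta")


def convert_gbk_to_fasta(gbk_path):
    """Convert every record in a GenBank file to FASTA and return the record count."""
    # Imported lazily so the path helper above works without Biopython installed.
    from Bio import SeqIO

    out_path = fasta_path_for(gbk_path)
    # SeqIO.convert reads all records from the input and writes them in the new format.
    return SeqIO.convert(str(gbk_path), "genbank", str(out_path), "fasta")
```

For example, convert_gbk_to_fasta("NZ_AQZU00000000.1.gbk") would write NZ_AQZU00000000.1.fasta next to the input, assuming that is the name of the downloaded file.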

Anto007 commented 5 years ago

Thanks again for your super-quick responses. I think I can live with .gbk files for now :-)

Anto007 commented 5 years ago

Hi, I was wondering if you have managed to find any sort of fix for "ncbi-acc-download --recursive GHGH00000000.1"? My intention is certainly not to push you here but I would be very grateful for any new pointers.

kblin commented 5 years ago

This will only be fixed once Biopython 1.75 is released, as that contains a fix for the problem.

Anto007 commented 5 years ago

Many thanks for your response. I will await the release of Biopython 1.75.

kblin commented 4 years ago

End of year cleaning of old issues. This one should be fixed by current Biopython versions. Use pip install --upgrade biopython in the same virtualenv you installed ncbi-acc-download into, and you should be good to go. Please don't hesitate to comment if the issue still exists after upgrading Biopython.

Anto007 commented 4 years ago

Thank you so much for remembering to follow up on this; much appreciated! After the Biopython upgrade there are no more error messages, but the command ncbi-acc-download --recursive GHGH00000000.1 only downloads the master record .gbk file and not the records covered by the master record. Unfortunately, that defeats the purpose of 'recursive' here.
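A quick way to confirm this kind of result is to count the records in the downloaded file. The sketch below relies only on the fact that each GenBank record starts with a LOCUS line; the function names are hypothetical, and the path you pass in is whatever file the tool produced.

```python
def count_genbank_records(lines):
    """Count GenBank records by counting lines that start with the LOCUS keyword."""
    return sum(1 for line in lines if line.startswith("LOCUS "))


def count_records_in_file(path):
    """Open a GenBank flat file and count the records it contains."""
    with open(path, "r", encoding="utf-8") as handle:
        return count_genbank_records(handle)
```

A count of 1 for a recursive download would suggest that only the master record was written.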

kblin commented 4 years ago

Thanks for testing. I'll have a look at this.

kblin commented 4 years ago

Ah, shoot, it looks like there's still an issue in the Biopython support for this. 😞 We need https://github.com/biopython/biopython/pull/2432 to land and be shipped first. And I think I still need a change in ncbi-acc-download as well.

kblin commented 4 years ago

Ok, another try. Once Biopython releases 1.77, commit 789a34b4da52c43923ebff47c3141b4468f46892 should fix it. Install the 0.2.6 version of ncbi-acc-download I just released to get the fix.
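To check whether a given virtualenv already has both versions mentioned above, something like the following sketch could be run inside it. It assumes Python 3.8 or newer (for importlib.metadata) and uses the PyPI distribution names biopython and ncbi-acc-download; the helper names are hypothetical, and the version parsing is deliberately simple rather than a full PEP 440 parser.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.77' or '0.2.6' into a tuple of ints, stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version string is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check(package, minimum):
    """Describe whether an installed distribution satisfies the minimum version."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package}: not installed"
    status = "ok" if meets_minimum(installed, minimum) else f"needs >= {minimum}"
    return f"{package} {installed}: {status}"


if __name__ == "__main__":
    print(check("biopython", "1.77"))
    print(check("ncbi-acc-download", "0.2.6"))
```

If either line reports that a newer version is needed, running pip install --upgrade for that package in the same virtualenv should bring it up to date.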

Anto007 commented 4 years ago

Many thanks for the update; once Biopython 1.77 is out, I'll give it a try with ncbi-acc-download v0.2.6.