corneliusroemer closed this issue 5 months ago.
Thanks @corneliusroemer, we are continuing to look into this. Would you mind updating to 16.16.0 and, if the problem persists, running with --debug and reporting the Ncbi-Phid? This will help us better understand what went wrong.
Best, Eric
I am also seeing this error in our automated pipelines for zika, mpox, measles, and dengue, which are all scheduled to run at 9AM PDT. If I rerun the workflow at a later time, the error goes away. Does the time coincide with the datasets updates?
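Because rerunning the workflow later reportedly makes the error go away, one stopgap for scheduled pipelines is to retry the download with a backoff. The following is a minimal Python sketch, not part of datasets itself. It assumes the datasets CLI is on PATH and that a non-zero exit status signals a failed download. The helper name run_with_retries and the delay values are hypothetical choices for illustration.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    base_delay: float = 60.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    """Run cmd, retrying with exponential backoff while it exits non-zero.

    `run` and `sleep` are injectable so the retry logic can be tested
    without calling the real CLI (hypothetical helper, not a datasets API).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result = None
    for attempt in range(attempts):
        result = run(list(cmd), capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts - 1:
            # Wait base_delay, then 2x, 4x, ... seconds before the next try.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        f"{cmd[0]} failed after {attempts} attempts: {result.stderr.strip()}"
    )
```

For example, run_with_retries(["datasets", "download", "virus", "genome", "taxon", "11320", "--include", "genome,biosample"]) would try the influenza A download from this thread up to three times. Retrying only hides a transient server-side fault; it does not fix one, so the PHIDs from failed attempts are still worth reporting.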
@ericcox1 Yes, I'm getting the error with 16.16.0 as well. An example run is: Ncbi-Phid: 1D715361FD2DDA414583C0181D715361FD2DDA414583C018
(It may be that this exact run happened to succeed. I can't tell, because running with --debug flooded my terminal with binary output.) I'll try to provoke the error again.
Is it possible that some part of the server struggles with the number of requests it's getting? As part of a project, I'm downloading datasets via the CLI for a few taxa roughly every 3 minutes (it runs as part of CI). The downloads use an API key, and the allowed rate is 10 requests per second, so we should be far from that limit. Still, it may be that no one has previously sent requests this frequently.
I've been getting the same error (Error: Internal error (invalid zip archive). Please try again) repeatedly for the past several days while trying to get influenza A genomes with this command:
datasets download virus genome taxon 11320 --include genome,biosample --debug >& datasets.log
Here is the gzipped --debug output: datasets.log.gz
The download proceeds for a varying amount of time (~two to 39 minutes) and downloads a varying amount of data (haven't kept track but noticed different numbers of GB) before exiting with the error.
I'm using datasets version 16.17.0.
Earlier today, this command succeeded for me:
datasets download virus genome taxon "Alphainfluenzavirus influenzae" --filename all_alphainfluenza.zip
This is the first example command on https://www.ncbi.nlm.nih.gov/datasets/docs/v2/how-tos/virus/get-influenza-genomes/ . In 87 minutes, it downloaded a 555MB (530MiB) file that includes data_report.jsonl and genome.fna, but not biosample.jsonl.
Unfortunately, the command above with --include genome,biosample has failed twice this afternoon, both times reaching 67.3MB before exiting with the invalid zip archive error.
@AngieHinrichs,
Can you run this again with the --debug flag and send us the PHID? Thanks!
OK, I am kicking off this command (there's no --no-progress-bar option, so I'm adding a grep -v) and will send the PHID and log. Thanks!
time datasets download virus genome taxon 11320 --include genome,biosample --debug |& grep -v ^$'\033' > datasets.log
OK, PHID is 2F4065564DC261B8F1FA965F. Log attached. datasets.2024-05-24.log.gz
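The grep -v ^$'\033' filter in the command above drops every line that begins with an ESC character (0x1B), which is how the terminal control sequences behind the progress bar start, so the saved log keeps only the plain-text debug lines. A minimal Python equivalent of that filter is sketched below. The function name drop_escape_lines is hypothetical, and like the grep version it only removes lines that start with ESC; escape sequences in the middle of a line are left in place.

```python
from typing import Iterable, Iterator

ESC = "\x1b"  # the byte written as $'\033' in the shell command


def drop_escape_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that do not begin with ESC,
    mirroring grep -v ^$'\\033' (hypothetical helper)."""
    for line in lines:
        if not line.startswith(ESC):
            yield line
```

To use it in place of grep, a small script could call sys.stdout.writelines(drop_escape_lines(sys.stdin)) and receive the combined stdout and stderr stream via |&, as in the original command.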
Hi AngieHinrichs,
We need to take a deeper look at the issue. We'll post here when we have a fix.
Nuala
Thanks @olearyna!
Hi,
Any good news on this? I've had the same error since Monday. I thought something was wrong with my code until I read this post.
Hi carolinasisco,
We are actively working on a fix and aim to have it released within the week. We apologize for any inconvenience this may have caused. Thanks for the patience!
Nuala
Hi carolinasisco and AngieHinrichs,
We have released a fix in the latest version (v16.18.1) of the command line tool that we believe addresses the reported issues. Please test this update and let us know if you encounter any further errors.
Thanks Nuala
Thanks @olearyna, I'll try it out right away!
It worked and it was much faster than before! Thanks again!
Great! I'll close this issue.
Hi, it did not work for me. Any suggestions? I got the same error.
Thanks so much @olearyna and @ericcox1! I just upgraded to 16.18.1 and the first run looks promising: none of the 4 taxon downloads failed. 🎉
I will comment as soon as I see failures again.
@carolinasisco are you sure you're using version 16.18.1?
I think it would help the devs if you could run with --debug then and share the PHID 😀
Hi @carolinasisco,
Yes, if you are still having issues with the latest version, can you run with --debug and share the PHID? Thanks for the suggestion, corneliusroemer!
Hi @olearyna
I updated through conda --update; the version shown is 16.18.1. This is my command (I ran it with --debug as suggested):
datasets download gene accession --inputfile ~/Desktop/wp_1_50 --filename wp150 --include gene,protein --debug
The error is:
Error: Download error: http2: server sent GOAWAY and closed the connection; Last
Downloading: ncbi_dataset.zip 4.62MB error
Please find attached the screen capture with the PHID.
Thanks!
Hi carolinasisco,
Thanks for the information! I think this is a separate issue from the virus genome download. We'll look into it tomorrow.
Nuala
Hi, thank you. I'm trying to download a large set of sequences (nucleotide and amino acid) from Pseudomonas.
Hi, I would like to add another example of this error, in hopes of it being helpful in finding a solution. I am using ncbi datasets version 16.31.0. I was trying to download Streptococcus genomic sequences using the following command:
datasets download genome taxon Streptococcus --include genome,gbff --reference
This results in the following outcome:
Collecting 125 genome records [================================================] 100% 125/125
Downloading: ncbi_dataset.zip 273MB done
Validating package files [==>---------------------------------------------] 9% 23/254
Error: Internal error (invalid zip archive). Please try again
Across several attempts, package-file validation reaches 6 to 9% before failing.
I reran the command including only one of genome or gbff. When downloading genomes only (--include genome), the process finished successfully. When downloading gbff only (--include gbff), the process failed with the same internal error as above.
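The failure is reported during the "Validating package files" step. To check a saved package independently, Python's standard zipfile module can open the archive and re-read every member, verifying its CRC-32. The sketch below assumes the package was already written to disk (for example with --filename strep.zip). It does not reproduce whatever checks datasets performs internally, and first_problem is a hypothetical helper name.

```python
import zipfile
import zlib
from typing import Optional


def first_problem(path: str) -> Optional[str]:
    """Return None if `path` is a zip whose members all read back with
    matching CRC-32 values; otherwise describe the first problem found."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first member whose data
            # fails to read back cleanly (e.g. CRC mismatch), or None.
            bad_member = zf.testzip()
    except zipfile.BadZipFile as exc:
        # Raised e.g. when the end-of-central-directory record is missing,
        # which is typical of a truncated download.
        return f"not a valid zip archive: {exc}"
    except (EOFError, zlib.error) as exc:
        # Corrupt or truncated compressed member data.
        return f"unreadable member data: {exc}"
    if bad_member is not None:
        return f"CRC-32 mismatch in member {bad_member}"
    return None
```

Calling first_problem("strep.zip") on a file that passes returns None. A message instead suggests the bytes on disk are damaged, rather than the validation step alone misreporting.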
Hi @mverce,
Thanks for your report.
I wasn't able to reproduce this error and we think you may have encountered a temporary problem.
If you don't mind trying this one more time, please add the --debug flag and report the Ncbi-Phid value here so we can investigate further.
datasets download genome taxon Streptococcus --include gbff --reference --filename strep.zip --debug
Best, Eric
Hi @ericcox1,
I have tried it again with the commands that were problematic yesterday, as well as with your exact command (incl. --filename strep.zip), but the problem persists. The last Ncbi-Phid from the debug output is: 1CA6C01E4134F3592F685054.6.1
Thanks and best regards, Marko
I tried the same command Eric listed and can't reproduce the error.
Sadly, the issue is still active, at least for the ebola-zaire and mpox taxa.
See #356
Originally posted by @corneliusroemer in https://github.com/ncbi/datasets/issues/356#issuecomment-2111024211