tseemann / mlst

Scan contig files against PubMLST typing schemes
GNU General Public License v2.0

DB download fails with "ERROR: Download didn't fully complete!" #133

Open marrip opened 1 year ago

marrip commented 1 year ago

Hey,

I am using MLST in a pipeline, and the lab wants to update the database each time we run the pipeline on a new batch of samples. Until recently (August 2023) I could simply run mlst-download_pub_mlst to retrieve the newest data and use it directly. Yesterday, however, I encountered "ERROR: Download didn't fully complete!" in the logs:

...
  145 scheme: vcholerae version 2
  146 scheme: vparahaemolyticus version 1
  147 scheme: vibrio version 1
  148 scheme: vtapetis version 1
  149 scheme: vvulnificus version 1
  150 scheme: wolbachia version 1
  151 scheme: xfastidiosa version 1
  152 scheme: ypseudotuberculosis_achtman version 3
  153 scheme: yruckeri version 1
  Deleting schemes specified by -x
  Ignoring: afumigatus
  WARNING: Can't ignore non-existent scheme: blastocystis
  Ignoring: calbicans
  Ignoring: cglabrata
  Ignoring: ckrusei
  Ignoring: csinensis
  Ignoring: ctropicalis
  Ignoring: kseptempunctata
  Ignoring: sparasitica
  Ignoring: tvaginalis
  Collated 144 schemes
  Writing download commands: /usr/local/db/pubmlst/dbases.sh
  Downloading 1 files at a time...
  ERROR: Download didn't fully complete!

The error code was 255. The complete task took about 40 minutes (which is expected), but then the error was suddenly thrown. I have not yet figured out where it is thrown or why. Would it be possible to change the download script so that it throws the error as early as possible and pinpoints which query caused it? Thank you so much!
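
To illustrate what "fail as early as possible and pinpoint the query" could look like, here is a minimal Python sketch. It is not how mlst-download_pub_mlst works internally; `run_fail_fast` is a hypothetical helper, and it assumes the download commands are available as a list of shell command strings (the actual layout of dbases.sh is not shown in this thread).

```python
import subprocess


def run_fail_fast(commands):
    """Run shell commands in order and stop at the first failure.

    Hypothetical helper: `commands` is assumed to be a list of shell
    command strings, one download per entry.

    Returns None if every command succeeds, otherwise a tuple
    (position, command, returncode) for the first failing command.
    """
    for position, command in enumerate(commands, start=1):
        result = subprocess.run(command, shell=True)
        if result.returncode != 0:
            # Report immediately instead of continuing with the rest.
            return position, command, result.returncode
    return None
```

With something like this, the pipeline could log the failing command and its exit status right away instead of only reporting a generic error after the whole run.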

marrip commented 1 year ago

I did some detective work 🔎 and found that the sequences from https://rest.pubmlst.org/db/pubmlst_ecoli_achtman_seqdef/loci/fumC/alleles_fasta were missing, maybe because this file is a bit "bigger" (921 KB, the biggest file of the whole bunch). Downloading it manually with curl did not cause an error. Maybe a retry with a back-off time could solve this easily.
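
A rough sketch of the retry-with-back-off idea, written in Python for illustration only (it is not taken from the mlst download script). The names `fetch_with_retry`, `attempts`, and `base_delay`, and the default values, are assumptions; the `fetch` and `sleep` parameters exist only so the logic can be exercised without network access.

```python
import http.client
import time
import urllib.request


def _fetch(url, timeout=60):
    """Download a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(url, attempts=4, base_delay=2.0, fetch=None, sleep=time.sleep):
    """Fetch `url`, retrying failed attempts with exponential back-off.

    The wait before retry k (counting from 0) is base_delay * 2**k seconds.
    Raises RuntimeError naming the URL once all attempts have failed.
    """
    if fetch is None:
        fetch = _fetch
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except (OSError, http.client.HTTPException) as error:
            # OSError covers urllib's URLError/HTTPError and timeouts;
            # HTTPException covers truncated reads (IncompleteRead).
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

Called on the fumC URL above, this would try the download up to four times, waiting 2, 4, and 8 seconds between attempts, and the final error message would name the URL that could not be retrieved.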