joverlee521 opened this issue 2 years ago.
Switch to an endpoint with `xz`
I did not know it existed. Do you know the URL? Does it have the same data in it?
In the meantime, we could also try parallel bzip2: https://github.com/nextstrain/ncov-ingest/pull/247
Ah, as far as we know, it does not exist. This would mean asking GISAID to switch the export we currently receive to `xz`.
Context
On Dec 2, 2021, multiple `fetch-and-ingest` runs for GISAID failed. Each time, the download ran for a while and then the connection was closed before the transfer completed. Subsequent fetch attempts hit a 503 error. We manually triggered `fetch-and-ingest` two more times and saw the same failure pattern.

Possible solution
Today's scheduled run had no issues, so the failures may simply have been bad timing, with our runs interrupted by GISAID's reboots. In anticipation of similar issues, we can revisit the following solutions:
- Update `fetch-from-gisaid` to stop decompressing during streaming, which would shorten the time the connection stays open. However, decompressing in a separate step would increase the total runtime of `fetch-and-ingest`.
- Switch to an endpoint with `.xz`, which has a better compression ratio and faster decompression than `bzip2`. Regardless of errors, this would be a huge improvement for us and would dramatically decrease `fetch-and-ingest` runtime.
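A minimal Python sketch of the trade-offs behind the two options above, using only the standard library `bz2` and `lzma` modules. It is not the `fetch-from-gisaid` implementation: the download is simulated by reading an in-memory buffer in chunks, and `CHUNK_SIZE`, `synthetic_fasta`, `stream_decompress`, `download_then_decompress`, and `compare` are hypothetical names. The synthetic near-identical sequences only stand in for the real GISAID export, so actual ratios and timings will depend on the real data and machine.

```python
import bz2
import io
import lzma
import random
import time
from typing import Callable, Iterable, Iterator

CHUNK_SIZE = 64 * 1024  # hypothetical read size for the simulated download


def iter_chunks(stream: io.BufferedIOBase, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield fixed-size chunks, standing in for an HTTP response body."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk


def stream_decompress(chunks: Iterable[bytes], decompressor) -> bytes:
    """Current approach: decompress each chunk as it arrives."""
    out = bytearray()
    for chunk in chunks:
        out += decompressor.decompress(chunk)
    return bytes(out)


def download_then_decompress(chunks: Iterable[bytes], decompress: Callable[[bytes], bytes]) -> bytes:
    """First option: finish the transfer, then decompress in a separate step."""
    compressed = b"".join(chunks)  # connection could be closed at this point
    return decompress(compressed)


def synthetic_fasta(n_records: int = 200, length: int = 3000, seed: int = 0) -> bytes:
    """Hypothetical stand-in for the export: many near-identical genomes."""
    rng = random.Random(seed)
    reference = [rng.choice("ACGT") for _ in range(length)]
    records = []
    for i in range(n_records):
        seq = reference.copy()
        for _ in range(10):  # a few point mutations per record
            seq[rng.randrange(length)] = rng.choice("ACGT")
        records.append(f">sample_{i}\n{''.join(seq)}\n")
    return "".join(records).encode()


# Codec name -> (one-shot compressor, incremental decompressor factory)
CODECS = {
    "bzip2": (bz2.compress, bz2.BZ2Decompressor),
    "xz": (lambda data: lzma.compress(data, format=lzma.FORMAT_XZ), lzma.LZMADecompressor),
}


def compare(payload: bytes) -> dict:
    """Compression ratio and streamed decompression time for each codec."""
    results = {}
    for name, (compress, make_decompressor) in CODECS.items():
        compressed = compress(payload)
        start = time.perf_counter()
        restored = stream_decompress(iter_chunks(io.BytesIO(compressed)), make_decompressor())
        elapsed = time.perf_counter() - start
        assert restored == payload, f"{name} round trip failed"
        results[name] = {"ratio": len(payload) / len(compressed), "seconds": elapsed}
    return results


if __name__ == "__main__":
    for name, stats in compare(synthetic_fasta()).items():
        print(f"{name}: {stats['ratio']:.1f}x smaller, decompressed in {stats['seconds']:.4f}s")
```

In the streaming version, decompression work is interleaved with reading, so a slow decompressor keeps the connection open longer; `download_then_decompress` can release the connection as soon as the bytes are in hand, at the cost of a second pass afterwards. Swapping the codec from `bzip2` to `xz` changes only the `CODECS` entry, which is why a `.xz` endpoint would help with or without the first option.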