pelias / openaddresses

Pelias import pipeline for OpenAddresses.
MIT License

download: improved error handling for cURL command #488

Closed · missinglink closed this 2 years ago

missinglink commented 2 years ago

this PR is a proposed solution for https://github.com/pelias/docker/issues/267

closes https://github.com/pelias/docker/issues/267

orangejulius commented 2 years ago

Nice, this is much much better error handling than we had before, and should help people diagnose problems. Definitely better than printing an error from unzip when it tries to read an empty file or a file with an HTTP error message in it 🤣

I think the issue in https://github.com/pelias/docker/issues/267 probably comes from intermittent errors from Bunny, which are common enough that I'm sure lots of people are running into them. In my experience a simple retry can usually handle those issues, so that might be something we look at adding in the future. If someone is downloading lots of small OA files, it might be frustrating for any one file's failure to kill the whole download process.

missinglink commented 2 years ago

cURL actually has several flags which control retrying on error; these would be trivial to add:

https://curl.se/docs/manpage.html#--retry
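For illustration, here's roughly how those retry flags compose with the ones already in use (the URL is a placeholder; the values shown are examples, not a recommendation):

```shell
#!/bin/sh
# Illustration only: composing cURL's retry-related flags with the
# download flags already used in this repo. The URL is a placeholder.
RETRY_FLAGS="--retry 5 --retry-connrefused --retry-delay 5"

# Print the assembled command rather than running it, so this is safe offline.
echo "curl --silent --location --fail $RETRY_FLAGS https://example.com/file.zip"
```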

missinglink commented 2 years ago

I've rebased this and pushed a new version which makes a couple changes:

diff --git a/utils/download_all.js b/utils/download_all.js
index afe0d65..6842f2b 100644
--- a/utils/download_all.js
+++ b/utils/download_all.js
@@ -46,13 +46,16 @@ function downloadBundle(targetDir, config, sourceUrl, callback) {
           child_process.exec(`aws s3 cp ${sourceUrl} ${tmpZipFile} --only-show-errors ${s3Options}`, callback);
         } else {
           const flags = [
-            '-X GET',                 // HTTP GET
-            '--silent',               // be quiet
-            '--location',             // follow redirects
-            '--fail-with-body',       // exit with a non-zero code for >=400 responses
-            '-w "%{http_code}"',      // print status code to STDOUT
-            `--referer ${referer}`,   // set referer header
-            `-o ${tmpZipFile}`        // set output filepath
+            '--request GET',                // HTTP GET
+            '--silent',                     // be quiet
+            '--location',                   // follow redirects
+            '--fail',                       // exit with a non-zero code for >=400 responses
+            '--write-out "%{http_code}"',   // print status code to STDOUT
+            `--referer ${referer}`,         // set referer header
+            `--output ${tmpZipFile}`,       // set output filepath
+            '--retry 5',                    // retry this number of times before giving up
+            '--retry-connrefused',          // consider ECONNREFUSED as a transient error
+            '--retry-delay 5'               // sleep this many seconds between retry attempts
           ].join(' ');

           // the `--fail*` flags cause an error to be returned as the first arg with `error.code`
@@ -60,13 +63,9 @@ function downloadBundle(targetDir, config, sourceUrl, callback) {
           child_process.exec(`curl ${flags} ${sourceUrl}`, (error, stdout) => {
             if (!error) { return callback(); }

-            const message = `cURL request failed, exit code: ${error.code}, HTTP status: ${stdout}`;
-            console.error(message);
-
-            // the `--fail-with-body` flag informs cURL to still write the HTTP response to disk.
-            // this is usually an HTML document containing more information about the failure which
-            // we can print for debugging purposes.
-            console.error(fs.readFileSync(tmpZipFile, {encoding: 'utf8'}));
+            const message = `cURL request failed, HTTP status: ${stdout}, exit code: ${error.code}`;
+            console.error(`\nError: ${message}`);
+            console.error(`\n${error.cmd}\n`);

             callback(new Error(message));
           });
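For anyone who wants to poke at the same behaviour outside of Node, here's a rough shell sketch of the error handling above (the function name is made up for illustration; the real logic lives in `utils/download_all.js`):

```shell
#!/bin/sh
# Rough shell sketch of the new error handling (names are illustrative).
download() {
  url="$1"; out="$2"
  # --fail makes curl exit non-zero for HTTP >= 400 responses;
  # --write-out prints the status code to stdout for the error message.
  status=$(curl --request GET --silent --location --fail \
    --write-out '%{http_code}' \
    --output "$out" \
    --retry 5 --retry-connrefused --retry-delay 5 \
    "$url")
  code=$?
  if [ "$code" -ne 0 ]; then
    echo "cURL request failed, HTTP status: ${status}, exit code: ${code}" >&2
  fi
  return "$code"
}
```

e.g. `download https://data.openaddresses.io/openaddr-collected-global.zip /tmp/oa.zip` prints an error like the one shown below on failure.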
missinglink commented 2 years ago

most crucially it doesn't attempt the unzip step now, which is great; the retry logic is a cherry on top 😄

I simulated a 404 error (which actually doesn't retry BTW) and it looks like this on the console:

node utils/download_data.js
2021-09-16T17:13:39.599Z - info: [openaddresses-download] Attempting to download all data
2021-09-16T17:13:39.603Z - debug: [openaddresses-download] downloading https://data.openaddresses.io/openaddr-collected-global.zip

Error: cURL request failed, HTTP status: 404, exit code: 22

curl --request GET --silent --location --fail --write-out "%{http_code}" --referer https://pelias-results.openaddresses.io --output /var/folders/5w/xzyyzb490m36my2j2w4094_m0000gn/T/2021816-19107-63p63f.ybh8c.zip --retry 5 --retry-connrefused --retry-delay 5 https://data.openaddresses.io/openaddr-collected-global.zip.foo

2021-09-16T17:13:40.239Z - error: [openaddresses-download] Failed to download data message=cURL request failed, HTTP status: 404, exit code: 22, stack=Error: cURL request failed, HTTP status: 404, exit code: 22
    at /Users/peter/code/pelias/openaddresses/utils/download_all.js:70:22
    at ChildProcess.exithandler (node:child_process:404:5)
    at ChildProcess.emit (node:events:394:28)
    at maybeClose (node:internal/child_process:1067:16)
    at Socket.<anonymous> (node:internal/child_process:453:11)
    at Socket.emit (node:events:394:28)
    at Pipe.<anonymous> (node:net:662:12)
missinglink commented 2 years ago

Tested locally and still works fine for regular downloads, opening up for review, testing and merging.

orangejulius commented 2 years ago

Very nice, curl just has everything. I'm testing out a download of the North America docker project, it downloads quite a few files, so it should be a good test case.

missinglink commented 2 years ago

Turns out it required duplication of my code for the individual downloads vs. the global ones.

I've now tested both scenarios on my local machine (macOS) running the scripts directly and also via docker from a pelias/docker project, and they both work great.

orangejulius commented 2 years ago

Cool yeah, here's what it looks like when there's an intermittent error. Much better experience:

[Screenshot: console output showing the downloader retrying on an intermittent error]
missinglink commented 2 years ago

Hmmm yeah it's a much improved user-experience but... what's up with Banff? I assume it's stalling for (5*5=25s) before timing out?

missinglink commented 2 years ago

I chose Germany for my testing and I found one 404, but that was due to "No latest run found".

warn: [openaddresses-download] failed to download https://results.openaddresses.io/latest/run/de/he/city_of_kassel.zip: Error: cURL request failed, HTTP status: 404, exit code: 22
curl -L https://results.openaddresses.io/latest/run/de/he/city_of_kassel.zip
No latest run found%
missinglink commented 2 years ago

What do you get when trying to download it directly?

curl -L https://results.openaddresses.io/latest/run/ca/ab/city_of_banff.zip
<html><head><title>403 Forbidden</title><link href='//fonts.googleapis.com/css?family=Rubik:300,400,500' rel='stylesheet' type='text/css'><style>/* inline styles and a base64-encoded background image omitted */</style></head><body><div id='content'><h1 style='margin-bottom: -35px;'>403</h1><h2>Forbidden</h2></div></body></html>%
missinglink commented 2 years ago

agh, my bad, I forgot the referer:

curl -L --referer https://pelias-results.openaddresses.io -O https://results.openaddresses.io/latest/run/ca/ab/city_of_banff.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   339  100   339    0     0     75      0  0:00:04  0:00:04 --:--:--    82
100 39595  100 39595    0     0   7692      0  0:00:05  0:00:05 --:--:--  7692

unzip -t city_of_banff.zip
Archive:  city_of_banff.zip
    testing: README.txt               OK
    testing: ca/ab/city_of_banff.vrt   OK
    testing: ca/ab/city_of_banff.csv   OK
No errors detected in compressed data of city_of_banff.zip.
orangejulius commented 2 years ago

Yeah, I ran the download again and there were no problems with Banff that time. The 503 error from Bunny CDN actually said "over capacity", and it was there for a while. So the retry strategy may not be enough to work around all issues with the CDN.

missinglink commented 2 years ago

I would consider changing the retry values if they are inadequate.

I chose 5 attempts at 5 seconds pause as it seemed reasonable that an intermittent error would resolve within 25s and it's also not a long time to wait for a failure.

My testing seemed to show that a 404 (for instance) doesn't trigger a retry (which is good), so maybe we increase either the interval, the max retry count, or both?
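That matches the manpage: `--retry` only re-attempts on "transient" errors (timeouts and HTTP 408, 429, 500, 502, 503, 504), so a 404 fails immediately. If the current 5×5s window proves too short, more patient settings might look like this (hypothetical values; `--retry-max-time` caps the total time spent retrying regardless of count):

```shell
#!/bin/sh
# Hypothetical, more patient retry settings, if 5 attempts x 5s proves too short.
RETRY_FLAGS="--retry 8 --retry-delay 10 --retry-max-time 120 --retry-connrefused"

# Print the assembled command rather than running it, so this is safe offline.
echo "curl --silent --location --fail $RETRY_FLAGS -O https://example.com/data.zip"
```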

cromulus commented 2 years ago

Does the CDN suggest a retry-after in the response header?