mozilla / probe-scraper

Scrape and publish Telemetry probe data from Firefox
https://mozilla.github.io/probe-scraper/
Mozilla Public License 2.0

Retry transient GCS errors #581

Open relud opened 1 year ago

relud commented 1 year ago

https://github.com/mozilla/bedrock/actions/runs/4598182706/jobs/8121878736
https://github.com/mozilla/glean/actions/runs/4609412250/jobs/8146505104?pr=2441

gsutil is failing to download some objects, with 404 errors:

Error: Command ['gsutil', '-q', '-m', 'rsync', '-r', 'gs://probe-scraper-prod-artifacts/glean/', '/tmp/tmpy4y4leon/output/glean'] returned non-zero exit status 1:
NotFoundException: 404 gs://probe-scraper-prod-artifacts/glean/reference-browser/general does not exist.
NotFoundException: 404 gs://probe-scraper-prod-artifacts/glean/reference-browser/pings does not exist.
NotFoundException: 404 gs://probe-scraper-prod-artifacts/glean/reference-browser/tags does not exist.
CommandException: 3 files/objects could not be copied/removed.

The error is transient: the objects do exist, ~but presumably are temporarily disappearing during upload or something like that.~ edit: but they have been updated since gsutil listed them, and gsutil requests the specific version (generation) recorded at listing time.
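To make the suspected race concrete, here is a toy in-memory model of generation-pinned reads. It is not GCS and not gsutil: `ToyBucket` and `NotFound` are hypothetical stand-ins, and the model assumes the bucket keeps only the live generation of each object (i.e. no Object Versioning), so a read pinned to an overwritten generation 404s while a read by name alone returns the latest content.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class NotFound(Exception):
    """Hypothetical stand-in for a 404 from the object store."""


@dataclass
class ToyBucket:
    # name -> (generation, content); only the live generation is kept,
    # mimicking a bucket without Object Versioning (an assumption).
    objects: Dict[str, Tuple[int, bytes]] = field(default_factory=dict)
    _next_gen: int = 1

    def write(self, name: str, content: bytes) -> int:
        """Overwrite an object, assigning it a new generation number."""
        gen = self._next_gen
        self._next_gen += 1
        self.objects[name] = (gen, content)
        return gen

    def list(self) -> List[Tuple[str, int]]:
        """Return (name, generation) pairs, as a listing step would record them."""
        return [(name, gen) for name, (gen, _) in sorted(self.objects.items())]

    def read(self, name: str, generation: Optional[int] = None) -> bytes:
        """Read the live object; if a generation is pinned, it must still be live."""
        if name not in self.objects:
            raise NotFound(name)
        gen, content = self.objects[name]
        if generation is not None and generation != gen:
            raise NotFound(f"{name}#{generation}")
        return content


bucket = ToyBucket()
bucket.write("glean/reference-browser/pings", b"v1")
listing = bucket.list()                               # listing records generation 1
bucket.write("glean/reference-browser/pings", b"v2")  # concurrent upload: generation 2
name, listed_gen = listing[0]
try:
    bucket.read(name, generation=listed_gen)          # pinned read of generation 1
except NotFound as exc:
    print("404:", exc)                                # 404: glean/reference-browser/pings#1
print(bucket.read(name))                              # b'v2' (unpinned read gets latest)
```

In this model the object "exists" the whole time; only the combination of list-then-pinned-read loses the race.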

We could retry the full gsutil sync on failure, or we could reimplement the gsutil sync in Python and retry individual 404s. The latter option is probably more robust, and should be relatively short.
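A minimal sketch of the second option, assuming per-object retries with exponential backoff are what we want. It is deliberately storage-agnostic: `sync_objects`, `download` and `retry_on` are hypothetical names, not existing probe-scraper code. In a real implementation, `download` might be backed by the google-cloud-storage client (e.g. `bucket.blob(name).download_to_filename(path)`, which addresses the object by name rather than by a pinned generation) and `retry_on` might be `(google.api_core.exceptions.NotFound,)`; that wiring is an assumption, not something verified against this repo.

```python
import time
from typing import Callable, Iterable, List, Tuple, Type


def sync_objects(
    names: Iterable[str],
    download: Callable[[str], None],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Download each object by name, retrying transient errors per object.

    Returns the names that were downloaded. Re-raises the last error if an
    object still fails after `attempts` tries.
    """
    done: List[str] = []
    for name in names:
        for attempt in range(1, attempts + 1):
            try:
                download(name)
                done.append(name)
                break
            except retry_on:
                if attempt == attempts:
                    raise
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
    return done


# Usage with a fake downloader (hypothetical): "pings" 404s once, then succeeds.
class NotFound(Exception):
    pass


failures = {"glean/reference-browser/pings": 1}


def fake_download(name: str) -> None:
    if failures.get(name, 0) > 0:
        failures[name] -= 1
        raise NotFound(name)


delays: List[float] = []
result = sync_objects(
    ["glean/reference-browser/general", "glean/reference-browser/pings"],
    fake_download,
    (NotFound,),
    sleep=delays.append,
)
```

Only the failing object is retried, so one overwritten file no longer fails the whole sync. If an object was genuinely deleted after listing, every retry will 404 and the error is re-raised; whether to skip such objects instead is a policy choice this sketch leaves open.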

Dexterp37 commented 1 year ago

@relud should we consider adding some logging first to help understand the issue, e.g. along the lines of GoogleCloudPlatform/gsutil#906?

relud commented 1 year ago

we could add the -DD flag:

OPTIONS
  -D          Shows HTTP requests/headers and additional debug info needed
              when posting support requests, including exception stack traces.

              CAUTION: The output from using this flag includes authentication
              credentials. Before including this flag in your command, be sure
              you understand how the command's output is used, and, if
              necessary, remove or redact sensitive information.

  -DD         Same as -D, plus HTTP upstream payload.

but I wouldn't recommend it, as those headers will include auth tokens.

relud commented 1 year ago

That said, I can confirm from running the command locally with -DD that gsutil does request a specific "generation" of each object, so if a file is rewritten between listing the object and downloading its content, I would expect the download to 404.