mozilla-releng / balrog

Mozilla's Update Server
http://mozilla-balrog.readthedocs.io/en/latest/index.html
Mozilla Public License 2.0

TimeoutError when uploading full firefox release #3126

Open: sentry-io[bot] opened this issue 3 months ago

sentry-io[bot] commented 3 months ago

Sentry Issue: BALROG-STAGE-ADMIN-2Q

TimeoutError: 
(22 additional frame(s) were not displayed)
...
  File "/usr/local/lib/python3.11/site-packages/gcloud/aio/auth/token.py", line 357, in refresh
    resp = await self._refresh_service_account(timeout=timeout)
  File "/usr/local/lib/python3.11/site-packages/gcloud/aio/auth/token.py", line 322, in _refresh_service_account
    resp = await self.session.post(
  File "/usr/local/lib/python3.11/site-packages/gcloud/aio/auth/session.py", line 190, in post
    resp = await self.session.post(
  File "aiohttp/client.py", line 507, in _request
    with timer:
  File "aiohttp/helpers.py", line 735, in __exit__
    raise asyncio.TimeoutError from None

sentry-io[bot] commented 3 months ago

Sentry Issue: BALROG-STAGE-ADMIN-21

sentry-io[bot] commented 3 months ago

Sentry Issue: BALROG-PROD-ADMIN-2T

sentry-io[bot] commented 3 months ago

Sentry Issue: BALROG-PROD-ADMIN-2S

jcristau commented 3 months ago

PUT /api/v2/releases/foo with a full Firefox blob as the body fails: the client gets a 502 or 504, and on the server side Sentry records a TimeoutError and an OSError, along with lots of "Giving up acquire_access_token(...) after 5 tries (TimeoutError)" messages from the gcloud auth code.
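The "Giving up ... after 5 tries" messages have the shape of a retry loop with a per-try timeout and exponential backoff. The sketch below uses only asyncio and a hypothetical `slow_token_endpoint` coroutine; it is not the gcloud-aio-auth code, just an illustration of how each attempt times out, waits, and retries until the fifth failure re-raises TimeoutError.

```python
import asyncio
import logging

log = logging.getLogger("retry-sketch")


async def slow_token_endpoint() -> str:
    """Hypothetical stand-in for a token endpoint that never answers in time."""
    await asyncio.sleep(1.0)
    return "token"


async def acquire_access_token(max_tries: int = 5, timeout: float = 0.05,
                               base_delay: float = 0.01) -> str:
    """Retry a token request with a per-try timeout and exponential backoff.

    Hypothetical sketch: it only mimics the shape of the log message.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            # wait_for cancels the inner request once `timeout` elapses.
            return await asyncio.wait_for(slow_token_endpoint(), timeout)
        except asyncio.TimeoutError:
            if attempt >= max_tries:
                log.error("Giving up acquire_access_token(...) after %d tries "
                          "(TimeoutError)", attempt)
                raise
            # Back off base_delay, 2*base_delay, 4*base_delay, ... between tries.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


async def main() -> None:
    loop = asyncio.get_running_loop()
    start = loop.time()
    try:
        await acquire_access_token()
    except asyncio.TimeoutError:
        print(f"gave up after {loop.time() - start:.2f}s")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    asyncio.run(main())
```

With these toy numbers the wait is about 5 × 0.05 s of timeouts plus 0.01 + 0.02 + 0.04 + 0.08 s of backoff, roughly 0.4 s. With real per-try timeouts measured in seconds, the same arithmetic gives a sense of how long one request can hang before the TimeoutError surfaces.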

jcristau commented 3 months ago

This was already happening with v3.45, and I'm seeing it today with v3.51 (prod) and v3.52 (stage).

bhearsum commented 3 months ago

I finally managed to dig up https://github.com/mozilla-releng/balrog/pull/1249#issue-585312030, which is where async uploading was added. Most notably, it contains this comment:

In my local testing, I was able to create a Release with ~550 locales in less than 10 seconds. Without this patch, creating a Release with only 16 locales took around 15 seconds.
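To make the speed-up described in that comment concrete, here is a minimal asyncio sketch comparing uploading blobs one at a time with scheduling them all at once via asyncio.gather. `upload_locale` is a hypothetical stand-in that just sleeps to simulate a GCS round trip; it is not Balrog's upload code. The optional `limit` semaphore is also an assumption, not something the PR is described as doing; it shows one knob for capping how many requests are in flight at the same time.

```python
from __future__ import annotations

import asyncio
import time


async def upload_locale(locale: str, latency: float) -> str:
    """Hypothetical stand-in for uploading one locale's blob to GCS."""
    await asyncio.sleep(latency)  # simulated network round trip
    return locale


async def upload_sequential(locales: list[str], latency: float) -> list[str]:
    # One request at a time: total time grows with len(locales) * latency.
    return [await upload_locale(loc, latency) for loc in locales]


async def upload_concurrent(locales: list[str], latency: float,
                            limit: int | None = None) -> list[str]:
    # All requests scheduled together; gather preserves input order.
    # `limit` is a hypothetical knob that caps in-flight requests.
    if limit is not None and limit < 1:
        raise ValueError("limit must be at least 1")
    sem = asyncio.Semaphore(limit) if limit is not None else None

    async def one(loc: str) -> str:
        if sem is None:
            return await upload_locale(loc, latency)
        async with sem:
            return await upload_locale(loc, latency)

    return list(await asyncio.gather(*(one(loc) for loc in locales)))


async def main() -> None:
    locales = [f"locale-{i}" for i in range(50)]
    runs = [
        ("sequential", lambda: upload_sequential(locales, 0.01)),
        ("concurrent", lambda: upload_concurrent(locales, 0.01)),
        ("concurrent, limit=10", lambda: upload_concurrent(locales, 0.01, limit=10)),
    ]
    for name, make in runs:
        start = time.perf_counter()
        await make()
        print(f"{name}: {time.perf_counter() - start:.3f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

The trade-off is that the concurrent version finishes in roughly one round trip but puts every request on the wire at once, which is exactly the kind of burst a shared dependency has to absorb.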

There's also an idea for a better solution:

A more ideal solution would probably involve writing entries to the database as part of the http request, and having the Balrog Agent handle the GCS side of things -- but that's a more involved project.
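The shape of that suggested design can be sketched as a hand-off: the request handler only writes to the database and enqueues a pending upload, and a separate worker drains the queue and talks to GCS. Everything here (`FakeDB`, `handle_put_release`, `gcs_agent`, and the in-memory queue standing in for whatever the Balrog Agent would actually poll) is hypothetical and only illustrates the split, not how Balrog or its agent are implemented.

```python
from __future__ import annotations

import asyncio
import contextlib
import logging
from dataclasses import dataclass, field
from typing import Awaitable, Callable

log = logging.getLogger("agent-sketch")

# Signature of the hypothetical function that writes one blob to GCS.
Uploader = Callable[[str, dict], Awaitable[None]]


@dataclass
class FakeDB:
    """In-memory stand-in for the database; not Balrog's schema."""
    releases: dict[str, dict] = field(default_factory=dict)
    pending_uploads: asyncio.Queue[str] = field(default_factory=asyncio.Queue)


async def handle_put_release(db: FakeDB, name: str, blob: dict) -> int:
    # Inside the HTTP request: record the release and queue the GCS work.
    db.releases[name] = {"blob": blob, "uploaded": False}
    await db.pending_uploads.put(name)
    return 200  # hypothetical status; the response does not wait for GCS


async def gcs_agent(db: FakeDB, upload: Uploader) -> None:
    # Outside the request: drain the queue and push each blob to storage.
    while True:
        name = await db.pending_uploads.get()
        try:
            await upload(name, db.releases[name]["blob"])
            db.releases[name]["uploaded"] = True
        except Exception:
            # Leave "uploaded" False so a later pass could retry it.
            log.exception("upload of %s failed", name)
        finally:
            db.pending_uploads.task_done()


async def demo() -> FakeDB:
    db = FakeDB()

    async def fake_upload(name: str, blob: dict) -> None:
        await asyncio.sleep(0.01)  # simulated GCS latency

    agent = asyncio.create_task(gcs_agent(db, fake_upload))
    status = await handle_put_release(db, "Firefox-example", {"platforms": {}})
    print("status", status, "uploaded yet?", db.releases["Firefox-example"]["uploaded"])
    await db.pending_uploads.join()  # wait until the agent has processed the queue
    agent.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await agent
    return db


if __name__ == "__main__":
    final = asyncio.run(demo())
    print("uploaded now?", final.releases["Firefox-example"]["uploaded"])
```

The point of the split is that the HTTP response no longer waits on GCS, so a slow or timing-out upload delays the `uploaded` flag rather than the client's request.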

(I looked but I couldn't find a bug report that necessitated this PR.)