Closed gsnedders closed 4 days ago
Checking the log, I can confirm that /api/checks/github-actions/
was called with run_id=11039052681
and returned 200. Also I see the logging
Adding %!s(*string=0xc0002990a0) for web-platform-tests/wpt run 11039052681 to upload...
from this line, but it didn't log properly:
log.Infof("Adding %s for %s/%s run %v to upload...", artifact.Name, owner, repo, runID)
It should be *artifact.Name
instead.
If I have to guess, it could be an issue with the uploader. Is there a link that I can inspect the wpt results from the artifacts?
@KyleJu https://github.com/web-platform-tests/wpt/actions/runs/11039052681 has downloads links for all the artifacts.
I guess I'm more interested in any logging from the request we issue for https://wpt.fyi/api/results/create? Because that's presumably working too and giving 200 back.
https://github.com/web-platform-tests/wpt.fyi/blob/a24fc662deecac9bab58d5e6b0a240bff3a3cc17/api/receiver/create_run.go#L115-L117 is one case where we could silently fail?
But it seems like it should presumably be managing to schedule the task, it's just for whatever reason it never completes.
I guess also logging from /api/results/process
?
Because something somewhere is clearly failing.
Confirmed that /api/results/create
has no error log around the time this run was created.
Looking at /api/checks/github-actions/
again, I noticed successful logging for 11039052681:
but 10 minutes before that, it also logged failures:
But it seems like it should presumably be managing to schedule the task, it's just for whatever reason it never completes.
I would agree with this. A job could be scheduled and returned 200, but the task might be failing in the Task Queue. There seems to be lots failures around the same timestamps:
but 10 minutes before that, it also logged failures:
Can you find what run ID this was? I wonder if that's the point at which I tried to submit all the Safari runs, but I can't find any that gave 400 all the way back to the initial request, which I would've expected?
Is there any way to find any logging from /api/results/process
(i.e., from the results-processor) when the task runs? Because it's kinda surprising they're failing in the results processor.
but 10 minutes before that, it also logged failures:
Can you find what run ID this was? I wonder if that's the point at which I tried to submit all the Safari runs, but I can't find any that gave 400 all the way back to the initial request, which I would've expected?
I cannot find any logging for run_id and the payload is empty as well, but the user agent of that request is curl. Maybe that rings a bell?
After digging through logs in results-precossor, I found these around the same timestamp (note that the severity level isn't labelled properly in processor):
e.g.
ERROR:processor:Failed to fetch (403): https://api.github.com/repos/web-platform-tests/wpt/actions/artifacts/1978878338/zip
in the url (https://github.com/web-platform-tests/wpt/actions/runs/11039052681) you showed me above, it should be
https://github.com/web-platform-tests/wpt/actions/runs/11039052681/artifacts/1978878338
Would that be the issue?
It'll be the difference between what it gives as the URL via the API versus via the web interface; I guess we're getting 403 (Forbidden) because we're not passing a GitHub API token.
We'd have to notice that we're fetching api.github.com
from the results-processor
and then fetch the GitHub token from the datastore.
That said, the even more significant failing here is that they aren't showing up as invalid runs.
From https://cloud.google.com/appengine/docs/standard/services/taskqueue/push/retrying-tasks?tab=python:
If a handler fails to execute a task (and therefore returns any HTTP status code outside of the range 200–299), App Engine retries the task until it succeeds.
So are we just retrying these indefinitely?
Because I don't think we're catching the error anywhere, so it should just bubble up to the Flask endpoint and then Flask will turn it into a 500 Internal Server Error.
Reading the code, I think we're not actually retrying here:
We should just be setting the status as EMPTY, and manually hitting https://wpt.fyi/api/status/empty confirms this.
OK, even after #4025, https://staging.wpt.fyi/status is showing all the runs stuck as pending. Can someone take a look at the logs for the staging instance?
Ah, after #4025 we're getting:
Traceback (most recent call last):
File "/env/lib/python3.9/site-packages/flask/app.py", line 1473, in wsgi_app
response = self.full_dispatch_request()
File "/env/lib/python3.9/site-packages/flask/app.py", line 882, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/env/lib/python3.9/site-packages/flask/app.py", line 880, in full_dispatch_request
rv = self.dispatch_request()
File "/env/lib/python3.9/site-packages/flask/app.py", line 865, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
File "/app/main.py", line 69, in decorated_func
return func(*args, **kwargs)
File "/app/main.py", line 52, in decorated_func
return func(*args, **kwargs)
File "/app/main.py", line 114, in task_handler
resp = processor.process_report(task_id, flask.request.form)
File "/app/processor.py", line 357, in process_report
p.download(results, screenshots, archives)
File "/app/processor.py", line 211, in download
self._download_archive(archive_url)
File "/app/processor.py", line 185, in _download_archive
artifact = self._download_http(archive_url)
File "/app/processor.py", line 142, in _download_http
'Authorization': 'Bearer ' + self._github_token,
File "/app/processor.py", line 125, in _github_token
return self._secret('github-wpt-fyi-bot-token')
File "/app/processor.py", line 121, in _secret
return self.datastore.get(key)['secret']
KeyError: 'secret'
…which is surprising given that's what wpt.fyi itself is reading. So somehow we don't have the secret accessible?
Ah, after #4025 we're getting:
Traceback (most recent call last): File "/env/lib/python3.9/site-packages/flask/app.py", line 1473, in wsgi_app response = self.full_dispatch_request() File "/env/lib/python3.9/site-packages/flask/app.py", line 882, in full_dispatch_request rv = self.handle_user_exception(e) File "/env/lib/python3.9/site-packages/flask/app.py", line 880, in full_dispatch_request rv = self.dispatch_request() File "/env/lib/python3.9/site-packages/flask/app.py", line 865, in dispatch_request return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return] File "/app/main.py", line 69, in decorated_func return func(*args, **kwargs) File "/app/main.py", line 52, in decorated_func return func(*args, **kwargs) File "/app/main.py", line 114, in task_handler resp = processor.process_report(task_id, flask.request.form) File "/app/processor.py", line 357, in process_report p.download(results, screenshots, archives) File "/app/processor.py", line 211, in download self._download_archive(archive_url) File "/app/processor.py", line 185, in _download_archive artifact = self._download_http(archive_url) File "/app/processor.py", line 142, in _download_http 'Authorization': 'Bearer ' + self._github_token, File "/app/processor.py", line 125, in _github_token return self._secret('github-wpt-fyi-bot-token') File "/app/processor.py", line 121, in _secret return self.datastore.get(key)['secret'] KeyError: 'secret'
…which is surprising given that's what wpt.fyi itself is reading. So somehow we don't have the secret accessible?
https://github.com/web-platform-tests/wpt.fyi/pull/4057. I think this must be the cause
We seem to be downloading a bunch of artifacts, with no errors (because otherwise we'd get an ERROR logged from requests
), and then still hitting "ERROR:processor:No results successfully downloaded": downloaded-logs-20241015-141730.json.
The capitalization fix is now deployed to production.
https://staging.wpt.fyi/results/?run_id=6174988006588416 successfully made it up, after #4061 was fixed (though still only on staging).
That said, it does seem to be not labelled with master
, which is somewhat problematic.
https://staging.wpt.fyi/results/?run_id=6174988006588416 successfully made it up, after #4061 was fixed (though still only on staging).
Let's just treat #4063 as a separate follow-up and resolve this. 🎉
From https://github.com/web-platform-tests/wpt/issues/48223#issuecomment-2375384661:
This seems surprising that it succeeded to upload, and yet the upload doesn't appear as pending, invalid, or show up as complete.
We seem to have logging going into Google Cloud Logging, but I don't have any access to that.
cc @past @DanielRyanSmith @jcscottiii @KyleJu