mozilla / probe-scraper

Scrape and publish Telemetry probe data from Firefox
https://mozilla.github.io/probe-scraper/
Mozilla Public License 2.0

500 error on probe-scraper actions #711

Closed: perrymcmanis144 closed this issue 5 months ago

perrymcmanis144 commented 5 months ago

I opened a PR against glean and noticed that the probe-scraper check failed with a 500 error. I tried re-pushing a few minutes later and the error persisted. It appears that some other folks might be experiencing it as well: https://mozilla.slack.com/archives/CEE12R4E8/p1710431615148949?thread_ts=1707921362.947109&cid=CEE12R4E8

Log from the PR: https://github.com/mozilla/glean/actions/runs/8283575535/job/22667074476

travis79 commented 5 months ago

Another failure in another repo: https://github.com/mozilla/experimenter/actions/runs/8283756036/job/22668600610?pr=10409

travis79 commented 5 months ago

Another: https://github.com/mozilla/blurts-server/actions/runs/8283671119/job/22667400123

BenWu commented 5 months ago

Stack trace from cloud function:

Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.10/site-packages/flask/app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.10/site-packages/flask/app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/layers/google.python.pip/pip/lib/python3.10/site-packages/flask/app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.10/site-packages/flask/app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/layers/google.python.pip/pip/lib/python3.10/site-packages/functions_framework/__init__.py", line 99, in view_func
    return function(request._get_current_object())
  File "/workspace/probe_scraper/glean_push.py", line 47, in main
    updated_paths = runner.main(
  File "/workspace/probe_scraper/runner.py", line 708, in main
    upload_paths += load_glean_metrics(
  File "/workspace/probe_scraper/runner.py", line 609, in load_glean_metrics
    raise ValueError("Found duplicate Glean metrics, check email for details")
ValueError: Found duplicate Glean metrics, check email for details

BenWu commented 5 months ago

This is what I found in the logs:

Glean has detected duplicated metric identifiers coming from the product 'mozillavpn-backend-cirrus'.
- 'cirrus_events.enrollment' defined more than once in mozillavpn-backend-cirrus, nimbus-cirrus
- 'cirrus_events.enrollment_status' defined more than once in mozillavpn-backend-cirrus, nimbus-cirrus
- 'cirrus_events.instance_name' defined more than once in mozillavpn-backend-cirrus, nimbus-cirrus

I think an email should have been sent too, but emails don't get sent for the cloud function.
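
For readers following along, the kind of check that produces the log lines above can be sketched roughly as follows. This is a minimal illustration only, not probe-scraper's actual implementation: the function name `find_duplicate_metrics` and the `metrics_by_product` mapping are hypothetical, and the sample data is copied from the log above.

```python
from collections import defaultdict


def find_duplicate_metrics(metrics_by_product):
    """Return {metric_id: [products]} for identifiers defined in more than one product.

    Hypothetical helper for illustration; not part of probe-scraper.
    """
    owners = defaultdict(set)
    for product, metric_ids in metrics_by_product.items():
        for metric_id in metric_ids:
            owners[metric_id].add(product)
    # Keep only identifiers that more than one product defines.
    return {
        metric_id: sorted(products)
        for metric_id, products in owners.items()
        if len(products) > 1
    }


# Sample data taken from the log lines in this comment.
metrics_by_product = {
    "mozillavpn-backend-cirrus": [
        "cirrus_events.enrollment",
        "cirrus_events.enrollment_status",
        "cirrus_events.instance_name",
    ],
    "nimbus-cirrus": [
        "cirrus_events.enrollment",
        "cirrus_events.enrollment_status",
        "cirrus_events.instance_name",
    ],
}

duplicates = find_duplicate_metrics(metrics_by_product)
for metric_id, products in sorted(duplicates.items()):
    print(f"- '{metric_id}' defined more than once in {', '.join(products)}")
```

In this sketch any identifier shared by two products counts as a duplicate, which is why all three `cirrus_events` metrics are reported.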

travis79 commented 5 months ago

I think this may be related: https://github.com/mozilla/probe-scraper/pull/708

travis79 commented 5 months ago

And yes, emails were sent out. I got one along with several nimbus folks. I've actually gotten several emails now, stating the same thing.

BenWu commented 5 months ago

I think the issue is that the metrics and pings for mozillavpn-backend-cirrus are still "cached" in the GCS bucket even though #708 removed them. I can create a backup and then try deleting the cached file, which should allow things to run again. There's certainly a bug somewhere here, but for now I'll just get things working again.
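
To illustrate the suspected failure mode, here is a rough sketch of pruning cached entries for products that are no longer declared. Everything in it is an assumption made for illustration: `prune_stale_products`, `cached_metrics`, and `active_products` are hypothetical names, and the real cache layout in the GCS bucket may look quite different.

```python
def prune_stale_products(cached_metrics, active_products):
    """Return only the cached entries whose product is still declared.

    Hypothetical helper for illustration; not part of probe-scraper.
    """
    return {
        product: metrics
        for product, metrics in cached_metrics.items()
        if product in active_products
    }


# Assumed state after #708: the cache still holds the removed product's metrics.
cached_metrics = {
    "mozillavpn-backend-cirrus": ["cirrus_events.enrollment"],
    "nimbus-cirrus": ["cirrus_events.enrollment"],
}
# Assumption: only nimbus-cirrus still declares these metrics.
active_products = {"nimbus-cirrus"}

pruned = prune_stale_products(cached_metrics, active_products)
print(sorted(pruned))
```

If the cache were pruned like this before the duplicate check, the stale mozillavpn-backend-cirrus entry would no longer collide with nimbus-cirrus. Deleting the cached file by hand has the same effect for this one case.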

BenWu commented 5 months ago

This should now be fixed after whd deleted the metrics file. I verified that runs are succeeding now. Some notes: