opensafely-core / job-server

A server for mediating jobs that can be run in an OpenSAFELY secure environment. q.v. job-runner
https://jobs.opensafely.org

Investigate database errors not being surfaced to slack #3366

Closed ghickman closed 1 year ago

ghickman commented 1 year ago

We should be surfacing database errors up from jobs, via job-runner, to job-server to push into Slack. We haven't seen any in a while, but we're pretty sure we should have. What's happened to them?

ghickman commented 1 year ago

Thought: we should add a command to job-runner to send a test message so we can test job-runner -> job-server -> slack.
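A rough sketch of what such a command could look like, assuming job-runner can POST JSON to a job-server endpoint using a token. The URL, the Authorization header usage, and the payload field names below are all hypothetical and not taken from job-server's actual API:

```python
import json
import urllib.request


def build_test_message(backend):
    """Build a clearly labelled test payload (field names are hypothetical)."""
    return {
        "backend": backend,
        "kind": "test",
        "message": f"Test message from job-runner on {backend}",
    }


def send_test_message(url, token, payload, opener=urllib.request.urlopen):
    """POST the payload as JSON and return the HTTP status code.

    `opener` is injectable so the function can be exercised without a network.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )
    with opener(request) as response:
        return response.status
```

Marking the payload as a test should make it easy to tell these apart from real errors once they reach Slack.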

ghickman commented 1 year ago

I've added two views to the staff area in #3372 so we can manually test sending an error and sending a message.

Now that #3357 is deployed we're getting exceptions in Sentry for Python code again, which was part of the problem. However, messages sent from Django views still aren't showing up in our job-server project on Sentry.

I've messaged Sentry support with:

Hi,

I'm trying to send a message with sentry_sdk.capture_message("testing") in a Django view, but it's not appearing in my project's issues list. The project is job-server under the EBM DataLab org.

Calling the same function from the Django shell works, as you can see here: https://ebm-datalab.sentry.io/issues/4370808790/?project=5443358&query=&statsPeriod=14d&stream_index=0

I'm using django==4.2.4 and sentry-sdk==1.29.2

I've reduced this down to a minimal testcase and can share if that's useful.

Thanks, George

My current theory is the events have been merged into another group somehow OR we're unintentionally filtering them on the Sentry side without realising.

Digging into sentry_sdk I can see we get a 200 response from Sentry when the message is sent, along with an event ID, but that ID doesn't seem to resolve to anything in our project.
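For anyone wanting to repeat this, here is a minimal sketch of logging the event ID so it can be searched for in Sentry afterwards. capture_and_log is a hypothetical helper, not something in job-server, and it assumes sentry_sdk.init() has already been called elsewhere:

```python
import logging

logger = logging.getLogger(__name__)


def capture_and_log(message, capture=None):
    """Send `message` to Sentry and log the event ID that comes back.

    `capture` defaults to sentry_sdk.capture_message; it is a parameter only
    so the helper can be exercised without the SDK installed.
    """
    if capture is None:
        import sentry_sdk  # lazy import; assumes sentry_sdk.init() already ran

        capture = sentry_sdk.capture_message

    event_id = capture(message)
    if event_id is None:
        # No ID usually means the SDK has no client or dropped the event locally.
        logger.warning("No event ID returned for message: %r", message)
    else:
        logger.info("Sent message to Sentry, event_id=%s", event_id)
    return event_id
```

An event ID only shows that the SDK built and queued the event. It doesn't show that Sentry kept it, which matches what we're seeing here.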

ghickman commented 1 year ago

Copied over from #3386

We use Sentry's capture_message feature to pass errors up from job-runner to Sentry in our API handler.

This is not currently working, with messages of any kind not appearing in the event list. We can send messages from the Django shell, for example, but sending them from a view means they don't show up.

If we use a different project in Sentry then they do show up from a view.

I've reached out to Sentry via their support portal to ask for help as the above suggests it's a configuration issue with our job-server project on Sentry.

ghickman commented 1 year ago

I've sent roughly the same message again directly via their support address (support@sentry.io), and immediately received a confirmation of receipt email, which is progress!

ghickman commented 1 year ago

They responded suggesting that we try:

Locally I tried:

Debug mode was loud but useful to confirm events were being sent. I've left localhost unfiltered in the inbound filters setting as it's useful for testing. Spike protection being on/off didn't seem to make a difference.
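As a sketch of the local setup, assuming sentry-sdk is installed: debug=True and before_send are both standard sentry_sdk.init options, while log_before_send and init_sentry_for_debugging are hypothetical names used only for this example:

```python
import logging

logger = logging.getLogger("sentry.debugging")


def log_before_send(event, hint):
    """before_send hook: log each outgoing event, then pass it through unchanged."""
    logger.info(
        "sending event_id=%s level=%s message=%r",
        event.get("event_id"),
        event.get("level"),
        event.get("message"),
    )
    return event  # returning None here would drop the event


def init_sentry_for_debugging(dsn):
    """Initialise the SDK with verbose output (assumes sentry-sdk is installed)."""
    import sentry_sdk

    sentry_sdk.init(dsn=dsn, debug=True, before_send=log_before_send)
```

The hook only confirms what leaves the process. Anything dropped by inbound filters happens on the Sentry side and won't be visible here.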

We have messages coming through now… but it's still unclear why the ones from production weren't showing up before. The inbound filter accounts for why my local testing was being dropped.