Open sentry-io[bot] opened 3 years ago
Some possible thoughts on this: https://medium.com/@philamersune/fixing-ssl-error-decryption-failed-or-bad-record-mac-d668e71a5409
This seems to be caused by opening a connection in one thread and then using it in another.
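For reference, a minimal sketch of the usual way to avoid sharing one connection across threads: give each thread its own. This is not code from this repo. `PerThreadConnection` and the `connect` factory are hypothetical names, and the factory stands in for whatever actually opens the Postgres connection.

```python
import threading


class PerThreadConnection:
    """Hand each thread its own connection instead of sharing one."""

    def __init__(self, connect):
        # `connect` is a hypothetical zero-argument factory that opens
        # a new connection each time it is called.
        self._connect = connect
        self._local = threading.local()

    def get(self):
        # threading.local stores attributes separately for every thread,
        # so a connection opened here is never visible to another thread.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._connect()
            self._local.conn = conn
        return conn
```

Calling `get()` twice in the same thread returns the same connection, while a call from a different thread opens a new one.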
So looking at the trace above, this isn't coming from our code at all, but from the New Relic introspection of our tasks. This is where the other thread is. So my guess is if we get a reporting period from New Relic while we are trying to update, it explodes.
We'll probably have to do a minor version of what we did in the WebSocket to turn off transaction tracing:
I've tried disabling New Relic transaction monitoring, but we're going to have to wait and see if anything happens.
This is still working with the transaction monitoring disabled, but New Relic is still monkey patching Postgres, so we're going to try disabling it completely and see what happens.
Currently deploying the PR which removes New Relic, just to rule it out as the source. Unfortunately we aren't going to know for a while, if ever, because this is very intermittent. We can sometimes go a whole week without seeing it.
I've deployed this now and closed the Sentry issue, but we're going to have to check on this for at least a week if it doesn't just show up immediately.
@jon-betts Re-opening this because it happened again: https://sentry.io/organizations/hypothesis/issues/2086421688/ (latest event is today just after midnight, 12:01:08).
It looks like New Relic is innocent here, and we should revert https://github.com/hypothesis/checkmate/pull/87 and https://github.com/hypothesis/checkmate/pull/83?
Yep
Sentry Issue: CHECKMATE-6