openHPI / poseidon

Scalable task execution orchestrator for CodeOcean
MIT License

Refactor use of Sentry Trace Context #654

Closed mpass99 closed 1 month ago

mpass99 commented 1 month ago

This PR refactors the use of Sentry's Trace Context to enhance Sentry's mapping between errors and traces.

With Sentry's current SDK implementation, this PR should not change any observable behavior. However, we argue that this is conceptually the better usage and might become important, e.g., for possible concurrency fixes.

Follow-up for #643. This still does not fix the identified concurrency issue. Should we?

In Sentry, we can find broken trace-error mappings that might be related to this issue [1] [2].
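To illustrate the concept, here is a minimal, self-contained sketch (not the actual diff of this PR; the transaction name and the injected error are illustrative) of how an error gets linked to a trace in sentry-go: the event must be captured through the hub carried on the span's context, not the global hub.

```go
package main

import (
	"context"
	"errors"
	"time"

	"github.com/getsentry/sentry-go"
)

func doWork(ctx context.Context) error {
	return errors.New("injected error") // stand-in for a real failure
}

func run(ctx context.Context) {
	// Start a transaction; its trace context travels with the span's context.
	transaction := sentry.StartTransaction(ctx, "runner.execute")
	defer transaction.Finish()

	if err := doWork(transaction.Context()); err != nil {
		// Capture through the hub bound to the span's context so the error
		// event carries the current trace. The package-level
		// sentry.CaptureException would go through the global hub instead,
		// which is where trace-error mappings can get lost.
		hub := sentry.GetHubFromContext(transaction.Context())
		if hub == nil {
			hub = sentry.CurrentHub()
		}
		hub.CaptureException(err)
	}
}

func main() {
	_ = sentry.Init(sentry.ClientOptions{
		Dsn:              "", // set a real DSN to actually send events
		EnableTracing:    true,
		TracesSampleRate: 1.0,
	})
	run(context.Background())
	sentry.Flush(2 * time.Second) // wait for buffered events to be sent
}
```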

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 79.54545% with 9 lines in your changes missing coverage. Please review.

Project coverage is 76.69%. Comparing base (8b86efa) to head (7664abb). Report is 3 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| pkg/nullio/ls2json.go | 40.00% | 3 Missing :warning: |
| internal/nomad/api_querier.go | 85.71% | 2 Missing :warning: |
| internal/nomad/sentry_debug_writer.go | 33.33% | 2 Missing :warning: |
| pkg/logging/sentry_hook.go | 80.00% | 1 Missing and 1 partial :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #654      +/-   ##
==========================================
+ Coverage   76.68%   76.69%   +0.01%
==========================================
  Files          41       41
  Lines        3525     3536      +11
==========================================
+ Hits         2703     2712       +9
- Misses        609      610       +1
- Partials      213      214       +1
```

:umbrella: View full report in Codecov by Sentry.

MrSerth commented 1 month ago

> This still does not fix the identified concurrency issue. Should we?

Yes, sounds good. Can we, by chance, "just" apply the same concept here, too?

```go
hub := sentry.CurrentHub().Clone()
ctx = sentry.SetHubOnContext(ctx, hub)
```
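For reference, the sentry-go concurrency guidance is to give each goroutine its own hub clone, since a shared hub's scope is not safe for concurrent modification. A rough sketch of applying that concept here (the function and control flow are hypothetical, not code from this repository):

```go
package main

import (
	"context"
	"sync"

	"github.com/getsentry/sentry-go"
)

// handleExecution is a hypothetical handler for a concurrent execution path.
func handleExecution(ctx context.Context) {
	parentHub := sentry.GetHubFromContext(ctx)
	if parentHub == nil {
		parentHub = sentry.CurrentHub()
	}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Each goroutine works on its own clone, so concurrent scope
		// modifications (tags, breadcrumbs, trace context) cannot race
		// with the parent or with sibling goroutines.
		localHub := parentHub.Clone()
		localCtx := sentry.SetHubOnContext(context.Background(), localHub)
		_ = localCtx // run the concurrent part of the execution with localCtx
	}()
	wg.Wait()
}

func main() {
	handleExecution(context.Background())
}
```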
MrSerth commented 1 month ago

Your changes look good so far. Have you tested them with an example to see that Sentry shows the information as intended?

mpass99 commented 1 month ago

Thanks for encouraging me to test it more extensively.

On staging, we can directly see an improvement. While 3 out of 3 executions on the main branch had trace-grouping issues [1] [2] [3], none of the 3 executions on this branch did [1] [2] [3]. Furthermore, when injecting an error, Sentry correctly linked the error to the respective traces, even in concurrent constellations [1] [2] [3]. The errors are grouped into two different Sentry Issues based on their differing stack traces [1] [2].

In my local environment, I struggle with the Sentry backend. Most of the time, it does not accept my WebSocket traces, regardless of the branch (main or this one). Out of 12 executions, Sentry noticed just 1 WebSocket connection. The execute requests are recognized by Sentry, and the Sentry Relay receives the respective events without logging any error. 🤷
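One generic way to narrow such drops down (a debugging suggestion, not something established in this thread) is to enable the SDK's debug output, which logs what the client sends and any transport failures, helping to tell client-side drops from Relay-side drops:

```go
package main

import (
	"log"

	"github.com/getsentry/sentry-go"
)

func main() {
	// Debug: true makes the SDK log outgoing events and transport errors.
	err := sentry.Init(sentry.ClientOptions{
		Dsn:              "http://public@localhost:9000/1", // placeholder local DSN
		Debug:            true,
		EnableTracing:    true,
		TracesSampleRate: 1.0, // sample every trace while debugging
	})
	if err != nil {
		log.Fatalf("sentry.Init: %v", err)
	}
}
```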

sentry-io[bot] commented 1 month ago

Suspect Issues

This pull request was deployed and Sentry observed the following issues:
