spandex-project / spandex_datadog

A datadog adapter for the `spandex` library.
MIT License
59 stars 42 forks source link

Flush remaining traces on shutdown #34

Closed SophisticaSean closed 1 year ago

SophisticaSean commented 3 years ago

I also implemented Mox here because the current testing harness in test/api_server_test.exs can't test this shutdown case since it relies on Process.get to retrieve state from the GenServer to confirm state changes. I'm happy to rewrite those tests to use Mox instead which is a lot more simple and robust.

The dependency shouldn't be an issue because Mox is a test only dep.

SophisticaSean commented 3 years ago

Fixes #35

SophisticaSean commented 3 years ago

I've manually tested this in production and the traces are flushed correctly when you stop the root application supervision tree with Application.stop(:yourapp)

SophisticaSean commented 3 years ago

bump. been using this in production for a couple months now and its working great.

SophisticaSean commented 3 years ago

bump again, this is working great for me.

GregMefford commented 3 years ago

Hey there! Sorry this has been sitting around for so long - I'm taking a look at it and I'll work on getting it merged.

SophisticaSean commented 3 years ago

@GregMefford let me know what I need to do to get this merged! I've been using this in prod now since feb and it works great.

SophisticaSean commented 3 years ago

I'll get the merge conflicts resolved

SophisticaSean commented 3 years ago

Fixed merge conflicts!

SophisticaSean commented 3 years ago

@GregMefford could this make it to 1.2.0 pretty please? It's been sitting around here for a while haha

SophisticaSean commented 3 years ago

Fixed the merge conflicts again

GregMefford commented 3 years ago

Hey, thanks for all of your effort on this! ❤️

I am struggling with concerns about making this the default behavior, given that so many people are already using Spandex in production on high-volume systems where they may not expect or want to flush whatever traces may be waiting when the system is shutting down (e.g. if the system is overloaded and not keeping up, you don't want to give it extra work to do that isn't strictly necessary. Trapping exits, especially when it's to do non-trivial amounts of work, can be dangerous because it interferes with the normal mechanisms people expect their supervision trees to use.

I'm wondering if we could meet this need by adding a public function that could be called to synchronously flush traces, so that if you want that behavior, you can just arrange to call that function somewhere in the termination process of your application's supervision tree, rather than changing that behavior automatically for all current users of Spandex.

GregMefford commented 3 years ago

Sorry, I forgot it was you that was discussing this same thing in #26. Do you think that it would be sufficient to add that as a public API, which would not just block until the process mailbox is flushed, but actually flush the remaining traces in the buffer to Datadog?

SophisticaSean commented 2 years ago

@GregMefford I experimented with that originally but it didn't satisfy my use case. What about a default off option to trap the exit and flush the traces that can be enabled with a config option? This way users don't need extra code to ensure traces are flushed but just a 1 line of configuration.

GregMefford commented 2 years ago

Yeah, I think that would be a good way to go, if it's a compile-time config that just determines whether to trap exits or not, then I believe the rest of the code should be safe - it just wouldn't get used unless the trap exits flag was set. 👍

SophisticaSean commented 2 years ago

@GregMefford I finally got around to updating this, let me know what you think! :)

SophisticaSean commented 2 years ago

Bump! Just watched a talk at Empex in SLC where the speaker was talking about dropped traces when the server shuts down so it'd be awesome to get this merged upstream! @GregMefford