metrics-rs / metrics

A metrics ecosystem for Rust.
MIT License
1.1k stars 151 forks source link

Support thread-local default recorder #502

Open asmello opened 1 month ago

asmello commented 1 month ago

with_local_recorder requires the logic to be inside a closure, which isn't always feasible. Imagine for example running tests: cargo compiles a single binary per target, so if you have multiple tests producing metrics, they will all share the same global recorder. It isn't possible to use a closure unless you combine all your tests into a single one.

The tracing crate provides a solution to this in the form of set_default, which allows you to install a thread-local default runtime. This works for cargo test as each test runs in a separate thread.

tobz commented 1 month ago

with_local_recorder requires the logic to be inside a closure, which isn't always feasible. Imagine for example running tests: cargo compiles a single binary per target, so if you have multiple tests producing metrics, they will all share the same global recorder. It isn't possible to use a closure unless you combine all your tests into a single one.

The tracing crate provides a solution to this in the form of set_default, which allows you to install a thread-local default runtime. This works for cargo test as each test runs in a separate thread.

Can you describe what is difficult about the closure-based approach? That's the salient part here.

asmello commented 1 month ago

I think a big part of the difficulty is bad interaction with Rust async. I'd need to spawn a whole tokio runtime inside the closure to run the test logic as async closures are not stable.

Another part is that I want the metrics endpoint to be served (optionally) as part of a pre-existing axum HTTP server, which means it's most convenient to configure and initialise the prometheus runtime internally in this server's builder, so the caller only needs to deal with configuration (and not have to interact with metrics/prometheus-exporter directly). Ideally I'd be able to tie the lifetime of the reporter to that server instance so I can create another one later and "reset" the metrics.

tobz commented 1 month ago

Yeah, using the local recorder with async code is indeed painful.

I can see where something like tracing::set_default would be easier in tests where you're already running them in an async way using something like #[tokio::test]. Doable if you just replicated the code that it's generating for you, but inside the closure... although again, understandable that this is painful and ugly.

I'd definitely be willing to accept a PR implementing something like tracing::set_default.

As far as the thing about tying the recorder to a preexisting server, resetting it later on, etc... I'm not sure I fully grok the goal and how it relates to this issue, but happy to chat more if you want to spin it off into a separate issue.

asmello commented 1 month ago

Doable if you just replicated the code that it's generating for you, but inside the closure... although again, understandable that this is painful and ugly.

Yes, that's what I ended up doing. It works for testing just fine, just require some uncomfortable contortions.

I'd definitely be willing to accept a PR implementing something like tracing::set_default.

Thanks! I think I can cook up a PR for this.

As far as the thing about tying the recorder to a preexisting server, resetting it later on, etc... I'm not sure I fully grok the goal and how it relates to this issue, but happy to chat more if you want to spin it off into a separate issue

I guess what I'm trying to say there is just that the closure approach requires one to wrap their entire program inside of it, which imposes some tough constraints in how the code is organised. For example, it makes RAII impossible to achieve completely, as any self-contained unit that uses metrics (like a web server) must be constructed and initialised strictly after metrics are set up externally, even if they are the only consumers of it. Ideally if a web server is the only component producing metrics, it should take care of setting it up internally, and clean up after itself when it's done running.

To be clear, none of this is blocking for me, but as you say, it leads to painful and ugly code. And potentially lots of head-scratching until one figures out the right way to do it.