tokio-rs / tracing

Application level tracing for Rust.
https://tracing.rs
MIT License

Document reload layer performance impact #1632

Open bkchr opened 3 years ago

bkchr commented 3 years ago

Hey, we have observed that using the reload layer has quite a performance impact. Without parking_lot (we realized later that this feature is disabled by default), filtering takes about 6x longer when RUST_LOG=bullshit=trace is set, i.e. every event has to be checked against the "bullshit" directive even though no target matches it. With parking_lot enabled the slowdown is still about 3x. We think it would be nice to document this.
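
For context, a minimal sketch of the kind of setup being described, assuming a recent tracing-subscriber with the env-filter and fmt features; the directive and target names are illustrative, and the reporting application's actual setup may differ:

```rust
use tracing_subscriber::{filter::EnvFilter, fmt, prelude::*, reload};

fn main() {
    // Wrap the env filter in a reload layer so it can be swapped at runtime.
    // The directive mirrors the RUST_LOG=bullshit=trace scenario: the most
    // verbose level is enabled for a target that nothing actually matches.
    let filter = EnvFilter::new("bullshit=trace");
    let (filter, _reload_handle) = reload::Layer::new(filter);

    tracing_subscriber::registry()
        .with(filter)
        .with(fmt::layer())
        .init();

    // Every event now goes through the reload layer's RwLock before the
    // inner EnvFilter can reject it by target.
    tracing::trace!(target: "some_module", "checked against the filter, never matches");
}
```

The overhead under discussion is the extra RwLock read the reload layer takes on each filtering call before delegating to the wrapped filter.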

hawkw commented 3 years ago

There is a note in the module-level documentation for reload that it introduces some performance overhead, although it doesn't go into great detail: https://docs.rs/tracing-subscriber/0.2.25/tracing_subscriber/reload/index.html. Perhaps it should be expanded. Anything you'd like to see mentioned more specifically in the docs?

I'm somewhat surprised that the reload layer makes filtering as much as 3x slower with parking_lot enabled, though. That may be worth investigating...

bkchr commented 3 years ago

I'm somewhat surprised that the reload layer makes filtering as much as 3x slower with parking_lot enabled, though. That may be worth investigating...

There is probably a lot of stuff being printed from different threads. I "measured" this overhead with wasmtime while it compiled some wasm code, and wasmtime uses a lot of threads for that. However, most of the calls only take a read lock as far as I have seen, so I also don't really understand why it is that slow.

hawkw commented 3 years ago

Hmm, there is currently a microbenchmark comparing the performance of a reload layer with that of the same subscriber without reloading enabled. However, that benchmark only tests the single-threaded case.

I wrote a version where events are recorded by multiple threads concurrently, but I still don't see a significant performance difference between the benchmark with reloading and the version without:

reload/complex-collector
                        time:   [1.2370 us 1.2400 us 1.2431 us]

reload/complex-collector-reloadable
                        time:   [1.2078 us 1.2117 us 1.2152 us]

reload_concurrent/complex-collector
                        time:   [8.4684 us 8.5037 us 8.5417 us]

reload_concurrent/complex-collector-reloadable
                        time:   [8.6130 us 8.6478 us 8.6845 us]
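
(For reference, this is not the exact benchmark quoted above, but a rough sketch of what a multi-threaded variant could look like, assuming criterion and a reloadable EnvFilter; the thread and iteration counts are illustrative.)

```rust
use std::thread;

use criterion::{criterion_group, criterion_main, Criterion};
use tracing::dispatcher::{self, Dispatch};
use tracing_subscriber::{filter::EnvFilter, prelude::*, reload};

fn bench_reload_concurrent(c: &mut Criterion) {
    // Reloadable filter layered on the registry, as in the single-threaded
    // benchmark, but events are recorded from several threads at once so the
    // reload layer's RwLock sees concurrent readers.
    let filter = EnvFilter::new("this=trace,enabled=debug");
    let (filter, _handle) = reload::Layer::new(filter);
    let dispatch = Dispatch::new(tracing_subscriber::registry().with(filter));

    c.bench_function("reload_concurrent/complex-collector-reloadable", |b| {
        b.iter(|| {
            let threads: Vec<_> = (0..4)
                .map(|_| {
                    let dispatch = dispatch.clone();
                    thread::spawn(move || {
                        dispatcher::with_default(&dispatch, || {
                            for _ in 0..100 {
                                tracing::info!(target: "this", "hi");
                                tracing::trace!(target: "stuff", "hi");
                            }
                        })
                    })
                })
                .collect();
            for t in threads {
                t.join().unwrap();
            }
        })
    });
}

criterion_group!(benches, bench_reload_concurrent);
criterion_main!(benches);
```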

I would certainly expect the RwLock to introduce some overhead, but I wouldn't expect it to be nearly so severe when the lock is never contended. I'm assuming you're not reloading the filter very frequently?

If your application spawns an extremely large number of threads, it's possible that the RwLock has a more noticeable impact than when the layer is shared between a smaller number of threads. Potentially, we could use a sharded lock to decrease the impact on the read path.
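
(A rough sketch of the sharded-lock idea, using crossbeam's ShardedLock as a stand-in; this is not how reload is implemented today, and the Reloadable wrapper below is purely illustrative.)

```rust
use crossbeam_utils::sync::ShardedLock;

// Hypothetical reloadable wrapper: readers lock only their own shard, so they
// don't contend with each other on the hot path, while the rare reload has to
// write-lock every shard.
struct Reloadable<F> {
    inner: ShardedLock<F>,
}

impl<F> Reloadable<F> {
    fn new(filter: F) -> Self {
        Self { inner: ShardedLock::new(filter) }
    }

    /// Hot path, called on every `enabled()` / event check.
    fn with_filter<T>(&self, f: impl FnOnce(&F) -> T) -> T {
        f(&self.inner.read().expect("lock poisoned"))
    }

    /// Cold path, called only when the user changes the filter.
    fn reload(&self, new_filter: F) {
        *self.inner.write().expect("lock poisoned") = new_filter;
    }
}
```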

hawkw commented 3 years ago

Per your comment here:

However, most of the calls are using a read lock as far as I have seen. So, I also don't really understand why it is that slow.

I noticed you said "most of the calls are using a read lock". Does that mean that during normal operation of the program, the layer is being reloaded regularly?

The intended use-case for the reload layer is one where the filter configuration is reloaded very infrequently, typically in response to a user-initiated configuration change (e.g. changing settings in a configuration UI, sending the program a SIGHUP to reload a config file, etc.). A general rule of thumb: if the filter is being reloaded by some automated process, you're probably reloading it too frequently to get good performance out of reload. If you're using it to implement dynamic filtering or sampling, where a different filter configuration is used for, say, 1 out of every 100 requests, you should probably be using a custom filter implementation instead.
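
For the dynamic-sampling case mentioned above, a custom filter can avoid the lock entirely. Here is a minimal sketch, assuming tracing-subscriber's Layer trait; the SampleFilter type and the 1-in-N policy are made up for the example and are not an existing API:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

use tracing::{subscriber::Interest, Metadata, Subscriber};
use tracing_subscriber::layer::{Context, Layer};

/// Illustrative sampling filter: enables roughly 1 out of every `every`
/// callsite hits using an atomic counter, with no lock on the hot path.
struct SampleFilter {
    counter: AtomicU64,
    every: u64,
}

impl<S: Subscriber> Layer<S> for SampleFilter {
    fn register_callsite(&self, _meta: &'static Metadata<'static>) -> Interest {
        // The decision changes from event to event, so it must never be
        // cached per callsite; `enabled` has to be consulted every time.
        Interest::sometimes()
    }

    fn enabled(&self, _meta: &Metadata<'_>, _cx: Context<'_, S>) -> bool {
        self.counter.fetch_add(1, Ordering::Relaxed) % self.every == 0
    }
}
```

Such a layer could then be composed with the rest of the subscriber like any other filtering layer, with no RwLock on the per-event path.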

bkchr commented 3 years ago

I noticed you said "most of the calls are using a read lock". Does that mean that during normal operation of the program, the layer is being reloaded regularly?

No, no. The layer is not reloaded regularly; it only happens when the user provides new logging targets, which is almost never. So, that should be fine.

bkchr commented 3 years ago

Could you try changing this https://github.com/tokio-rs/tracing/blob/5b49db4c0e797ca90135b051bd3f5101236a924a/tracing-subscriber/benches/reload.rs#L75-L80 to

for _ in 0..1000 {
    tracing::info!(target: "this", "hi");
    tracing::debug!(target: "enabled", "hi");
    tracing::warn!(target: "hepp", "hi");
    tracing::trace!(target: "stuff", "hi");
}

And change this https://github.com/tokio-rs/tracing/blob/5b49db4c0e797ca90135b051bd3f5101236a924a/tracing-subscriber/benches/reload.rs#L28 to something=trace? The point is to have the minimum level be trace, but we still need to filter by the target prefix.
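
For context, the suggested change would amount to a filter built from a directive along these lines (assuming the benchmark's filter is constructed from an EnvFilter-style directive string, which may not be exactly what that line contains):

```rust
use tracing_subscriber::filter::EnvFilter;

fn main() {
    // With this directive, trace (the most verbose level) is enabled for the
    // "something" target, so the global max-level hint can no longer discard
    // events early and each event must be matched against the target prefix.
    let _filter = EnvFilter::new("something=trace");
}
```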