Open bkchr opened 3 years ago
There is a note in the module-level documentation for reload
that it introduces some performance overhead, although it doesn't go into great detail on this: https://docs.rs/tracing-subscriber/0.2.25/tracing_subscriber/reload/index.html, perhaps it should be expanded. Anything you'd like to see mentioned more specifically in the docs?
I'm somewhat surprised that the reload layer makes filtering as much as 3x slower with parking_lot
enabled, though. That may be worth investigating...
I'm somewhat surprised that the reload layer makes filtering as much as 3x slower with
parking_lot
enabled, though. That may be worth investigating...
There is probably a lot of stuff being printed from different threads. I have "measured" this overhead with wasmtime while it compiled some wasm code and wasmtime is using a lot of threads for it. However, most of the calls are using a read lock as far as I have seen. So, I also don't really understand why it is that slow.
Hmm, there is currently a microbenchmark for comparing the performance of a reload
layer with that of the same subscriber without reloading enabled. However, I noticed that this benchmark currently only tests the single-threaded case.
I wrote a version where events are recorded by multiple threads concurrently, but I still don't see a significant performance difference between the benchmark with reloading and the version without:
reload/complex-collector
time: [1.2370 us 1.2400 us 1.2431 us]
reload/complex-collector-reloadable
time: [1.2078 us 1.2117 us 1.2152 us]
reload_concurrent/complex-collector
time: [8.4684 us 8.5037 us 8.5417 us]
reload_concurrent/complex-collector-reloadable
time: [8.6130 us 8.6478 us 8.6845 us]
I would certainly expect the RwLock
to introduce some amount of overhead, but I wouldn't expect it to be nearly so severe when the lock is never contended. I'm assuming you're not constantly reloading the filter very frequently?
If your application spawns an extremely large number of threads, it's possible that the RwLock
has a more noticeable impact than in the case where the layer is shared between a smaller number of threads. Potentially, we could use a sharded lock approach to decrease the impact in the read case.
Per your comment here:
However, most of the calls are using a read lock as far as I have seen. So, I also don't really understand why it is that slow.
I noticed you said "most of the calls are using a read lock". Does that mean that during normal operation of the program, the layer is being reloaded regularly?
The intended use-case for the reload
layer is one where the filter configuration is reloaded very infrequently; typically in response to a user-initiated configuration change (e.g. changing settings in a configuration UI, sending the program a SIGHUP
to reload a config file, etc) A general rule of thumb is that if reloading a filter is happening through some automated process, you're probably reloading the filter too frequently to get good performance out of reload
--- if you're using it to implement dynamic filtering or sampling where a different filter configuration is used for, say, 1 out of every 100 requests, or something, you should probably be using a custom filter implementation instead.
I noticed you said "most of the calls are using a read lock". Does that mean that during normal operation of the program, the layer is being reloaded regularly?
No, no. The layer is not reloaded regularly. It happens when the user provides some new logging targets, which happens almost never. So, that should be fine.
Could you try to change this to https://github.com/tokio-rs/tracing/blob/5b49db4c0e797ca90135b051bd3f5101236a924a/tracing-subscriber/benches/reload.rs#L75-L80 to
for _ in 1000 {
tracing::info!(target: "this", "hi");
tracing::debug!(target: "enabled", "hi");
tracing::warn!(target: "hepp", "hi");
tracing::trace!(target: "stuff", "hi");
}
And change this https://github.com/tokio-rs/tracing/blob/5b49db4c0e797ca90135b051bd3f5101236a924a/tracing-subscriber/benches/reload.rs#L28 to something=trace
? The point is to have the min level be trace
, but we need to filter by the prefix.
Hey, we have observed that using the reload layer has quite some performance impact. Without
parking_lot
(we realized later that this is disabled by default) it takes factor 6x more time when you haveRUST_LOG=bullshit=trace
set, aka we are required to check if any target matches "bullshit"(which no target matches). With parking lot enabled the performance decrease is still factor 3. We think that it would be nice to document this.