Status: Open. cijothomas opened this issue 3 months ago.
@lalitb has agreed to dig more into where exactly the overheads are arising from, and to see if we can bring them to more acceptable levels, before attempting suggestions 2, 3 from above or something else.
From the SIG/Community meeting yesterday: Lalit has confirmed that the perf hit is not due to any accidental copying/cloning or heap allocations; we are not doing these. The costs add up from the several layers of redirection/enums (e.g. Value (enum) -> StringValue -> OtelString (enum) -> str).
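The layering can be sketched with simplified stand-ins (illustrative types only, not the actual opentelemetry-rust definitions): every read has to match its way through each wrapper in turn.

```rust
// Illustrative stand-ins for the layering described above; the real
// opentelemetry-rust types differ, but the indirection chain is similar:
// Value (enum) -> StringValue (newtype) -> OtelString (enum) -> str.
enum OtelString {
    Static(&'static str),
    Owned(String),
}

struct StringValue(OtelString);

enum Value {
    String(StringValue),
    I64(i64),
}

// Reaching the underlying &str walks the whole chain of wrappers.
fn as_str(v: &Value) -> Option<&str> {
    match v {
        Value::String(StringValue(OtelString::Static(s))) => Some(s),
        Value::String(StringValue(OtelString::Owned(s))) => Some(s.as_str()),
        Value::I64(_) => None,
    }
}

fn main() {
    let v = Value::String(StringValue(OtelString::Static("example")));
    assert_eq!(as_str(&v), Some("example"));
    assert_eq!(as_str(&Value::I64(7)), None);
    let owned = Value::String(StringValue(OtelString::Owned(String::from("owned"))));
    assert_eq!(as_str(&owned), Some("owned"));
    println!("ok");
}
```

None of the individual layers is expensive on its own; the point is that constructing and unwrapping several nested enums/newtypes on every call adds up at nanosecond scale.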
Benchmarks from here show that the cost of creating abstractions like `Key`, `Value` in the `opentelemetry` crate is non-trivial. Take the example below: it is 7.7199 ns vs 786.74 ps (roughly 10x!).
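For context, a rough sketch of how such a construction comparison can be measured with only the standard library (`std::time::Instant` plus `std::hint::black_box` to defeat dead-code elimination). The types here are simplified stand-ins, not the real crate's, and the cited numbers come from a proper benchmark harness, so absolute results from this sketch will differ:

```rust
use std::hint::black_box;
use std::time::Instant;

// Simplified stand-ins for the API abstractions (hypothetical, not the
// real opentelemetry types).
struct Key(&'static str);
struct KeyValue {
    key: Key,
    value: i64,
}

fn main() {
    const N: u64 = 10_000_000;

    // Construct the wrapped abstraction in a tight loop.
    let t = Instant::now();
    for i in 0..N {
        black_box(KeyValue {
            key: Key(black_box("my_key")),
            value: black_box(i as i64),
        });
    }
    let wrapped = t.elapsed();

    // Construct a bare tuple for comparison.
    let t = Instant::now();
    for i in 0..N {
        black_box((black_box("my_key"), black_box(i as i64)));
    }
    let bare = t.elapsed();

    println!("wrapped: {:?}, bare: {:?}", wrapped, bare);
}
```

With types this trivial the optimizer may close the gap entirely; a statistics-based harness such as criterion is what produced the ns/ps figures above.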
It may be very hard, or even impossible, to avoid the overhead entirely. Opening an issue to facilitate further discussion. Some options include:

1. Make types like `Key`, `Value` lightweight stubs in the `opentelemetry` crate, with actual implementations moved to the sdk crate. This should avoid/reduce the perf hit when just `opentelemetry` is used. Alternatively, this may also be achieved via some feature-flag with no-ops as the default feature in the API; the SDK enables the feature-flag in the API, so users without the SDK enabled will get high perf.
2. Follow the approach of the `tracing` crate, which uses macros that avoid evaluating the attributes completely unless enabled. The prior art here is definitely worth exploring further.

I also want to emphasize why this issue is relevant/important! For an application owner who uses OTel, this may not be a significant concern. But it would be a serious concern (probably a blocker) for a library author to bet on OpenTelemetry as the instrumentation API. This is because, simply by adding OTel API calls, the library's own performance will degrade, even when nobody is interested in the telemetry (i.e. the final app owner does not enable OTel at all). Whether a few nanoseconds of overhead is significant depends on the library's own latency without OTel: if that is on the order of milliseconds, it is probably not a big issue, but Rust is used in perf-sensitive scenarios where it would not be surprising to find libraries that perform their core operation on the order of nanoseconds.
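The `tracing`-style idea can be sketched with a minimal hypothetical macro (this is not tracing's real implementation, which consults per-callsite interest caches and the active subscriber, but the principle is the same): the attribute expressions sit inside the enabled-check branch, so a disabled call site costs one branch and never evaluates them.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical global switch standing in for "is a subscriber installed?".
static ENABLED: AtomicBool = AtomicBool::new(false);

macro_rules! record_event {
    ($($key:expr => $value:expr),* $(,)?) => {
        if ENABLED.load(Ordering::Relaxed) {
            // Attribute expressions are only evaluated on this branch.
            $( println!("{} = {}", $key, $value); )*
        }
    };
}

// Stand-in for an attribute that is costly to compute; it counts how
// many times it was actually evaluated.
fn expensive_attr(calls: &mut u32) -> String {
    *calls += 1;
    String::from("computed")
}

fn main() {
    let mut calls = 0;

    // Disabled: the macro never evaluates expensive_attr().
    record_event!("detail" => expensive_attr(&mut calls));
    assert_eq!(calls, 0);

    ENABLED.store(true, Ordering::Relaxed);
    // Enabled: now the attribute is computed and recorded.
    record_event!("detail" => expensive_attr(&mut calls));
    assert_eq!(calls, 1);
}
```

This is what makes the macro approach attractive for library authors: the disabled path is a single predictable branch, so uninstrumented users pay essentially nothing even for expensive attribute expressions.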