Status: Open. cijothomas opened this issue 3 months ago.
@lalitb has agreed to dig more into where exactly the overheads are arising from, and to see if we can bring them to more acceptable levels, before attempting suggestions 2, 3 from above or something else.
From the SIG/Community meeting yesterday: Lalit has confirmed that the perf hit is not due to any accidental copying/cloning or heap allocations; we are not doing these. The costs add up from the several layers of redirection/enums (e.g. Value (enum) -> StringValue -> OtelString (enum) -> str).
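The layering can be sketched with simplified stand-ins (illustrative types only, not the actual opentelemetry-rust definitions): every read has to match its way through each wrapper in turn.

```rust
// Illustrative stand-ins for the layering described above; the real
// opentelemetry-rust types differ, but the indirection chain is similar:
// Value (enum) -> StringValue (newtype) -> OtelString (enum) -> str.
enum OtelString {
    Static(&'static str),
    Owned(String),
}

struct StringValue(OtelString);

enum Value {
    String(StringValue),
    I64(i64),
}

// Reaching the underlying &str walks the whole chain of wrappers.
fn as_str(v: &Value) -> Option<&str> {
    match v {
        Value::String(StringValue(OtelString::Static(s))) => Some(s),
        Value::String(StringValue(OtelString::Owned(s))) => Some(s.as_str()),
        Value::I64(_) => None,
    }
}

fn main() {
    let v = Value::String(StringValue(OtelString::Static("example")));
    assert_eq!(as_str(&v), Some("example"));
    assert_eq!(as_str(&Value::I64(7)), None);
    let owned = Value::String(StringValue(OtelString::Owned(String::from("owned"))));
    assert_eq!(as_str(&owned), Some("owned"));
    println!("ok");
}
```

None of the individual layers is expensive on its own; the point is that constructing and unwrapping several nested enums/newtypes on every call adds up at nanosecond scale.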
Benchmarks from here show that the cost of creating abstractions like `Key`, `Value` in the `opentelemetry` crate is non-trivial. Take the example below: it is 7.7199 ns vs 786.74 ps (roughly 10x!).
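For context, a rough sketch of how such a construction comparison can be measured with only the standard library (`std::time::Instant` plus `std::hint::black_box` to defeat dead-code elimination). The types here are simplified stand-ins, not the real crate's, and the cited numbers come from a proper benchmark harness, so absolute results from this sketch will differ:

```rust
use std::hint::black_box;
use std::time::Instant;

// Simplified stand-ins for the API abstractions (hypothetical, not the
// real opentelemetry types).
struct Key(&'static str);
struct KeyValue {
    key: Key,
    value: i64,
}

fn main() {
    const N: u64 = 10_000_000;

    // Construct the wrapped abstraction in a tight loop.
    let t = Instant::now();
    for i in 0..N {
        black_box(KeyValue {
            key: Key(black_box("my_key")),
            value: black_box(i as i64),
        });
    }
    let wrapped = t.elapsed();

    // Construct a bare tuple for comparison.
    let t = Instant::now();
    for i in 0..N {
        black_box((black_box("my_key"), black_box(i as i64)));
    }
    let bare = t.elapsed();

    println!("wrapped: {:?}, bare: {:?}", wrapped, bare);
}
```

With types this trivial the optimizer may close the gap entirely; a statistics-based harness such as criterion is what produced the ns/ps figures above.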
It may be very hard, or even impossible, to avoid the overhead entirely. Opening an issue to facilitate further discussion. Some options include:

1. Make types like `Key`, `Value` lightweight stubs in the `opentelemetry` crate, with actual implementations moved to the sdk crate. This should avoid/reduce the perf hit when just `opentelemetry` is used. Alternatively, this may also be achieved via some feature-flag with no-ops as the default feature in the API; the SDK enables the feature-flag in the API, so users without the SDK enabled will get high perf.
2. Follow the approach of the `tracing` crate, which uses macros that avoid evaluating the attributes completely unless enabled. The prior art here is definitely worth exploring further.

I also want to emphasize why this issue is relevant/important! For an application owner who uses OTel, this may not be a significant concern. But it would be a serious concern (probably a blocker) for a library author to bet on OpenTelemetry as the instrumentation API. This is because, simply by adding OTel API calls, the library's own performance will degrade, even when nobody is interested in the telemetry (i.e. the final app owner does not enable OTel at all). Whether a few nanoseconds of overhead is significant depends on the library's own latency without OTel: if that is on the order of milliseconds, it is probably not a big issue, but Rust is used in perf-sensitive scenarios where it would not be surprising to find libraries that perform their core operation on the order of nanoseconds.
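The `tracing`-style idea can be sketched with a minimal hypothetical macro (this is not tracing's real implementation, which consults per-callsite interest caches and the active subscriber, but the principle is the same): the attribute expressions sit inside the enabled-check branch, so a disabled call site costs one branch and never evaluates them.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical global switch standing in for "is a subscriber installed?".
static ENABLED: AtomicBool = AtomicBool::new(false);

macro_rules! record_event {
    ($($key:expr => $value:expr),* $(,)?) => {
        if ENABLED.load(Ordering::Relaxed) {
            // Attribute expressions are only evaluated on this branch.
            $( println!("{} = {}", $key, $value); )*
        }
    };
}

// Stand-in for an attribute that is costly to compute; it counts how
// many times it was actually evaluated.
fn expensive_attr(calls: &mut u32) -> String {
    *calls += 1;
    String::from("computed")
}

fn main() {
    let mut calls = 0;

    // Disabled: the macro never evaluates expensive_attr().
    record_event!("detail" => expensive_attr(&mut calls));
    assert_eq!(calls, 0);

    ENABLED.store(true, Ordering::Relaxed);
    // Enabled: now the attribute is computed and recorded.
    record_event!("detail" => expensive_attr(&mut calls));
    assert_eq!(calls, 1);
}
```

This is what makes the macro approach attractive for library authors: the disabled path is a single predictable branch, so uninstrumented users pay essentially nothing even for expensive attribute expressions.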