About the performance overhead of attribute deduplication in recordingSpan#snapshot

moonspirit commented 7 months ago

Problem Statement

I updated from 1.21 to 1.24 and found a noticeable performance improvement in memory allocation for setAttributes, in trace context inject/extract, and in other areas.

But after profiling my RPC framework, I have some thoughts about attribute deduplication in recordingSpan#snapshot.

To support tail sampling (sampling errors), we have to sample all spans with RecordAndSample or RecordOnly. That means we need to store attributes for all spans, which puts recordingSpan#snapshot on the critical path.

Here is a profiling flame graph with tail sampling enabled and the sample fraction set to 1/1024 (which I expected to have less overhead):

The profile shows that the current cost of this part (attribute deduplication, even though all my attributes are unique, with no duplicates) is about the same as that of propagation.compositeTextMapPropagator.Extract.
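As a rough illustration of why this costs something even without duplicates, here is a minimal sketch of last-value-wins deduplication. It is a simplified stand-in and not the SDK's code: `KeyValue` and `dedupe` are hypothetical names. The map allocation and one map insert per attribute are paid on every call, whether or not any key repeats.

```go
package main

import "fmt"

// KeyValue is a hypothetical stand-in for an attribute; it is not the SDK type.
type KeyValue struct {
	Key   string
	Value string
}

// dedupe keeps one entry per key. The last value written for a key wins,
// and keys keep the position of their first appearance. It filters in place,
// reusing the backing array of attrs.
func dedupe(attrs []KeyValue) []KeyValue {
	// This map is allocated and filled on every call, even when all keys are unique.
	index := make(map[string]int, len(attrs))
	out := attrs[:0]
	for _, kv := range attrs {
		if i, ok := index[kv.Key]; ok {
			out[i] = kv // duplicate key: overwrite the earlier value
			continue
		}
		index[kv.Key] = len(out)
		out = append(out, kv)
	}
	return out
}

func main() {
	attrs := []KeyValue{
		{"rpc.system", "grpc"},
		{"rpc.method", "Get"},
		{"rpc.method", "Put"}, // duplicate key, replaces "Get"
	}
	fmt.Println(dedupe(attrs))
}
```

With N unique attributes this still performs N map lookups and N map inserts per span, which would be consistent with the cost showing up in the profile when every span is recorded.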

I would expect that this snapshot processing can be optimized.

Proposed Solution

I would prefer to delay attribute deduplication until we have decided to record and sample the span. That means we could defer the operation to the SpanProcessor.
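The idea can be sketched as follows, under stated assumptions. Everything here is hypothetical and uses only the standard library: `span`, `processor`, `onEnd`, and `dedupeLastWins` are illustrative names, not SDK types or hooks. The snapshot hands over the attributes as recorded, and the processor deduplicates only the spans it decides to keep (sampled, or failed in the tail-sampling case).

```go
package main

import "fmt"

// Attr is a hypothetical attribute type, not the SDK's.
type Attr struct{ Key, Value string }

// span is a hypothetical recording span; the SDK's span differs.
type span struct {
	sampled bool // head-sampling decision
	failed  bool // tail-sampling signal, e.g. the RPC returned an error
	attrs   []Attr
}

// snapshot returns the attributes as recorded, without deduplicating.
func (s *span) snapshot() []Attr { return s.attrs }

// dedupeLastWins keeps one entry per key; the last value written wins.
func dedupeLastWins(attrs []Attr) []Attr {
	index := make(map[string]int, len(attrs))
	out := make([]Attr, 0, len(attrs))
	for _, a := range attrs {
		if i, ok := index[a.Key]; ok {
			out[i] = a
			continue
		}
		index[a.Key] = len(out)
		out = append(out, a)
	}
	return out
}

// processor is a hypothetical stand-in for a span processor.
type processor struct{ deduped int }

// onEnd deduplicates only spans that will actually be exported.
func (p *processor) onEnd(s *span) []Attr {
	if !s.sampled && !s.failed {
		return nil // dropped span: the deduplication cost is never paid
	}
	p.deduped++
	return dedupeLastWins(s.snapshot())
}

func main() {
	p := &processor{}
	for i := 0; i < 1024; i++ {
		s := &span{sampled: i%1024 == 0, attrs: []Attr{{"rpc.method", "Get"}}}
		p.onEnd(s)
	}
	fmt.Printf("deduplicated %d of 1024 spans\n", p.deduped)
}
```

The trade-off in this sketch is that a processor receiving the raw snapshot may see duplicate keys, so any consumer that relies on unique attributes would have to deduplicate itself.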

Alternatives

Or provide an option to skip attribute deduplication.

dmathieu commented 7 months ago

Do you have the same flame graph running on 1.21 that would show the difference between both versions? (maybe with 1.22 too?)

moonspirit commented 7 months ago

> Do you have the same flame graph running on 1.21 that would show the difference between both versions? (maybe with 1.22 too?)

Hi dmathieu, here are the graphs for the same benchmark on both versions:

otel v1.21

otel v1.24

The code is here: https://github.com/moonspirit/grpc-tracing-bench (gRPC performs badly for metadata.FromIncomingContext and peer.Peer.Addr.String()).
