We have a service that reads from an in-memory data store and writes to Kafka. I tried adding OpenTelemetry tracing support to it. What we saw was that GC activity doubled and throughput declined by 40%. Even more concerning, this is still true if 0% of traces are sampled: even with just the NoopTracer, we see these throughput declines. It's largely coming from the overhead of allocating additional memory for spans and attributes that are then discarded. Profiling data indicated that the additional time and allocations were coming from creating the span objects.
I tried searching around for performance advice, and the only thing I found was "make sure you are using batching," which we already do: we are using the batch span processor.
I'm curious if anyone's had a similar experience and what you've done to reduce the overhead of adding tracing - reusing span objects or any other tricks. It might be good to have a section on performance optimizations that people could refer to.
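To make the question concrete, here is a minimal Go sketch of the kind of trick I have in mind: return a zero-size no-op span when a trace is not sampled, and only build attributes when the span is actually recording. This is not our service code and not the OpenTelemetry API. `Attr`, `Span`, `noopSpan`, `recordingSpan`, `startSpan`, and `publish` are all hypothetical names made up for illustration, and Go is an assumption on my part for the example. OpenTelemetry spans do expose an is-recording check, so something along these lines may be possible with the real API.

```go
package main

import (
	"fmt"
	"strconv"
)

// Attr is a hypothetical key/value attribute (not the OpenTelemetry type).
type Attr struct {
	Key   string
	Value string
}

// Span is a hypothetical, minimal span interface used only for this sketch.
type Span interface {
	IsRecording() bool
	SetAttributes(attrs ...Attr)
	End()
}

// noopSpan is a zero-size type, so handing it out as a Span
// should not require a per-call heap allocation.
type noopSpan struct{}

func (noopSpan) IsRecording() bool     { return false }
func (noopSpan) SetAttributes(...Attr) {}
func (noopSpan) End()                  {}

// recordingSpan stores attributes; it is only created for sampled work.
type recordingSpan struct {
	name  string
	attrs []Attr
	ended bool
}

func (s *recordingSpan) IsRecording() bool           { return true }
func (s *recordingSpan) SetAttributes(attrs ...Attr) { s.attrs = append(s.attrs, attrs...) }
func (s *recordingSpan) End()                        { s.ended = true }

// startSpan (hypothetical) allocates a span only when the trace is sampled.
func startSpan(name string, sampled bool) Span {
	if !sampled {
		return noopSpan{}
	}
	return &recordingSpan{name: name}
}

// publish simulates one unit of work and returns how many attributes it built.
func publish(key string, payload []byte, sampled bool) int {
	span := startSpan("publish", sampled)
	defer span.End()

	built := 0
	// Guard attribute construction so unsampled work skips the
	// string formatting and the attribute slice entirely.
	if span.IsRecording() {
		span.SetAttributes(
			Attr{Key: "record.key", Value: key},
			Attr{Key: "record.size", Value: strconv.Itoa(len(payload))},
		)
		built = 2
	}
	// The actual Kafka write would go here.
	return built
}

func main() {
	fmt.Println(publish("k1", []byte("hello"), false))
	fmt.Println(publish("k1", []byte("hello"), true))
}
```

Whether this helps in practice depends on where the library itself allocates. If the span object is created before the sampling decision is visible to the caller, a guard like this only saves the attribute allocations, not the span allocation.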