Open danstarns opened 1 month ago
Hi thanks for the detailed report!
I've had a quick look at the benchmark code, and it looks like the SimpleSpanProcessor
is used there. It will export every ended span individually causing the exporter to send an HTTP request for each of these spans. The SimpleSpanProcessor
is best used for debugging and local development with the ConsoleSpanExporter
as it provides quick feedback.
Consider using the BatchSpanProcessor
for production use, this one will actually batch spans together and only pass them on to the SpanExporter
in a batched manner, which may improve performance by reducing the amount of export requests made significantly.
Thank you for your response @pichlermarc
I added batch
versions of the benchmarks:
The benchmarks indicate that using the BatchProcessor is significantly more performant than the SimpleProcessor. Thank you again for pointing this out. I have some questions:
Our benchmarks only produce a single span. Given there is nothing to batch, how does changing this processor affect the performance? Does this mean the SimpleProcessor is exporting the spans inline synchronously? Is this even possible?
Are the differences between the two processors documented anywhere?
Could the logic be wrapped in a setTimeout to prevent blocking the immediate loop?
Would it be wise to add a log in the console when users use SimpleProcessor?
The benchmarks indicate that using the BatchProcessor is significantly more performant than the SimpleProcessor. Thank you again for pointing this out. I have some questions:
Glad to hear that this improved things. :slightly_smiling_face: There's still some performance overhead, but we're hoping to gradually improve this in the future by optimizing the SDK and Exporters.
Our benchmarks only produce a single span. Given there is nothing to batch, how does changing this processor affect the performance?
It batches spans over multiple requests, once the batch is large enough it will then export the spans.
Does this mean the SimpleProcessor is exporting the spans inline synchronously? Is this even possible?
It does some work synchronously, namely serializing the OTLP message and starting the http request it seems to be enough to block the execution.
Are the differences between the two processors documented anywhere?
It is documented on opentelemetry.io, but I think it would also make sense to document this in the JSDoc for the processor.
Could the logic be wrapped in a setTimeout to prevent blocking the immediate loop?
setTimeout
or setImmediate
would be an option. However, AFAIK it was a conscious design choice that the SimpleSpanProcessor
does not defer the execution of the export to a later time.
Would it be wise to add a log in the console when users use SimpleProcessor?
I'd go with a JSDoc addition first, then re-evaluate if we need the log. :slightly_smiling_face:
Hi @pichlermarc.
There's still some performance overhead, but we're hoping to gradually improve this in the future by optimizing the SDK and Exporters.
I agree on the overhead. Even with using the Batch processor, in the screenshot, from our benchmarks, it shows that adding a few lines of otel js, using the batch processor, increases latency from 5.99ms
to 8.58ms
plus it impacts kps too.
Due to these findings, I made a PR to move the benchmarks I am using into this repo:
I have added these benchmarks to the source code of OpenTelemetry as they provide transparent insight into performance plus allow the community to iterate on performance improvements.
setTimeout or setImmediate would be an option. However, AFAIK it was a conscious design choice that the SimpleSpanProcessor does not defer the execution of the export to a later time.
Once we have some benchmarks in place we could experiment with things like this, as introducing a timeout and injecting in the current timestamp should not affect people's applications.
I'd go with a JSDoc addition first, then re-evaluate if we need the log.
Added here:
What happened?
I am writing this issue to express the performance issues I've encountered using your libraries within common Node.js APIs such as HTTP, Express, and GraphQL. I have some links, benchmarks and examples to share with you below.
Related:
Background
We are building an observability platform leveraging OTEL JS on the users’ client side. It was reported to us that using our libraries with their APIs resulted in massive latency issues. We did some investigation broken down in full detail here:
Our implementation essentially wraps a GraphQL resolver, picking up on values in the tree info argument, capturing attributes, and starting a span. As described in our adjacent issue, we took several necessary steps to ensure it wasn’t an issue with our implementation. Additionally, we created a few benchmarks with and without GraphQL usage to compare performance impacts.
Hypothesis
We see that the code added in the following diff is causing a huge increase in latency to this endpoint.
Alongside our benchmarks, as detailed below, we saw the following as confirmation of our hypothesis:
Benchmarks
When we ran with our hypothesis, we found a bunch of benchmarks like opentelemetry-java/benchmarks/ and https://github.com/open-telemetry/opentelemetry-js/issues/4171. We found these benchmarks to be not that helpful, and we were unable to compare the two languages or understand how OTEL JS affects performance inside an endpoint.
From this, we created a few types of benchmarks to test our findings and improve our performance.
Isolated OTEL JS Benchmarks
These benchmarks demonstrate using OTEL inside a basic endpoint. Our results show that adding just a few lines of OTEL code to your handler will result in a significant reduction in the performance of your API. For example, a basic http endpoint operating at
6.26ms
latency more than triples the average time to22.03ms
when OTEL is added, rendering it unusable for any production setting.GraphQL Debugger Benchmarks
GraphQL Specific fork of isolated benchmarks: