opentracing / opentracing-java

OpenTracing API for Java. 🛑 This library is DEPRECATED! https://github.com/opentracing/specification/issues/163
http://opentracing.io
Apache License 2.0

Benchmark - overhead of instrumented code without a tracer #295

Open jpkrohling opened 6 years ago

jpkrohling commented 6 years ago

Create a performance test that assesses the overhead of tracing with a noop tracer against a simple application. Scenarios to measure/test:

1) Simple single-threaded Java application with the NoopTracer, creating N spans
2) Simple Spring Boot application with a couple of endpoints and a couple of beans, each layer generating N spans. This test should handle several concurrent requests, to exercise a multi-threading scenario

Other ideas are welcome. The main goal is to confirm that the hot spots in the opentracing-api and opentracing-util modules perform well, or to highlight the parts where performance could be improved.

For one or more scenarios, JMH might be useful: http://openjdk.java.net/projects/code-tools/jmh/
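For the single-threaded scenario, a minimal self-contained sketch of what such a measurement could look like is below. Note that `NoopTracer` and `NoopSpan` here are hypothetical stand-ins mirroring the shape of the real `io.opentracing.noop` classes (so the sketch runs without any dependency), and that a real benchmark should use JMH as linked above rather than naive `System.nanoTime()` timing:

```java
// Hypothetical stand-ins for io.opentracing's noop classes, defined inline
// so this sketch is dependency-free. The real API lives in io.opentracing.
final class NoopSpan {
    void finish() { /* intentionally empty: noop spans do nothing */ }
}

final class NoopTracer {
    long started = 0; // counter only for this sketch, not part of the real API

    NoopSpan buildSpan(String operationName) {
        started++;
        return new NoopSpan();
    }
}

public final class SpanOverheadSketch {
    // Creates n spans with the given tracer and returns elapsed nanoseconds.
    static long createSpans(NoopTracer tracer, int n) {
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            tracer.buildSpan("op").finish();
        }
        return System.nanoTime() - t0;
    }
}
```

A JMH version would annotate the loop body as a `@Benchmark` method and let the harness handle warmup and iteration counts.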

codefromthecrypt commented 5 years ago

This is more an instrumentation task, and it is hard to know which numbers are good, since the design of the tracer library impacts efficiency. One way is to compare against non-OT or non-OT-bridged instrumentation.

For example, I'd expect OT-bridged instrumentation using Brave to be far less efficient than native Brave instrumentation, due to some design problems highlighted over the years. However, many of the sources of overhead will be outside this repo, for example the practice of routinely walking stack traces in instrumentation projects. So the results will reflect both the side effects of design here and the design of the instrumentation.

Regardless, some work here will be helpful, just make sure when you test, you cover base case, unsampled/noop, nominal case and error case scenarios.

codefromthecrypt commented 5 years ago

And another note: even Brave will be far less efficient than native agents such as Instana's, which will very, very unlikely use OT for things like servlet instrumentation. It would be good for the general public to know how something compares when there is no requirement to use OT to achieve the goal. For example, the Instana agent can trace servlets and still supply a bridge to the OT layer for ad-hoc tracing. In this way only the OT parts will be hot spots. cc @codingfabian

codefromthecrypt commented 5 years ago

You could also do a similar comparison with other agents, such as Elasticsearch's, as at least that one supports a (mostly) garbage-free design. cc also @felixbarny

gsoria commented 5 years ago

I've been working on the benchmark tests related to this issue. The source code for number 1, the single-threaded Java application, is located here.

This is a result of an execution of these tests.

The tests were executed on a personal notebook with these characteristics:

Model Name: MacBook Pro
Processor Name: Intel Core i5
Processor Speed: 2.6 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 3 MB
Memory: 8 GB

I'd appreciate any feedback to improve these performance tests.

felixbarny commented 5 years ago

The string concatenations are likely to be eliminated by the JIT's dead-code elimination. I'd suggest returning the string from the benchmark method so that JMH can properly put it in a Blackhole. I'm also unsure why you want to benchmark string concatenations. IMO agents/tracers should avoid string concatenations and object allocations as much as possible. Or did you want to test the performance of creating spans? But then I don't quite understand why the benchmark is also about concatenating strings.

Just my 2c :)
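The dead-code-elimination point above can be illustrated with a plain-Java sketch of the two benchmark-method shapes (method names here are illustrative, and no JMH dependency is used; in real JMH code the second shape works because the harness feeds return values into its `Blackhole`):

```java
public final class DceSketch {
    // BAD shape: the result never escapes the method. Under JMH, the JIT may
    // prove the concatenation dead and eliminate it entirely, so the
    // benchmark ends up measuring an empty loop.
    static void concatAndDiscard(String a, String b) {
        String unused = a + b; // result discarded; eligible for elimination
    }

    // GOOD shape: the result escapes via the return value. JMH consumes it
    // (via Blackhole), so the JIT cannot remove the work being measured.
    static String concatAndReturn(String a, String b) {
        return a + b;
    }
}
```

The same effect can be achieved by accepting a `Blackhole` parameter in the benchmark method and calling `blackhole.consume(result)` explicitly.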

gsoria commented 5 years ago

Hi @felixbarny, thanks for your comments! :) I want to test the performance of creating spans, and I wanted to start with a simple example. I modified the benchmarks with your suggestion, and indeed the numbers make more sense now.

Please let me know if you have any better ideas to implement in this set of tests.

gsoria commented 5 years ago

I've been improving the benchmark tests:

1 - Single-threaded Java application: measuring the different ways to concatenate strings, and comparing each of them with the addition of creating spans with the NoopTracer, MockTracer, and JaegerTracer. The new results are located here.

2 - Simple Spring Boot application: implementing a simple billing example, with services to create invoices, add line items, compute taxes, notify customers by email, and issue invoices. The repository of invoices is kept in memory using a ConcurrentHashMap. This example also compares the same service logic with and without creating spans, using the NoopTracer and JaegerTracer. This is a result of an execution of these tests with 1 thread and this with 5 threads.
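The in-memory invoice repository described above could look roughly like the following sketch (class, field, and method names here are illustrative, not the actual benchmark code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative in-memory repository backed by a ConcurrentHashMap, so it is
// safe to call from concurrent benchmark threads without external storage.
public final class InMemoryInvoiceRepository {
    // Minimal stand-in for an invoice; the real example also carries line
    // items, taxes, and so on.
    static final class Invoice {
        final long id;
        final String customer;
        Invoice(long id, String customer) {
            this.id = id;
            this.customer = customer;
        }
    }

    private final Map<Long, Invoice> store = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    long create(String customer) {
        long id = ids.incrementAndGet(); // unique id without explicit locking
        store.put(id, new Invoice(id, customer));
        return id;
    }

    Invoice find(long id) {
        return store.get(id);
    }

    int size() {
        return store.size();
    }
}
```

Keeping the repository in memory means the benchmark numbers are dominated by tracing overhead rather than I/O, which is the point of the comparison.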

The tests were executed on a personal notebook with these characteristics:

Model Name: MacBook Pro
Processor Name: Intel Core i5
Processor Speed: 2.6 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 3 MB
Memory: 8 GB

Any feedback is welcome! :)

objectiser commented 5 years ago

@gsoria Thanks for the details.

Personally I think the results from the spring boot app are more useful, as the overhead of tracing is being taken in the context of communications between services.

From the results 21-31-17, it looks like Jaeger is adding about 30% overhead - although it is interesting that the NoopTracer shows slightly better performance than the non-instrumented version.

What configuration was used for the JaegerTracer? Is it reporting spans via UDP to the agent, or via HTTP directly to the collector?

gsoria commented 5 years ago

Hi @objectiser thanks for your review! :)

The configuration used for Jaeger reports spans via UDP, as you can see in the code.
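For readers without the linked code at hand, a typical jaeger-client-java setup reporting via UDP to a local agent looks roughly like the sketch below. This is a generic configuration sketch following the jaeger-client `Configuration` API, not necessarily the exact configuration used in these benchmarks; the service name, host, and `const` sampler are placeholder choices:

```java
import io.jaegertracing.Configuration;
import io.jaegertracing.Configuration.ReporterConfiguration;
import io.jaegertracing.Configuration.SamplerConfiguration;
import io.jaegertracing.Configuration.SenderConfiguration;
import io.opentracing.Tracer;

public final class JaegerTracerFactory {
    static Tracer create() {
        // Sample every trace, so the benchmark always pays the full
        // span-reporting cost.
        SamplerConfiguration sampler = SamplerConfiguration.fromEnv()
                .withType("const")
                .withParam(1);

        // Report over UDP to a local jaeger-agent; 6831 is the agent's
        // default port for the compact thrift protocol.
        SenderConfiguration sender = new SenderConfiguration()
                .withAgentHost("localhost")
                .withAgentPort(6831);

        ReporterConfiguration reporter = ReporterConfiguration.fromEnv()
                .withSender(sender)
                .withLogSpans(false);

        return new Configuration("benchmark-service")
                .withSampler(sampler)
                .withReporter(reporter)
                .getTracer();
    }
}
```

Reporting via UDP to the agent is fire-and-forget, so it tends to add less per-span latency than sending HTTP directly to the collector, which matters when interpreting overhead numbers.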

To be sure about the percentage of overhead, I modified the Billing example, removing the in-RAM persistence of invoices, and re-ran the tests.

These are the results with 1 thread and 5 threads. I think the results where the NoopTracer has better performance than the non-instrumented version are because the numbers were skewed by GC.

objectiser commented 5 years ago

Hi @gsoria

Sorry, I hadn't looked at the actual code :) - I noticed that currently the services are all beans, so they run in the same JVM.

Do you also have plans to extend the benchmarks to test performance when those services communicate via HTTP (i.e. as REST services)? I think this would be good from a comparison perspective, to see how much overhead OT adds between communicating services.