opentracing-contrib / java-specialagent

Automatic instrumentation for 3rd-party libraries in Java applications with OpenTracing.
Apache License 2.0
185 stars 46 forks source link

improve startup time #552

Open randallt opened 4 years ago

randallt commented 4 years ago

Several of our users have complained about the startup time of their services being multiplied by several factors when using the special agent. We even already use a trimmed-down version with most of the rule plugins (integrations) removed (sync'ed to v1.7.0 of the specialagent code) with an integrated Wavefront Tracer (using their SDK not their bundle; other tracers are removed from our build of specialagent). Even so, services sometimes see a 10x or greater increase to their start time (e.g. from 3 sec to 30).

On a dummy app that simply exits immediately, execution time without the agent was less than 100 ms. With the agent it is over 10 sec. Profiling/tracing the startup of the agent can be tricky (e.g. YourKit can't do this), so I simply took the poor-man's approach and immediately started a new thread that sleeps 13 ms and then prints a thread dump (infinite loop). This was run on a 4-core VM, and top showed basically 100% (1 full core) utilized for the entire 10 sec. I created a created a flame graph of the resulting thread dumps using https://github.com/brendangregg/FlameGraph.

The result gives a decent view of where the time is spent.

startup_13ms_samples_main_only.zip

(You'll need to download the file, unzip it, and view it in a browser since github doesn't support uploading SVG files directly). But here is a screenshot as well (after clicking into the main thread):

screenshot_startup_13ms_samples

This shows that the majority of time is spent in bytebuddy code, which makes sense. So anything we can do to optimize byte buddy time will be good. But we can see that the majority of time is divided into a few main pieces:

BootLoaderAgent.premain (about 25%) AgentRule$$Access.init (about 40%) SpecialAgent.loadAdapter (about 20%, mostly loading the Wavefront SDK stuff)

Is there any potential to utilize multiple threads across any of the 3 above?

I am a ByteBuddy novice for sure, but I will do what I can to help with this effort.

It is likely that using a better profiler would help with the data fidelity as well.

safris commented 4 years ago

Hi @randallt, thank you for this analysis! We have not applied a dedicated focus on profiling SpecialAgent, but its startup time does need some attention. The crux of performance hit is going to be the re/transformation operations that ByteBuddy -> ASM -> java.lang.instrument.Instrumentation has to perform to instrument the bytecode. The time that SpecialAgent needs to take to instrument the bytecode is linear with respect to the size of the bytecode (in bytes). If application (a) is 1.5x the size of application (b), then application (a) will take 1.5x the time to instrument compared to (b). There is no way around this, because ByteBuddy -> ASM -> java.lang.instrument.Instrumentation has to scan the entirety of the VM to potentially instrument matching sections.

One way to reduce the startup time for your own application is to trim down the codebase being packaged into the application. Spring and Spring Boot are notorious for importing everything (and the kitchen sink) into an application, regardless of whether it's just to print "Hello World".

There are a few things that could be done to improve performance:

  1. Rule Chaining Another important aspect that results in longer startup times is the way rules are defined. Prior to SpecialAgent v1.7.0, all rules were independently applied to the application. This resulted in a re/transformation round for each rule. The work I did for SpecialAgent v1.7.0 was to chain as many rules into a single re/transformation round as possible. There were a few rules, however, that could not be chained. Reason being that these rules defined intercepts for the same class, but applied to different versions of the same library (Lettuce 5.0, 5.1, 5.2, and Spring WebMVC 3, 4, 5). Thanks to @malafeev's work, he was able to merge all Lettuce rules into in one (#544), and just recently all Spring WebMVC rules into one (#551 #554). The result of this work will be to be able to add the Lettuce and Spring WebMVC rules into the same re/transformation chain.

  2. BootLoaderAgent and ClassLoaderAgent Another important aspect that results in longer startup times is the BootLoaderAgent. This agent is a core part of SpecialAgent's architecture, as it allows for plugin code to be injected into the bootstrap class loader for any version of the Java MV (1.7, 1.8, 9+). The BootLoaderAgent intercepts very many calls, so a slight improvement in its runtime would result in a significant benefit.

Is there any potential to utilize multiple threads across any of the 3 above?

Unfortunately, delaying the initialization by utilizing threads would simply result in the application starting sooner, but the codebase being partially instrumented.

It is likely that using a better profiler would help with the data fidelity as well.

Yes, you're right! Thank you for the SVG -- it provides a good high-level breakdown of bottlenecks in SpecialAgent. I'll use it to guide my focus.

safris commented 4 years ago

Hi @randallt, there have been a few updates to SpecialAgent concerning this issue, which include:

  1. Chain rules in BootLoaderAgent
  2. Removal of last "unchained" agent rule: Spring Web
  3. Merge unchained spring-webmvc-* rules into a single chained rule
  4. Merge unchained lettuce-* rules into a single chained rule

These changes address the relatively "low hanging fruit" items to improve startup time. There are a number of more involved items that would also help, which are:

  1. Optimize bytecode injection mechanism in ClassLoaderAgent.
  2. Optimize each individual ByteBuddy transformation in each AgentRule subclass.

We will release SpecialAgent v1.7.2 this weekend, which will include the former updates. If you have any feedback regarding your own observations, this would be welcome!

randallt commented 4 years ago

Hi @safris, only (1) above applies to my build, as I don't include the spring or lettuce rules at all. That said, this change has had a great effect. For a simple service with the tracing agent, startup time went from 20s to 13s seconds (a 35% reduction), with the premain execution time going from 12s down to 7s (42% reduction).

Great work, and I hope there's more where that came from!

randallt commented 4 years ago

On a larger application, this change dropped the startup time about 20 sec, or 29%, from about 70 sec down to 50 sec. I haven't seen any change in function.

andrewhsu commented 4 years ago

I see this issue is currently in queue for milestone v1.7.4. Curious to know if a fix is still forthcoming?

Seeing long startup times (approx 5 minutes) with tomcat:7.0.103-jdk8 environments.

Some more context on the startup environment:

export LS_AGENT_FILE="${LS_AGENT_FILE:-opentracing-specialagent-1.7.3.jar}"
export LIGHTSTEP_VARS="-Dsa.exporter=lightstep -Dls.collectorHost=$LS_HOST -Dls.collectorProtocol=$LS_PROTOCOL -Dls.collectorPort=$LS_PORT -Dls.componentName=$LS_SERVICE -Dls.tags='environment:${LS_ENVIRONMENT},version:${LS_SERVICE_VERSION},hostname:${LS_HOSTNAME}' -Dls.accessToken=$LS_TOKEN -Dls.propagator=b3"
export CATALINA_OPTS="$CATALINA_OPTS $LIGHTSTEP_VARS -javaagent:$(pwd)/${LS_AGENT_FILE}"