open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0

Improve startup overhead #9

Closed: trask closed this 4 years ago

trask commented 5 years ago

Current startup overhead on Azure App Service P1V2 instances (single core) is ~40 seconds.

The initial goal is to get this under 10 seconds in this environment.

After that, we can open new issues for subsequent goals. We (Azure) eventually need to get this well under 10 seconds.

tedsuo commented 5 years ago

@trask Curious: what application and dependencies are you loading when you test this? It would be good to make it a repeatable test.

trask commented 5 years ago

Oh yes, I should have included that! I'm using Spring PetClinic for testing, and the benchmarking harness/scripts for this test are at https://github.com/trask/agent-benchmarking/tree/master/coldstart.

tedsuo commented 5 years ago

oh awesome, thanks!

prydin commented 4 years ago

I just tried PetClinic with all instrumentation enabled and a real exporter, and got a 13s startup time on my Mac, which is also running an IDE and another application under test.

Is this still an issue or should it be closed?

jkwatson commented 4 years ago

@trask should retest and see how it works in his "standard" environment.

trask commented 4 years ago

Startup is much slower on single-core cloud machines. This may be mainly a concern for Azure (and other cloud providers), where cold start time is a critical metric and adding even 10 seconds of cold start overhead is frowned upon.

prydin commented 4 years ago

OK. For reference, it took 5.5s without instrumentation, so the agent adds a measurable delay (about 7.5s on top of the uninstrumented startup in my 13s run).

safris commented 4 years ago

In my previous observations, I found that instrumenting Spring applications is particularly time-consuming. For SpecialAgent, we addressed this with Static Deferred Attach. If I remember correctly, in the initial use case that led us to develop this solution, we saw the startup time of the Spring Boot application drop from 40s to 5s.

trask commented 4 years ago

I wanted to report that there have been major improvements to startup overhead thanks to Datadog's efforts in this area. I'll re-run the benchmarks and post new startup numbers soon.

trask commented 4 years ago

The latest startup overhead in the single-core cloud machine test was 13.5 seconds. That's a very good improvement, and I think it justifies closing this initial tracking issue. I'll open another issue at some point to track further progress.

gangxie112 commented 1 year ago

Have there been any further improvements recently?