open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0

Performance/Throughput Impact with auto instrumentation in Spring 5 applications #3047

Closed anuraaga closed 1 year ago

anuraaga commented 3 years ago

From https://github.com/aws-observability/aws-otel-java-instrumentation/issues/59

Describe the bug: We are seeing more than 50% performance degradation after instrumenting with the OTel agent. Our application, instrumented with OTel, runs on an EKS cluster. An OTel Collector running as a daemon set in the same EKS cluster collects traces and ingests the data into AWS X-Ray.

Steps to reproduce: This is a Spring 5 project with WebFlux and Spring Cloud Stream support, interacting with SQS, DynamoDB and AWS MSK.

What did you expect to see? Without the OTel agent, the application could reach up to 250 requests per second with 2Gi memory.

What did you see instead? With the OTel agent, we are seeing ~65 requests per second with the same settings. I was expecting some degradation in throughput, but this is more than 50%.

Additional context: We are using aws-opentelemetry-agent-1.1.0 with default settings for BSP. Sampling is set to 100% and the metrics exporter is set to logging.

anuraaga commented 3 years ago

@kurvatch Would you mind describing your code a bit more? I wonder if it's performance overhead from the reactor instrumentation, as you are using WebFlux. Does each request in the "requests per second" fan out to many reactor operations?

kurvatch commented 3 years ago

@anuraaga An API request to this application generates an audit event to a Kafka topic using Spring Cloud Stream, queries DynamoDB via the AWS SDK, makes an API call to another service via WebClient, generates an event to an SQS topic via the AWS SDK, and returns the response. The application is deployed in a t3.large EKS cluster with 2Gi memory and 512m CPU. I am testing with the sampling ratio at 1, 2, 5, 10, 25, 50 and 100% to benchmark. Without OTel = ~250 RPS, with OTel = ~65 RPS at 100%.

khmdev87 commented 3 years ago

@anuraaga Hello, our project also has this issue. We use Spring Boot 2.3.4.

Without OTel = ~200 TPS, with OTel = 100 TPS. The sampling rate is 2%.

anuraaga commented 3 years ago

@kurvatch @khmdev87 To help pinpoint whether it's an issue with any one instrumentation, would you be able to check performance with parts disabled?

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/suppressing-instrumentation.md

Or if you're able to come up with a small repro that you can share with us, then that would work well too. We currently have checks running spring-petclinic and haven't seen drops like 100%, so I'm guessing there's one problematic instrumentation, but it's hard to know which without trying these. Sorry for having so many options to choose from, we instrument a lot of stuff...

anuraaga commented 3 years ago

Ah, also, it would help if you could disable the exporters with OTEL_TRACES_EXPORTER=none and OTEL_METRICS_EXPORTER=none, to make sure export isn't affecting the numbers too.

Setting sampling rate to 0% and seeing what happens would also be interesting.
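The two baseline runs suggested above could be set up with a small helper like the following. This is only a sketch: `baseline_env`, `AGENT_JAR` and `APP_JAR` are hypothetical names, and using `parentbased_traceidratio` with an argument of `0` is an assumption about how to get a 0% sampling rate with the sampler already used in this thread.

```shell
#!/bin/sh
# Print the environment settings for one baseline run.
#   no-export   : spans are still created, but nothing is exported
#   no-sampling : the ratio sampler is set to 0, so no root traces are sampled
baseline_env() {
  case "$1" in
    no-export)
      echo "OTEL_TRACES_EXPORTER=none"
      echo "OTEL_METRICS_EXPORTER=none"
      ;;
    no-sampling)
      echo "OTEL_TRACES_SAMPLER=parentbased_traceidratio"
      echo "OTEL_TRACES_SAMPLER_ARG=0"
      ;;
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}

# Hypothetical usage (AGENT_JAR and APP_JAR are placeholders):
#   env $(baseline_env no-export) java -javaagent:"$AGENT_JAR" -jar "$APP_JAR"
```

Comparing throughput across the two modes and the plain agent run should help separate export cost from span-creation cost.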

khmdev87 commented 3 years ago

@anuraaga Thank you for your comment.

I can share more info.

First, to use the OTel agent, we defined the following in our Dockerfile:

ADD https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar ${TERRA_HOME}/aws-opentelemetry-agent.jar
ENV OTEL_RESOURCE_ATTRIBUTES=service.name=XXXX
ENV OTEL_TRACES_SAMPLER=parentbased_traceidratio
ENV OTEL_TRACES_SAMPLER_ARG=0.02

java -jar -javaagent:${TERRA_HOME}/aws-opentelemetry-agent.jar -Dotel.exporter.otlp.endpoint=http://127.0.0.1:XXXX -Dotel.instrumentation.common.default-enabled=true -Dotel.instrumentation.opentelemetry-annotations.enabled=true

We use ECS, and each task has an OTel sidecar, so the OTel agent sends data to the sidecar and we can then see the status in X-Ray.

Our app uses spring-boot-starter-webflux, spring-boot-starter-data-r2dbc and spring-boot-starter-data-redis-reactive.

Can you recommend options? I can test more and check performance.

anuraaga commented 3 years ago

At a glance, it sounds like the frameworks that would be part of your app are:

spring-webflux, spring-data, reactor, reactor-netty, lettuce, netty

Actually you can enable instrumentation one at a time using -Dotel.instrumentation.common.default-enabled=false followed by -Dotel.instrumentation.[name].enabled=true. So if you have some time (I know it's tedious...), maybe you could try adding that flag and if the performance looks ok, then add one instrumentation at a time to find any problematic one?
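A small helper can make the one-at-a-time approach less tedious. This is a sketch only: `bisect_flags` is a hypothetical name, and the instrumentation names passed to it are assumed to match the names the agent expects.

```shell
#!/bin/sh
# Build JVM flags that disable every instrumentation by default and
# re-enable only the instrumentations named as arguments.
bisect_flags() {
  flags="-Dotel.instrumentation.common.default-enabled=false"
  for name in "$@"; do
    flags="$flags -Dotel.instrumentation.$name.enabled=true"
  done
  echo "$flags"
}

# Hypothetical usage, adding one name per benchmark run:
#   java $(bisect_flags spring-webflux) -javaagent:"$AGENT_JAR" -jar "$APP_JAR"
#   java $(bisect_flags spring-webflux reactor) -javaagent:"$AGENT_JAR" -jar "$APP_JAR"
```

Starting with no arguments gives the all-disabled baseline. The first added name that causes a large throughput drop is the likely culprit.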

khmdev87 commented 3 years ago

@anuraaga Thank you for your comment. I will test them one by one.

kkchoudhary5895 commented 3 years ago

Hi, I am using ECS with an OTel sidecar, but it degrades our performance by 40-50%. I then tried configuring instrumentation as described in your docs, but the performance issue remains. Here is what I tested:

1) -Dotel.instrumentation.common.default-enabled=false -> no traces
2) -Dotel.instrumentation.common.default-enabled=true -> performance down by 50%, and got traces
3) -Dotel.instrumentation.common.default-enabled=true, with netty, jedis, tomcat, jdbc, r2db, spring-data, jetty, log4j, lettuce, spring-core and many more set to true -> got only Redis traces

My Dockerfile changes:

ADD https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar ${TERRA_HOME}/aws-opentelemetry-agent.jar
ENV OTEL_RESOURCE_ATTRIBUTES=service.name=abc
ENV OTEL_TRACES_SAMPLER=parentbased_traceidratio
ENV OTEL_TRACES_SAMPLER_ARG=0.02

-Dotel.instrumentation.common.default-enabled=true
-Dotel.instrumentation.methods.enabled=false
-Dotel.instrumentation.external-annotations.enabled=false
-Dotel.instrumentation.akka-actor.enabled=false
-Dotel.instrumentation.akka-http.enabled=false
-Dotel.instrumentation.axis2.enabled=false
-Dotel.instrumentation.apache-camel.enabled=false
-Dotel.instrumentation.cassandra.enabled=false
-Dotel.instrumentation.cxf.enabled=false
-Dotel.instrumentation.apache-dubbo.enabled=false
-Dotel.instrumentation.apache-httpasyncclient.enabled=false
-Dotel.instrumentation.apache-httpclient.enabled=false
-Dotel.instrumentation.kafka.enabled=false
-Dotel.instrumentation.rocketmq-client.enabled=false
-Dotel.instrumentation.tapestry.enabled=false
-Dotel.instrumentation.tomcat.enabled=true
-Dotel.instrumentation.wicket.enabled=false
-Dotel.instrumentation.armeria.enabled=false
-Dotel.instrumentation.async-http-client.enabled=false
-Dotel.instrumentation.aws-lambda.enabled=false
-Dotel.instrumentation.aws-sdk.enabled=false
-Dotel.instrumentation.couchbase.enabled=false
-Dotel.instrumentation.dropwizard-views.enabled=false
-Dotel.instrumentation.eclipse-osgi.enabled=false
-Dotel.instrumentation.elasticsearch-rest.enabled=false
-Dotel.instrumentation.guava.enabled=false
-Dotel.instrumentation.google-http-client.enabled=false
-Dotel.instrumentation.gwt.enabled=false
-Dotel.instrumentation.grails.enabled=false
-Dotel.instrumentation.grpc.enabled=false
-Dotel.instrumentation.hibernate.enabled=false
-Dotel.instrumentation.grizzly.enabled=false
-Dotel.instrumentation.java-http-client.enabled=false
-Dotel.instrumentation.http-url-connection.enabled=false
-Dotel.instrumentation.jdbc.enabled=true
-Dotel.instrumentation.r2db.enabled=true
-Dotel.instrumentation.jdbc-datasource.enabled=true
-Dotel.instrumentation.rmi.enabled=false
-Dotel.instrumentation.servlet.enabled=true
-Dotel.instrumentation.executor.enabled=false
-Dotel.instrumentation.jaxrs-client.enabled=false
-Dotel.instrumentation.jaxws.enabled=false
-Dotel.instrumentation.metro.enabled=false
-Dotel.instrumentation.jetty.enabled=true
-Dotel.instrumentation.jms.enabled=false
-Dotel.instrumentation.mojarra.enabled=false
-Dotel.instrumentation.myfaces.enabled=false
-Dotel.instrumentation.jsp.enabled=false
-Dotel.instrumentation.kubernetes-client.enabled=false
-Dotel.instrumentation.khttp.enabled=false
-Dotel.instrumentation.kotlinx-coroutines.enabled=false
-Dotel.instrumentation.log4j.enabled=true
-Dotel.instrumentation.logback.enabled=true
-Dotel.instrumentation.mongo.enabled=false
-Dotel.instrumentation.hystrix.enabled=false
-Dotel.instrumentation.netty.enabled=true
-Dotel.instrumentation.okhttp.enabled=false
-Dotel.instrumentation.liberty.enabled=false
-Dotel.instrumentation.opentelemetry-annotations.enabled=true
-Dotel.instrumentation.oshi.enabled=false
-Dotel.instrumentation.play.enabled=false
-Dotel.instrumentation.play-ws.enabled=false
-Dotel.instrumentation.rabbitmq.enabled=false
-Dotel.instrumentation.ratpack.enabled=false
-Dotel.instrumentation.rxjava2.enabled=false
-Dotel.instrumentation.rxjava3.enabled=false
-Dotel.instrumentation.reactor.enabled=false
-Dotel.instrumentation.reactor-netty.enabled=false
-Dotel.instrumentation.jedis.enabled=true
-Dotel.instrumentation.lettuce.enabled=true
-Dotel.instrumentation.rediscala.enabled=false
-Dotel.instrumentation.scala-executors.enabled=false
-Dotel.instrumentation.spark.enabled=false
-Dotel.instrumentation.spring-core.enabled=true
-Dotel.instrumentation.spring-data.enabled=true
-Dotel.instrumentation.spring-scheduling.enabled=false
-Dotel.instrumentation.spring-webflux.enabled=true
-Dotel.instrumentation.spring-webmvc.enabled=false
-Dotel.instrumentation.spring-ws.enabled=false
-Dotel.instrumentation.finatra.enabled=false
-Dotel.instrumentation.spymemcached.enabled=false
-Dotel.instrumentation.struts.enabled=false
-Dotel.instrumentation.twilio.enabled=false
-Dotel.instrumentation.undertow.enabled=false
-Dotel.instrumentation.vaadin.enabled=false
-Dotel.instrumentation.vertx.enabled=false
-Dotel.instrumentation.opentelemetry-annotations.enabled=false

What I want to achieve is no major performance impact while still getting all my application traces, such as Redis, RDS and I/O.

How can I do this? Can you guide me, please?
This is very urgent!