Closed: anuraaga closed this issue 1 year ago
@kurvatch Would you mind describing your code a bit more? I wonder if it's performance overhead from the reactor instrumentation, since you are using WebFlux. Does each request in the "requests per second" figure fan out to many reactor operations?
@anuraaga An API request to this application generates an audit event to a Kafka topic using Spring Cloud Stream, queries DynamoDB via the AWS SDK, makes an API call to another service via WebClient, generates an event to an SQS topic via the AWS SDK, and returns the response. The application is deployed on a t3.large EKS cluster with 2Gi memory and 512m CPU. I am benchmarking with the sampling ratio at 1, 2, 5, 10, 25, 50, and 100%. Without OTel = ~250 RPS; with OTel = ~65 RPS at 100% sampling.
@anuraaga Hello, our project also has this issue. We use Spring Boot 2.3.4.
Without OTel = ~200 TPS; with OTel = ~100 TPS; sampling rate = 2%.
@kurvatch @khmdev87 To help pinpoint whether it's an issue with any one instrumentation, would you be able to check performance with parts disabled? For example:
-Dotel.javaagent.enabled=false
-Dotel.instrumentation.spring-webflux.enabled=false
-Dotel.instrumentation.executor.enabled=false
-Dotel.instrumentation.reactor.enabled=false
-Dotel.instrumentation.spring-reactor-netty.enabled=false
-Dotel.instrumentation.netty.enabled=false
Or if you're able to come up with a small repro that you can share with us, then that would work well too. We currently have checks running spring-petclinic and haven't seen drops like 100%, so I'm guessing there's one problematic instrumentation, but it's hard to know which without trying these. Sorry for having so many options to choose from, we instrument a lot of stuff...
Ah, also, if you can, disable exporters with OTEL_TRACES_EXPORTER=none, OTEL_METRICS_EXPORTER=none to make sure export isn't affecting the numbers too.
Setting sampling rate to 0% and seeing what happens would also be interesting.
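The comparison runs suggested above can be written down as a small matrix so that each benchmark differs from the baseline in exactly one way. This is a minimal sketch in Python, not part of the original thread: the helper names `baseline_runs` and `describe` and the run labels are hypothetical, and using `OTEL_TRACES_SAMPLER_ARG=0.0` with the `parentbased_traceidratio` sampler for the 0% case is an assumption based on the sampler settings quoted elsewhere in this thread.

```python
# Hypothetical helper: lists the isolation runs suggested in this thread.
# Each run is (label, extra JVM flags, extra environment variables).

def baseline_runs():
    return [
        # Agent attached but switched off entirely.
        ("agent-disabled", ["-Dotel.javaagent.enabled=false"], {}),
        # Agent active, but nothing is exported.
        ("exporters-off", [], {
            "OTEL_TRACES_EXPORTER": "none",
            "OTEL_METRICS_EXPORTER": "none",
        }),
        # Assumption: a ratio of 0.0 stands in for "sampling rate 0%".
        ("sampling-0-percent", [], {
            "OTEL_TRACES_SAMPLER": "parentbased_traceidratio",
            "OTEL_TRACES_SAMPLER_ARG": "0.0",
        }),
    ]


def describe(run):
    """Render one run as 'label: ENV=... -Dflag=...' for a benchmark log."""
    label, flags, env = run
    env_part = " ".join(f"{key}={value}" for key, value in env.items())
    flag_part = " ".join(flags)
    return " ".join(part for part in (label + ":", env_part, flag_part) if part)


if __name__ == "__main__":
    for run in baseline_runs():
        print(describe(run))
```

Recording the throughput for each label next to the no-agent baseline should show whether the overhead comes from the agent itself, from exporting, or from span creation.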
@anuraaga Thank you for your comment. I can share more info.
First, to use the OTel agent, we defined the following in our Dockerfile:
ADD https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar ${TERRA_HOME}/aws-opentelemetry-agent.jar
ENV OTEL_RESOURCE_ATTRIBUTES=service.name=XXXX
ENV OTEL_TRACES_SAMPLER=parentbased_traceidratio
ENV OTEL_TRACES_SAMPLER_ARG=0.02
java -jar -javaagent:${TERRA_HOME}/aws-opentelemetry-agent.jar -Dotel.exporter.otlp.endpoint=http://127.0.0.1:XXXX -Dotel.instrumentation.common.default-enabled=true -Dotel.instrumentation.opentelemetry-annotations.enabled=true
We use ECS, and each task has an OTel sidecar, so the agent sends data to the sidecar and we can then see the status in X-Ray.
Our app uses spring-boot-starter-webflux, spring-boot-starter-data-r2dbc, and spring-boot-starter-data-redis-reactive.
Can you recommend options? I can test more and check performance.
At a glance, it sounds like frameworks that would be part of your app are
spring-webflux
spring-data
reactor
reactor-netty
lettuce
netty
Actually, you can enable instrumentation one at a time using -Dotel.instrumentation.common.default-enabled=false followed by -Dotel.instrumentation.[name].enabled=true. So if you have some time (I know it's tedious...), maybe you could try adding that flag and, if the performance looks OK, then add one instrumentation at a time to find any problematic one?
@anuraaga Thank you for your comment. I will test them one by one.
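The one-at-a-time procedure described above can be sketched as a generator of JVM flag sets, each adding one more instrumentation on top of the disabled default. This is an illustrative Python sketch only: the function name `one_at_a_time` is hypothetical, and the candidate names are copied from the framework list in this comment, on the assumption that they match the instrumentation names the agent accepts.

```python
# Candidate names taken verbatim from the framework list above
# (assumed, not verified, to be valid instrumentation names).
CANDIDATES = [
    "spring-webflux",
    "spring-data",
    "reactor",
    "reactor-netty",
    "lettuce",
    "netty",
]

BASE_FLAG = "-Dotel.instrumentation.common.default-enabled=false"


def one_at_a_time(names):
    """Yield cumulative flag sets: first everything off, then one more name per step."""
    for count in range(len(names) + 1):
        enabled = names[:count]
        yield [BASE_FLAG] + [
            f"-Dotel.instrumentation.{name}.enabled=true" for name in enabled
        ]


if __name__ == "__main__":
    for step, flags in enumerate(one_at_a_time(CANDIDATES)):
        print(f"run {step}: {' '.join(flags)}")
```

Benchmarking after each step and noting where throughput first drops would point at the instrumentation added in that step.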
Hi, I am using ECS with an OTel sidecar, but it is reducing our performance by 40-50%. I then tried setting the instrumentation flags as per your docs, but the performance issue remains. Here is what I tested:
1) -Dotel.instrumentation.common.default-enabled=false -> no traces
2) -Dotel.instrumentation.common.default-enabled=true -> performance down by 50%, and traces received
3) -Dotel.instrumentation.common.default-enabled=true with netty, jedis, tomcat, jdbc, r2db, spring-data, jetty, log4j, lettuce, spring-core and many more set to true -> only Redis traces received
My Dockerfile changes:
ADD https://github.com/aws-observability/aws-otel-java-instrumentation/releases/latest/download/aws-opentelemetry-agent.jar ${TERRA_HOME}/aws-opentelemetry-agent.jar
ENV OTEL_RESOURCE_ATTRIBUTES=service.name=abc
ENV OTEL_TRACES_SAMPLER=parentbased_traceidratio
ENV OTEL_TRACES_SAMPLER_ARG=0.02
-Dotel.instrumentation.common.default-enabled=true -Dotel.instrumentation.methods.enabled=false -Dotel.instrumentation.external-annotations.enabled=false -Dotel.instrumentation.akka-actor.enabled=false -Dotel.instrumentation.akka-http.enabled=false -Dotel.instrumentation.axis2.enabled=false -Dotel.instrumentation.apache-camel.enabled=false -Dotel.instrumentation.cassandra.enabled=false -Dotel.instrumentation.cxf.enabled=false -Dotel.instrumentation.apache-dubbo.enabled=false -Dotel.instrumentation.apache-httpasyncclient.enabled=false -Dotel.instrumentation.apache-httpclient.enabled=false -Dotel.instrumentation.kafka.enabled=false -Dotel.instrumentation.rocketmq-client.enabled=false -Dotel.instrumentation.tapestry.enabled=false -Dotel.instrumentation.tomcat.enabled=true -Dotel.instrumentation.wicket.enabled=false -Dotel.instrumentation.armeria.enabled=false -Dotel.instrumentation.async-http-client.enabled=false -Dotel.instrumentation.aws-lambda.enabled=false -Dotel.instrumentation.aws-sdk.enabled=false -Dotel.instrumentation.couchbase.enabled=false -Dotel.instrumentation.dropwizard-views.enabled=false -Dotel.instrumentation.eclipse-osgi.enabled=false -Dotel.instrumentation.elasticsearch-rest.enabled=false -Dotel.instrumentation.guava.enabled=false -Dotel.instrumentation.google-http-client.enabled=false -Dotel.instrumentation.gwt.enabled=false -Dotel.instrumentation.grails.enabled=false -Dotel.instrumentation.grpc.enabled=false -Dotel.instrumentation.hibernate.enabled=false -Dotel.instrumentation.grizzly.enabled=false -Dotel.instrumentation.java-http-client.enabled=false -Dotel.instrumentation.http-url-connection.enabled=false
-Dotel.instrumentation.jdbc.enabled=true -Dotel.instrumentation.r2db.enabled=true -Dotel.instrumentation.jdbc-datasource.enabled=true -Dotel.instrumentation.rmi.enabled=false -Dotel.instrumentation.servlet.enabled=true -Dotel.instrumentation.executor.enabled=false -Dotel.instrumentation.jaxrs-client.enabled=false -Dotel.instrumentation.jaxws.enabled=false -Dotel.instrumentation.metro.enabled=false -Dotel.instrumentation.jetty.enabled=true -Dotel.instrumentation.jms.enabled=false -Dotel.instrumentation.mojarra.enabled=false -Dotel.instrumentation.myfaces.enabled=false -Dotel.instrumentation.jsp.enabled=false -Dotel.instrumentation.kubernetes-client.enabled=false -Dotel.instrumentation.khttp.enabled=false -Dotel.instrumentation.kotlinx-coroutines.enabled=false -Dotel.instrumentation.log4j.enabled=true -Dotel.instrumentation.logback.enabled=true -Dotel.instrumentation.mongo.enabled=false -Dotel.instrumentation.hystrix.enabled=false -Dotel.instrumentation.netty.enabled=true -Dotel.instrumentation.okhttp.enabled=false -Dotel.instrumentation.liberty.enabled=false -Dotel.instrumentation.opentelemetry-annotations.enabled=true -Dotel.instrumentation.oshi.enabled=false -Dotel.instrumentation.play.enabled=false -Dotel.instrumentation.play-ws.enabled=false -Dotel.instrumentation.rabbitmq.enabled=false -Dotel.instrumentation.ratpack.enabled=false -Dotel.instrumentation.rxjava2.enabled=false -Dotel.instrumentation.rxjava3.enabled=false -Dotel.instrumentation.reactor.enabled=false -Dotel.instrumentation.reactor-netty.enabled=false -Dotel.instrumentation.jedis.enabled=true -Dotel.instrumentation.lettuce.enabled=true -Dotel.instrumentation.rediscala.enabled=false -Dotel.instrumentation.scala-executors.enabled=false -Dotel.instrumentation.spark.enabled=false -Dotel.instrumentation.spring-core.enabled=true -Dotel.instrumentation.spring-data.enabled=true -Dotel.instrumentation.spring-scheduling.enabled=false -Dotel.instrumentation.spring-webflux.enabled=true 
-Dotel.instrumentation.spring-webmvc.enabled=false -Dotel.instrumentation.spring-ws.enabled=false -Dotel.instrumentation.finatra.enabled=false -Dotel.instrumentation.spymemcached.enabled=false -Dotel.instrumentation.struts.enabled=false -Dotel.instrumentation.twilio.enabled=false -Dotel.instrumentation.undertow.enabled=false -Dotel.instrumentation.vaadin.enabled=false -Dotel.instrumentation.vertx.enabled=false -Dotel.instrumentation.opentelemetry-annotations.enabled=false
So what I want to achieve is no major performance impact while still getting all my application traces, such as Redis, RDS, and I/O.
How can I do this? Can you guide me, please?
Very urgent!
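In the flag list above, opentelemetry-annotations is set to true in one place and to false in another, which is easy to miss in a list this long. The following Python sketch (hypothetical helper `conflicting_flags`, not part of any OpenTelemetry tooling) scans a command line for -Dotel.instrumentation.[name].enabled flags and reports names given both values. It makes no claim about which value the agent ends up using.

```python
import re
from collections import defaultdict

# Matches flags of the form -Dotel.instrumentation.<name>.enabled=<true|false>.
# The common.default-enabled flag has a different shape and is not matched.
FLAG = re.compile(r"-Dotel\.instrumentation\.([\w.-]+?)\.enabled=(true|false)")


def conflicting_flags(command_line):
    """Return the sorted instrumentation names that appear with both true and false."""
    seen = defaultdict(set)
    for name, value in FLAG.findall(command_line):
        seen[name].add(value)
    return sorted(name for name, values in seen.items() if len(values) > 1)


if __name__ == "__main__":
    # Shortened sample built from flags that appear in the list above.
    sample = (
        "-Dotel.instrumentation.common.default-enabled=true "
        "-Dotel.instrumentation.opentelemetry-annotations.enabled=true "
        "-Dotel.instrumentation.jdbc.enabled=true "
        "-Dotel.instrumentation.opentelemetry-annotations.enabled=false"
    )
    print(conflicting_flags(sample))
```

Running a check like this over the full flag list before benchmarking would help confirm that each run tests the configuration it was meant to test.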
From https://github.com/aws-observability/aws-otel-java-instrumentation/issues/59
Describe the bug
We are seeing more than 50% performance degradation after instrumenting with the OTel agent. Our application, instrumented with OTel, runs on an EKS cluster. An OTel Collector running as a daemon set in the same EKS cluster collects traces and ingests the data into AWS X-Ray.
Steps to reproduce
This is a Spring 5 project with WebFlux and Spring Cloud Stream support, interacting with SQS, DynamoDB, and AWS MSK.
What did you expect to see?
Without the OTel agent, the application could reach up to 250 requests per second with 2Gi memory.
What did you see instead?
With the OTel agent, we are seeing ~65 requests per second with the same settings. I was expecting some degradation in throughput, but this is more than 50%.
Additional context
We are using aws-opentelemetry-agent-1.1.0 with default settings for BSP; sampling is set to 100% and the metrics exporter is set to logging.
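For reference, the throughput figures reported in this thread can be converted to a relative drop. This is a small Python sketch with a hypothetical helper name, `throughput_drop_percent`; the input numbers are the approximate values quoted by the reporters.

```python
def throughput_drop_percent(baseline, instrumented):
    """Relative throughput loss, as a percentage of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline - instrumented) * 100.0 / baseline


if __name__ == "__main__":
    # ~250 RPS without the agent vs ~65 RPS with it (100% sampling).
    print(f"{throughput_drop_percent(250, 65):.0f}%")   # prints "74%"
    # ~200 TPS without the agent vs ~100 TPS with it (2% sampling).
    print(f"{throughput_drop_percent(200, 100):.0f}%")  # prints "50%"
```

By this measure the first report corresponds to roughly a 74% drop, which is consistent with the "more than 50%" description above.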