open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0

Otel Java Auto Instrumentation is setting incorrect container.id resource attribute when application is deployed in AWS ECS Fargate Cluster. #7775

Open biswajit-nanda opened 1 year ago

biswajit-nanda commented 1 year ago

Describe the bug The container ID format for containers running in an ECS Fargate cluster has recently changed. An ECS Fargate container ID now consists of a 32-character alphanumeric string, a hyphen, and a numeric suffix.

For example: 8e7e67f77c0849bf80421aebbfbfb045-892424363

Please see the example of one such Java application container in the ECS console below: image

When this application is auto-instrumented with the Otel Java agent, the container.id resource attribute is set to only the trailing numeric string (i.e. 892424363).

Please see the screenshot from Jaeger below showing the same. image

This causes a significant problem later, when we try to correlate the application with the actual container instance.

Steps to reproduce
a. Create an ECS Fargate cluster in AWS.
b. Deploy a basic Java web application in the ECS Fargate cluster and instrument it with the OpenTelemetry Java agent.
c. From the AWS ECS console, note the container runtime ID of the deployed task.
d. Apply load to the application and export the spans to either an Otel Collector or Jaeger.
e. Look at the traces in Jaeger, find the value of the container.id resource attribute, and compare it with the container runtime ID from the AWS ECS console. You'll see the difference described above.

If you want to reuse my application, I am attaching the jar file and the Dockerfile. The Dockerfile lists all the environment variables that I use for Otel instrumentation (as comments in the file).

What did you expect to see? The value of the container.id resource attribute set by Otel should match the value of the actual containerID that you see in the AWS ECS Task Console, i.e. 8e7e67f77c0849bf80421aebbfbfb045-892424363.

What did you see instead? The value of the container.id resource attribute set by Otel is only the trailing digits of the actual containerID that you see in the AWS ECS Task Console, i.e. 892424363.
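One way such a truncated value could arise, offered here as an assumption rather than a confirmed diagnosis: if a parser takes the last path segment of a /proc/self/cgroup line and keeps only the text after the final hyphen (a sensible rule for names such as docker-<id>.scope), a Fargate-style ID is reduced to its numeric suffix. The class LastHyphenSketch and the cgroup line layout below are hypothetical and only illustrate the symptom.

```java
// Hypothetical illustration of the symptom; not the agent's actual parsing code.
class LastHyphenSketch {

  // Keeps the text after the last '/' and then, if a '-' is present, after the last '-'.
  static String afterLastHyphen(String cgroupLine) {
    String lastSegment = cgroupLine.substring(cgroupLine.lastIndexOf('/') + 1);
    int hyphen = lastSegment.lastIndexOf('-');
    return hyphen == -1 ? lastSegment : lastSegment.substring(hyphen + 1);
  }

  public static void main(String[] args) {
    // Assumed shape of a Fargate cgroup line; the middle path segment is made up.
    String line =
        "1:name=systemd:/ecs/8e7e67f77c0849bf80421aebbfbfb045/"
            + "8e7e67f77c0849bf80421aebbfbfb045-892424363";
    System.out.println(afterLastHyphen(line)); // prints 892424363
  }
}
```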

What version are you using? Latest Version Of OpenTelemetry javaagent.jar (1.22.1)

Environment
Compiler: Oracle JDK 1.8.0_281
OS: Mac OS Ventura 13.2 (22D49)
Runtime (if different from JDK above): OpenJDK 1.8.0_332-b09
OS (if different from OS compiled on): AWS ECS Fargate Cluster (Linux 4.14.301-224.520.amzn2.x86_64)

TestCase.zip

laurit commented 1 year ago

If you wish to debug this, container.id is set by https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/main/java/io/opentelemetry/instrumentation/resources/ContainerResourceProvider.java and https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/main/java/io/opentelemetry/instrumentation/resources/ContainerResource.java, which claim to handle cgroups v1 and v2. The contrib repo also has an Amazon-specific resource provider, https://github.com/open-telemetry/opentelemetry-java-contrib/tree/main/aws-resources, which isn't currently included in the javaagent. Perhaps that one gives the result you expect? The AWS distribution of the Otel agent, https://github.com/aws-observability/aws-otel-java-instrumentation, should include it.

biswajit-nanda commented 1 year ago

@laurit : I can confirm that the container.id resource attribute from ECS Fargate container is being set as expected (to actual ECS Fargate Container ID), if I use the AWS Otel Agent distribution from https://github.com/aws-observability/aws-otel-java-instrumentation.

Thanks for the pointers.


biswajit-nanda commented 1 year ago

I have fixed the issue by adding a new java class (EcsContainerIdExtractor.java) to the repo and modifying the https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/main/java/io/opentelemetry/instrumentation/resources/ContainerResource.java. Please see attached.

Archive.zip
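The attached files are not reproduced in this thread. As a rough idea of what an ECS-aware extractor could look like, here is a minimal sketch that recognizes the ID format described in the issue (32 hex characters, a hyphen, then digits) at the end of a cgroup line and returns it whole. The class name FargateIdSketch, the regular expression, and the sample lines are assumptions for illustration; this is not the attached EcsContainerIdExtractor.java.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the EcsContainerIdExtractor.java attached to this issue.
class FargateIdSketch {

  // 32 lowercase hex characters, a hyphen, then one or more digits, at the end of the line.
  private static final Pattern FARGATE_ID = Pattern.compile("([0-9a-f]{32}-[0-9]+)$");

  // Returns the full Fargate-style ID if the line ends with one, otherwise empty.
  static Optional<String> extract(String cgroupLine) {
    Matcher matcher = FARGATE_ID.matcher(cgroupLine.trim());
    return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
  }
}
```

A caller could try this pattern first and fall back to the existing generic parsing when it returns empty, so that lines from other container runtimes are handled as before.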

mateuszrzeszutek commented 1 year ago

Hey @biswajit-nanda , Can you try downloading the opentelemetry-aws-resources module and using it as a javaagent extension? (-Dotel.javaagent.extensions=/path/to/opentelemetry-aws-resources.jar) As Lauri mentioned, the AWS resource provider from the contrib repo should provide the correct container id.

biswajit-nanda commented 1 year ago

Hey @mateuszrzeszutek,

I have already tried the following approaches, and all of them work fine (the Otel resource attribute correctly matches the actual containerID from the ECS Fargate container):
a. Using the javaagent.jar from https://github.com/aws-observability/aws-otel-java-instrumentation.
b. Using the javaagent.jar from https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar with a javaagent extension pointing to the opentelemetry-aws-resources.jar.
c. Adding the new code (mentioned above) to https://github.com/open-telemetry/opentelemetry-java-instrumentation, building from source, and using the snapshot jar from the build.

So, I already have a solution in place that resolves my problem.

However, I believe the code in https://github.com/open-telemetry/opentelemetry-java-instrumentation should be changed to support this, because most of the time users rely on the K8s OpenTelemetry Operator (or a similar operator) to auto-inject the javaagent from https://github.com/open-telemetry/opentelemetry-java-instrumentation, and in those cases options a and b will not work.

Does that make sense?