open-telemetry / opentelemetry-js-contrib

OpenTelemetry instrumentation for JavaScript modules
https://opentelemetry.io
Apache License 2.0

"container.id" resource attribute is wrong for ECS Fargate #2032

Open iancward opened 5 months ago

iancward commented 5 months ago

What version of OpenTelemetry are you using?

"dependencies": {
  "@opentelemetry/api": "^1.7.0",
  "@opentelemetry/instrumentation-http": "^0.48.0",
  "@opentelemetry/instrumentation-ioredis": "^0.37.0",
  "@opentelemetry/instrumentation-mongoose": "^0.35.0",
  "@opentelemetry/instrumentation-nestjs-core": "^0.34.0",
  "@opentelemetry/resource-detector-aws": "^1.3.6",
  "@opentelemetry/resources": "^1.21.0",
  "@opentelemetry/sdk-metrics": "^1.21.0",
  "@opentelemetry/sdk-node": "^0.48.0",
  "@opentelemetry/semantic-conventions": "^1.21.0",
  ...
}

What version of Node are you using?

Node.js 18

What did you do?

We are using the AWS resource detector (@opentelemetry/resource-detector-aws) as described in its documentation, and found that the container.id resource attribute is wrong on ECS Fargate.

The resource detector reads the cgroup file and takes the last 64 characters of a line as the container ID.

That works fine for ECS on EC2. Example:

11:memory:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2
10:freezer:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2
9:cpu,cpuacct:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2
8:pids:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2
7:perf_event:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2
6:net_cls,net_prio:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2
5:cpuset:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2
4:blkio:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2
3:devices:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2
2:hugetlb:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2
1:name=systemd:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2

However, it produces an incorrect value on Fargate, where the cgroup path has a different format:

11:freezer:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379
10:cpuset:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379
9:hugetlb:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379
8:perf_event:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379
7:pids:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379
6:devices:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379
5:blkio:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379
4:memory:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379
3:net_cls,net_prio:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379
2:cpu,cpuacct:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379
1:name=systemd:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379
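The 64-character heuristic can be reproduced against one line from each cgroup file above. This is a minimal sketch: `lastCharsContainerId` is a hypothetical helper written for illustration, not the detector's actual source, and it only models the "keep the last 64 characters" rule.

```typescript
// Hypothetical helper modelling the heuristic: treat the last 64
// characters of a cgroup line as the container ID.
const CONTAINER_ID_LENGTH = 64;

function lastCharsContainerId(cgroupLine: string): string | undefined {
  const line = cgroupLine.trim();
  // Lines too short to contain a 64-character ID are skipped.
  if (line.length <= CONTAINER_ID_LENGTH) {
    return undefined;
  }
  return line.substring(line.length - CONTAINER_ID_LENGTH);
}

// One line from each cgroup file quoted in this issue.
const ec2Line =
  "11:memory:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2";
const fargateLine =
  "11:freezer:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379";

// → "f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2"
console.log(lastCharsContainerId(ec2Line));
// → "40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379"
console.log(lastCharsContainerId(fargateLine));
```

On EC2 the final path segment is itself a 64-character Docker ID, so the window lines up exactly. On Fargate the final segment is only 43 characters long, so the window spills 21 characters back across the "/" into the task ID.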

What did you expect to see?

On Fargate, I expect to see the correct container ID, a3490942f53a40d0a39674bdb52d443e-3300241379. According to the ECS metadata documentation, a Fargate container ID is "a 32-digit hex followed by a 10 digit number."
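Both ID formats discussed in this issue can be checked with regular expressions, which makes it easy to spot a malformed `container.id`. This is a sketch under stated assumptions: the EC2 pattern assumes the usual 64-character lowercase-hex Docker ID, and the Fargate pattern encodes the "32-digit hex followed by a 10 digit number" description with the hyphen separator seen in the example ID. `looksLikeEcsContainerId` is a hypothetical name.

```typescript
// Assumed formats, taken from the examples in this issue:
//  - ECS on EC2: a 64-character lowercase hex Docker ID.
//  - ECS on Fargate: 32 hex digits, a hyphen, then a 10-digit number.
const EC2_CONTAINER_ID = /^[0-9a-f]{64}$/;
const FARGATE_CONTAINER_ID = /^[0-9a-f]{32}-\d{10}$/;

// Hypothetical check: does the value match either known ECS format?
function looksLikeEcsContainerId(id: string): boolean {
  return EC2_CONTAINER_ID.test(id) || FARGATE_CONTAINER_ID.test(id);
}

// The expected Fargate ID passes; the value actually reported does not,
// because it contains a "/" and a task-ID fragment.
console.log(looksLikeEcsContainerId("a3490942f53a40d0a39674bdb52d443e-3300241379"));
console.log(looksLikeEcsContainerId("40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379"));
```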

What did you see instead?

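I saw 40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379, which is the correct container ID with the last 20 characters of the task ID and a "/" prepended to it, because the 64-character window extends into the preceding path segment.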

Additional context

~While it would be nice to pull the container ID from the ECS metadata endpoint, I'm not sure if there's a way to deterministically get the ID for the local container. The logic to get the container ID from the cgroup file should probably get updated to split on / and grab the last value. It looks like that will work for both ECS on EC2 and ECS on Fargate.~
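The slash-splitting idea in the struck-through paragraph (which the EDIT below replaces with the metadata approach) can be sketched as follows. `lastSegmentContainerId` is a hypothetical helper; it assumes the ECS path layouts shown in this issue, and other container runtimes lay out cgroup paths differently, so the last segment is not guaranteed to be a container ID elsewhere.

```typescript
// Hypothetical alternative: take the final "/"-separated segment of a
// cgroup line ("hierarchy-id:controllers:path") as the container ID.
function lastSegmentContainerId(cgroupLine: string): string | undefined {
  const line = cgroupLine.trim();
  const slash = line.lastIndexOf("/");
  // No path, or a path ending in "/" (e.g. the root cgroup "0::/").
  if (slash === -1 || slash === line.length - 1) {
    return undefined;
  }
  return line.substring(slash + 1);
}

const sampleEc2 =
  "11:memory:/ecs/35df273518d5404f87cb4f2b52200ab1/f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2";
const sampleFargate =
  "11:freezer:/ecs/a3490942f53a40d0a39674bdb52d443e/a3490942f53a40d0a39674bdb52d443e-3300241379";

// → "f177fa403396aaac636067f4b4d4fd76c17b44b6d19c0dd0882a0a7bb2215ef2"
console.log(lastSegmentContainerId(sampleEc2));
// → "a3490942f53a40d0a39674bdb52d443e-3300241379"
console.log(lastSegmentContainerId(sampleFargate));
```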

EDIT:

The resource detector already fetches containerMetadata (information about the local container) and taskMetadata (information about the task). The DockerId field in containerMetadata is available on both ECS on EC2 and ECS on Fargate, so I believe the best fix is to have the detector read the container ID from there.
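On ECS, the container metadata document is served by the task metadata endpoint (version 4 exposes its base URL in the `ECS_CONTAINER_METADATA_URI_V4` environment variable), and it includes a `DockerId` field. A minimal sketch of the proposed fix, assuming the JSON body has already been fetched the way the detector does today: `containerIdFromMetadata` and the sample body are hypothetical, and the fetching step is deliberately left out.

```typescript
// Only the field used here; the real response carries many more.
interface EcsContainerMetadata {
  DockerId?: unknown;
}

// Hypothetical helper: extract the container ID from the JSON body of the
// container metadata response. Returns undefined on malformed JSON or a
// missing/empty DockerId, so a caller could fall back to another source.
function containerIdFromMetadata(body: string): string | undefined {
  let metadata: EcsContainerMetadata;
  try {
    metadata = JSON.parse(body) as EcsContainerMetadata;
  } catch {
    return undefined;
  }
  const id = metadata?.DockerId;
  return typeof id === "string" && id.length > 0 ? id : undefined;
}

// Hypothetical Fargate response body, reusing the ID from this issue.
const sampleBody =
  '{"DockerId":"a3490942f53a40d0a39674bdb52d443e-3300241379","Name":"app"}';
console.log(containerIdFromMetadata(sampleBody));
```

Reading the ID from metadata avoids parsing cgroup paths entirely, so the result no longer depends on how ECS on EC2 and Fargate lay out those paths.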

trentm commented 5 months ago

On updating the labels: I think it is @opentelemetry/resource-detector-aws rather than @opentelemetry/resource-detector-container here.