open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

On the AWS EKS-optimized AL2023 AMI, the cloudwatch-agent pod can't fetch EC2 instance metadata #31843

Status: Closed (closed 6 hours ago)

kyle504 commented 4 months ago

Component(s)

internal/aws

What happened?

Description

The EKS-optimized AL2023 AMI ships with the EC2 instance metadata hop limit (hop count) set to 1 by default. This change is described in the note section of the AMI documentation.

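However, the cloudwatch-agent pod uses OpenTelemetry to collect monitoring data, and its metadata requests travel cloudwatch-agent pod -> EC2 node -> instance metadata service (169.254.169.254). That path needs a hop limit of 2, so with the default of 1 the OpenTelemetry code in the pod cannot reach the metadata service.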
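To make the hop arithmetic concrete, here is a simplified Go model. It assumes, as AWS describes for IMDSv2, that the hop limit acts as the IP TTL of the token PUT response, and it assumes a pod sits exactly one forwarding step behind the node. The function names are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// putResponseReaches models the IMDSv2 hop limit as the IP TTL of the token
// PUT response: it leaves IMDS with TTL = hopLimit, and each forwarding step
// inside the instance (assumed: node -> pod network namespace) decrements it.
// The response is delivered only if at least 1 remains after those steps.
func putResponseReaches(hopLimit, forwardingHops int) bool {
	return hopLimit-forwardingHops >= 1
}

// minHopLimit is the smallest hop limit that still delivers the response.
func minHopLimit(forwardingHops int) int {
	return forwardingHops + 1
}

func main() {
	const nodeProcess = 0 // process running directly on the EC2 node
	const podProcess = 1  // assumed: one extra forwarding step to reach the pod

	for _, hl := range []int{1, 2} {
		fmt.Printf("hop limit %d: node=%v pod=%v\n",
			hl, putResponseReaches(hl, nodeProcess), putResponseReaches(hl, podProcess))
	}
	fmt.Println("minimum hop limit for the pod:", minHopLimit(podProcess))
}
```

Under this model a hop limit of 1 serves processes on the node itself but not the pod, which matches the behavior reported here.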

Steps to Reproduce

  1. Create an EKS cluster.
  2. Create a managed node group that uses the AL2023 AMI.
  3. Install the CloudWatch Observability add-on.

Expected Result

Performance logs are collected and delivered to CloudWatch.

Actual Result

No performance data is collected.

Collector version

v0.89.0

Environment information

OS: AL2023
cloudwatch-agent: 1.300034.1b536

OpenTelemetry Collector configuration

No response

Log output

2024-03-19T08:06:00.955410089Z stdout F I! imds retry client will retry 1 timesD! should retry true for imds error : RequestError: send request failed
2024-03-19T08:06:01.990573705Z stdout F caused by: Put "http://169.254.169.254/latest/api/token": context deadline exceeded (Client.Timeout exceeded while awaiting headers)D! should retry true for imds error : RequestError: send request failed
2024-03-19T08:06:01.990682581Z stdout F caused by: Put "http://169.254.169.254/latest/api/token": context deadline exceeded (Client.Timeout exceeded while awaiting headers)D! could not get hostname without imds v1 fallback enable thus enable fallback
2024-03-19T08:06:05.132329297Z stdout F E! [EC2] Fetch hostname from EC2 metadata fail: EC2MetadataError: failed to make EC2Metadata request
2024-03-19T08:06:05.132353411Z stdout F
2024-03-19T08:06:05.132357507Z stdout F         status code: 401, request id:
2024-03-19T08:06:06.132963089Z stdout F D! should retry true for imds error : RequestError: send request failed
2024-03-19T08:06:07.181617984Z stdout F caused by: Put "http://169.254.169.254/latest/api/token": context deadline exceeded (Client.Timeout exceeded while awaiting headers)D! should retry true for imds error : RequestError: send request failed
2024-03-19T08:06:07.181653952Z stdout F caused by: Put "http://169.254.169.254/latest/api/token": context deadline exceeded (Client.Timeout exceeded while awaiting headers)D! could not get instance document without imds v1 fallback enable thus enable fallback
2024-03-19T08:06:10.358260881Z stdout F E! [EC2] Fetch identity document from EC2 metadata fail: EC2MetadataRequestError: failed to get EC2 instance identity document
2024-03-19T08:06:10.358298644Z stdout F caused by: EC2MetadataError: failed to make EC2Metadata request
2024-03-19T08:06:10.358303547Z stdout F
2024-03-19T08:06:10.358307135Z stdout F         status code: 401, request id:

Additional context

No response

github-actions[bot] commented 4 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

kyle504 commented 4 months ago

I found that it can be fixed by changing the EC2 node's hop limit from 1 to 2. However, that requires a launch template or some other workaround. Is there any plan to address this?

github-actions[bot] commented 2 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 6 hours ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.