elburnetto-intapp opened this issue 1 month ago
Pinging code owners:
receiver/awscloudwatch: @schmikei
Hmm, it was my understanding that we rediscover on each poll interval, so I imagined this would fit your use case...
It's not panicking from that error based on the code; it's behaving correctly for that request. From reading the code, any other groups should still be getting collected.
The only reason I can think of is that the AWS CloudWatch Logs API is still returning the deleted group on subsequent poll intervals. Would you be up for enabling debug logs by adding this service snippet to your config?
service:
  telemetry:
    logs:
      level: debug
I would like to see if the deleted group is still getting rediscovered after deletion and after 2 polls.
I'm expecting a debug-level log message with the text "discovered log group" that names the deleted log group. We may need to add special handling for that error, but I'd rather avoid that if possible.
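For context, the kind of special handling I'd rather avoid would look roughly like the sketch below: catch the ResourceNotFoundException from the AWS SDK for Go v2 and skip that group instead of surfacing the error. This is only an illustration, not the receiver's actual code; pollLogGroups and its use of FilterLogEvents are hypothetical stand-ins for the real polling path.

package pollsketch

import (
    "context"
    "errors"
    "log"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/service/cloudwatchlogs"
    "github.com/aws/aws-sdk-go-v2/service/cloudwatchlogs/types"
)

// pollLogGroups is a hypothetical helper showing how a poll loop could skip
// log groups that were deleted between discovery and collection instead of
// treating the error as fatal for the whole poll.
func pollLogGroups(ctx context.Context, client *cloudwatchlogs.Client, groups []string) error {
    for _, group := range groups {
        _, err := client.FilterLogEvents(ctx, &cloudwatchlogs.FilterLogEventsInput{
            LogGroupName: aws.String(group),
        })
        if err != nil {
            var notFound *types.ResourceNotFoundException
            if errors.As(err, &notFound) {
                // The group was deleted after discovery; skip it and let the
                // next discovery pass drop it from the list.
                log.Printf("log group %q no longer exists, skipping", group)
                continue
            }
            return err // any other error is still surfaced
        }
    }
    return nil
}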
Component(s)
receiver/awscloudwatch
What happened?
Description
We have the AWS CloudWatch Receiver set up to auto-discover and poll log groups from our AWS account, which are then exported out to Kafka. The idea behind using auto-discovery was that log groups could be added/removed automatically by the receiver, without requiring manual intervention.
However, we've noticed that when a log group gets removed from AWS, the receiver panics and completely stops, as it's unable to find the log group (instead of ignoring it and continuing to poll the other log groups). It's as if the functionality that updates the log group list isn't removing deleted ones.
Steps to Reproduce
Set up the OpenTelemetry Collector to use the receiver with auto-discovery for log groups (a sketch of such a config is included below), let it run for 5-10 minutes, then remove a log group from the AWS console.
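For reference, a minimal collector config along these lines reproduces the setup described above. This is only a sketch, not the exact configuration from our environment: the region, prefix, poll interval, and Kafka broker are placeholders.

receivers:
  awscloudwatch:
    region: eu-west-1
    logs:
      poll_interval: 1m
      groups:
        autodiscover:
          limit: 100
          prefix: /aws/

exporters:
  kafka:
    brokers: [kafka:9092]
    protocol_version: 2.0.0

service:
  pipelines:
    logs:
      receivers: [awscloudwatch]
      exporters: [kafka]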
Expected Result
The receiver should stop polling for logs in the group which no longer exists and continue polling the groups that are still active.
Actual Result
The receiver stops collecting and continuously errors. The only way to recover is to delete the pod and wait for the receiver to restart.
Collector version
0.101.0
Environment information
Environment
Kubernetes (EKS)
OpenTelemetry Collector configuration
Log output
Additional context
No response