Upstream Connection Error in Logging agent

newrelic / helm-charts

Upstream Connection Error in Logging agent #641

Open Vacant0mens opened 2 years ago

Vacant0mens commented 2 years ago

Our logging DaemonSet isn't pulling logs from k8s; it just reports this error to NR Logger over and over.

[ warn] [net] getaddrinfo(host='kubernetes.default.svc.cluster.local', err=-2): Name or service not known
[error] [filter:kubernetes:kubernetes.0] kubelet upstream connection error

It seems to be reporting this error a few times per second per instance (in our test environment, that's 3-4 instances per cluster).

Not sure what to do other than delete the DaemonSet to avoid our logs getting inundated with millions of these.

Is this a bug? Or is there a permission that I missed?

Vacant0mens commented 2 years ago

I was able to resolve this.

I didn't realize we had a custom DNS resolver/host name for our cluster. Rather than .svc.cluster.local it was appending our custom DNS name, so kubernetes.default resolved correctly, but the FQDN kubernetes.default.svc.cluster.local was unknown and did not resolve.
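For anyone hitting the same symptom, a rough way to confirm it is to check which forms of the API server name resolve from inside a pod. The sketch below is only an illustration: `check_names` is a hypothetical helper, and port 443 and the candidate names are assumptions based on the error message above. It relies on the standard library's `socket.getaddrinfo`, which raises `socket.gaierror` when a name does not resolve.

```python
import socket


def check_names(names, resolver=socket.getaddrinfo):
    """Return {name: bool} saying whether each host name resolves.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    the lookup can be swapped out (hypothetical helper, not part of the chart).
    """
    results = {}
    for name in names:
        try:
            # Port 443 is an assumption; only the name lookup matters here.
            resolver(name, 443)
            results[name] = True
        except socket.gaierror:
            # Same failure class as "Name or service not known" in the log.
            results[name] = False
    return results
```

Calling `check_names(["kubernetes.default", "kubernetes.default.svc.cluster.local"])` from a pod in the affected cluster should show the short name resolving and the `.svc.cluster.local` name failing if the cluster uses a custom DNS domain, as described above.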

Perhaps there can be a setting added to the chart for this?

Also, our logging ingest got spammed like crazy with this error (like, millions of logs in an hour or two). Should the logging service be logging itself to NR? It seems like it should self-contain those logs and errors rather than blasting the ingest with them, at least by default.

sa1i commented 2 years ago

Same error here. I am using Cloud DNS to replace the default DNS, but New Relic logging does not work well with it.

Currently, I drop the logs with this pattern:

cluster_name:"xxxx" missing:"namespace_name"

But I'm really worried that I'm missing some important logs because of this drop filter.
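One way to reduce that risk might be to match on the two error messages themselves rather than on a missing `namespace_name`. The sketch below only illustrates that matching logic in Python; it is not New Relic drop-rule syntax. The record shape (a dict with a `message` field) and the helper names are assumptions, and the marker strings are copied from the log lines quoted earlier in this thread.

```python
# Substrings taken from the two log lines quoted in the original report.
NOISE_MARKERS = (
    "kubelet upstream connection error",
    "getaddrinfo(host='kubernetes.default.svc.cluster.local'",
)


def is_agent_noise(record):
    """True if the record's message contains one of the known error strings.

    `record` is assumed to be a dict with a "message" key (hypothetical shape).
    """
    message = record.get("message", "")
    return any(marker in message for marker in NOISE_MARKERS)


def split_records(records):
    """Split records into (kept, dropped) using is_agent_noise."""
    kept, dropped = [], []
    for record in records:
        (dropped if is_agent_noise(record) else kept).append(record)
    return kept, dropped
```

With this kind of match, a record that lacks `namespace_name` for some unrelated reason would still be kept, because only the two known error messages are dropped.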