newrelic / helm-charts

Helm charts for New Relic applications
Apache License 2.0

[newrelic-logging] kubelet upstream connection error #1397

Open Voziv opened 2 weeks ago

Voziv commented 2 weeks ago

Bug description

#1207 broke the ability to communicate with the kubelet when Use_Kubelet is enabled.

Version of Helm and Kubernetes

❯ kubectl version
Client Version: v1.28.10
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.10-gke.1058000

❯ helm version
version.BuildInfo{Version:"v3.9.0", GitCommit:"7ceeda6c585217a19a1131663d8cd1f7d641b2a7", GitTreeState:"clean", GoVersion:"go1.17.5"}

Which chart?

nri-bundle 5.0.81, newrelic-logging 1.22.0

What happened?

I upgraded from nri-bundle 5.0.25 to nri-bundle 5.0.81 to mitigate the CVE that we were notified of.

We stopped receiving Kubernetes metadata on our logs and started seeing a bunch of [error] [filter:kubernetes:kubernetes.0] kubelet upstream connection error messages from the newrelic-logging pods.

What you expected to happen?

I expected fluentbit to continue to communicate with kubelet.

How to reproduce it?

Add Use_Kubelet On to the fluentbit kubernetes filter.

Observe that the pods can no longer connect to the kubelet.
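For reference, the reproduction can be sketched as a values override for nri-bundle. This is only a sketch: the `newrelic-logging.fluentBit.config.filters` key path is an assumption and should be checked against the chart's values.yaml, the `kube.*` match pattern is borrowed from the example further down rather than from the chart, and overriding the filters block would replace whatever filter config the chart ships by default.

```yaml
# Sketch only: key names under newrelic-logging are assumptions, verify against the chart's values.yaml.
newrelic-logging:
  fluentBit:
    config:
      # Use_Kubelet asks the node's kubelet for pod metadata instead of the API server.
      # Kubelet_Host is deliberately left unset here, so Fluent Bit uses its default of 127.0.0.1.
      filters: |
        [FILTER]
            Name          kubernetes
            Match         kube.*
            Use_Kubelet   On
            Kubelet_Port  10250
```

As far as I can tell, with hostNetwork disabled 127.0.0.1 is the pod's own loopback interface rather than the node's, so nothing is listening on port 10250 there and the filter logs the upstream connection error.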

Anything else we need to know?

I did notice that hostNetwork: true was removed in #1207. This means that fluentbit was no longer able to communicate with the kubelet.

I did find a promising solution here that could possibly be applied to this chart: https://github.com/aws-samples/amazon-cloudwatch-container-insights/issues/147

[FILTER]
    Name          kubernetes
    Match         kube.*
    Use_Kubelet   true
    Kubelet_Host  ${KUBELET_HOST}

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
spec:
  template:
    spec:
      hostNetwork: false
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit
          env:
            # Expose the node's IP so Kubelet_Host can reach the kubelet without hostNetwork
            - name: KUBELET_HOST
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP

Additionally, from what I understand, connecting to the kubelet helps alleviate load on the main API server, since the kubelet does some caching and the request can stay local to the node. Would it be a good idea to make this the default?
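One thing that would probably need checking before making this the default: as far as I understand from the Fluent Bit documentation, reading pod metadata through the kubelet requires the service account to have access to the `nodes/proxy` resource. A rough sketch of such a rule is below. The role name is hypothetical, and I have not checked what the chart's existing ClusterRole already grants.

```yaml
# Sketch only: the role name is hypothetical and the chart's real ClusterRole may differ.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-kubelet-read
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
      - nodes
      - nodes/proxy   # lets Fluent Bit query the kubelet for pod metadata
    verbs: ["get", "list", "watch"]
```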

workato-integration[bot] commented 2 weeks ago

https://new-relic.atlassian.net/browse/NR-280975