Open · hero-david opened 1 month ago
@hero-david Did you have any luck resolving this? We're seeing the same problem with AKS
No, we have simply upped our VM SKU to 16 GB (required for some of our workloads moving forward anyway).
Description
An issue has been opened about this before, and the reporter was instructed to make sure they had upgraded their chart so that the memory limit config on the input was present.
https://github.com/newrelic/helm-charts/blob/ab2d1bab9f09d94ea6ca56fed807dd20eae5444e/charts/newrelic-logging/values.yaml#L104
We have been struggling with OOM errors and restarts on our pods despite having this config present and increasing the pods' memory allowance. We have about 50 pods per node.
The relevant Helm config for this is the memory limit on the tail input (linked above).
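A minimal sketch of what that kind of override looks like is below; the values and the exact key nesting are illustrative, not a verbatim copy of our values.yaml:

```yaml
# Illustrative sketch only, not a verbatim copy of our values.yaml; the key names follow
# the newrelic-logging subchart linked above and the numbers are example values.
newrelic-logging:
  fluentBit:
    config:
      # Mem_Buf_Limit caps the tail input's in-memory buffer.
      inputs: |
        [INPUT]
            Name              tail
            Path              /var/log/containers/*.log
            Tag               kube.*
            Mem_Buf_Limit     7MB
            Skip_Long_Lines   On
            DB                /var/log/flb_kube.db
  # Pod-level resources; the exact nesting can differ between chart versions, see the
  # linked values.yaml for the authoritative layout.
  resources:
    limits:
      memory: 256Mi
    requests:
      cpu: 250m
      memory: 64Mi
```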
Versions
Helm: v3.14.4
Kubernetes (AKS): 1.29.2
Chart: nri-bundle-5.0.81
FluentBit: newrelic/newrelic-fluentbit-output:2.0.0
What happened?
The Fluent Bit pods were repeatedly OOM-killed for using more memory than their limit, which is set quite low. Their CPU was never highly utilised, which suggests the memory growth was not caused by throttling or by the pipeline failing to keep up.
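For anyone hitting the same thing, the failure mode is straightforward to confirm with standard kubectl checks (the namespace and pod names below are placeholders for your install):

```bash
# Placeholders: adjust the namespace and pod name to your install.
kubectl -n newrelic get pods                          # RESTARTS climbs on the logging DaemonSet pods
kubectl -n newrelic describe pod <newrelic-logging-pod> | grep -A3 "Last State"
#   Last State: Terminated, Reason: OOMKilled confirms the memory-limit kill
kubectl -n newrelic top pod <newrelic-logging-pod>    # CPU stays low while memory climbs toward the limit
```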
What you expected to happen?
Fluent Bit should have few or no restarts, and it should never reach 1.5 GB of memory used per container.
How to reproduce it?
Using the versions listed above and the same Helm values.yaml, deploy an AKS cluster with about 50 production workloads per node (2 vCPU, 8 GB) and observe whether the Fluent Bit pods run into memory issues.
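A sketch of the reproduction, assuming the standard New Relic Helm repository and a values.yaml with the overrides described above (release and namespace names are placeholders):

```bash
helm repo add newrelic https://helm-releases.newrelic.com/charts
helm repo update
helm upgrade --install newrelic-bundle newrelic/nri-bundle \
  --namespace newrelic --create-namespace \
  --version 5.0.81 \
  -f values.yaml
# Then schedule ~50 workloads per node (2 vCPU / 8 GB nodes) and watch the
# newrelic-logging pods for OOMKilled restarts.
```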