cvallesi-kainos opened this issue 1 year ago
/remove-kind bug /triage needs-information
Please show the output of helm get values <helmreleasename>
Sure:
USER-SUPPLIED VALUES:
controller:
  autoscaling:
    enabled: true
This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or want to request prioritization, please reach out in #ingress-nginx-dev on Kubernetes Slack.
We have the same problem with our nginx-ingress deployment on AKS (though we use Standard_B2ms machines). I wonder whether the autoscaling feature needs resource requests/limits to be set so it can evaluate what exactly 50% or 80% of CPU/memory usage means (see the sketch after the values below).
values.yaml:
controller:
  service:
    externalTrafficPolicy: Local
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: "/healthz"
  extraArgs:
    enable-ssl-passthrough: "" # Needed for Coturn SSL forwarding
  allowSnippetAnnotations: true # Needed for Jitsi Web /config.js block
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 4
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80
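As a side note on the question above: as far as I understand it, the HPA computes percentage utilization against the pod's resource requests (not limits and not node size), so without explicit requests the chart's default requests are what those 80% targets are measured against. A minimal sketch of adding explicit requests to the values above; the numbers are assumptions for illustration, not recommendations:

controller:
  resources:
    requests:
      cpu: 100m       # illustrative values; the 80% targets below are measured against these requests
      memory: 256Mi   # assumed to sit comfortably above the controller's idle footprint
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 4
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80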
Has this been fixed? I deployed nginx via helm a few days ago with this config:
set {
  name  = "controller.autoscaling.minReplicas"
  value = "1"
}
set {
  name  = "controller.autoscaling.maxReplicas"
  value = "2"
}
Only one pod was created. In another cluster where I did this in the past, it created two.
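For what it's worth: minReplicas/maxReplicas only set the bounds of the HorizontalPodAutoscaler, and with an idle controller the HPA has no reason to scale above minReplicas, so a single pod is expected unless the metrics demand more. Also, as far as I can tell the chart only renders an HPA when controller.autoscaling.enabled is true. A sketch of the equivalent values, assuming the Terraform above maps onto them:

controller:
  autoscaling:
    enabled: true    # without this the chart does not create an HPA at all (as far as I can tell)
    minReplicas: 1
    maxReplicas: 2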
/remove-triage needs-information /kind bug /triage accepted
/remove-lifecycle frozen
Hi, the controller in an idle state uses ~60-120Mi of memory. When you have
Limits:
  cpu:     100m
  memory:  192Mi
Requests:
  cpu:     100m
  memory:  128Mi
and autoscaling enabled with the default 50% targetMemoryUtilizationPercentage, the HPA will always scale the pods up to maxReplicas.
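To make that concrete, this is roughly the arithmetic the HPA does with those figures (using the standard desiredReplicas formula; the exact idle usage will vary):

idle memory usage    ≈ 120Mi
memory request         128Mi
current utilization  ≈ 120 / 128 ≈ 94%
target utilization     50%
desiredReplicas      = ceil(currentReplicas * 94 / 50) = ceil(1 * 1.88) = 2, then ceil(2 * 1.88) = 4, and so on until maxReplicas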
What happened:
Autoscaling seems to scale to maximum capacity as soon as the ingress controller is deployed.
What you expected to happen:
Not seeing the ingress scale immediately.
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
NGINX Ingress controller
  Release:       v1.8.1
  Build:         dc88dce9ea5e700f3301d16f971fa17c6cfe757d
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:21:19Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.6", GitCommit:"94c50547e633f1db5d4c56b2b305670e14987d59", GitTreeState:"clean", BuildDate:"2023-06-12T18:46:30Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}
Environment:
uname -a: Linux nginx-ingress-ingress-nginx-controller-7c9f44b5f8-z5z24 5.15.0-1040-azure #47-Ubuntu SMP Thu Jun 1 19:38:24 UTC 2023 x86_64 Linux
kubectl version: see above
kubectl get nodes -o wide:
helm ls -A | grep -i ingress
kubectl describe ingressclasses
kubectl -n <ingresscontrollernamespace> get all -A -o wide
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
kubectl -n <appnamespace> get all,ing -o wide
kubectl -n <appnamespace> describe ing <ingressname>
kubectl describe ... of any custom configmap(s) created and in use

How to reproduce this issue:
This issue has been tested and is reproducible 100% of the time on Azure Kubernetes Service.
The first step is to deploy an AKS cluster with Standard_B8ms or larger nodes. Lower-class nodes don't seem to have this problem.
Simply enable scaling when installing the chart and you should see the behaviour reported.
helm install nginx-ingress ingress-nginx/ingress-nginx --set controller.autoscaling.enabled=true
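For reference, with only that flag set the chart should render an HPA roughly like the following (reconstructed from the chart defaults as I remember them, so treat the names and numbers as assumptions and check helm template for your chart version):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-ingress-ingress-nginx-controller   # assumed name for release "nginx-ingress"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-ingress-ingress-nginx-controller
  minReplicas: 1      # chart default, as I recall
  maxReplicas: 11     # chart default, as I recall
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # utilization is measured against the pod's CPU request
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 50   # with a small default memory request, idle usage alone can exceed this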
Anything else we need to know:
I encountered this anomaly on a cluster using B12ms VMs as nodes and started testing possible causes. I noticed that up to B4ms this does not happen. I can't figure out why the exact same configuration misbehaves on nodes with more memory available, but what seems to happen is that the pod is deployed and, as soon as I can get some metrics out of it, its RAM usage is > 80%, which triggers the deployment of new replicas.
The cluster where I first noticed the issue had only nginx, cert-manager, grafana, prometheus and loki deployed. After some consideration and experiments I deployed a new cluster from scratch with only nginx-ingress installed via helm, and the behaviour kept happening in the same way.
I tried increasing/lowering maxReplicas and it always deploys all available replicas. I also tried enabling the explicit scaleUp and scaleDown policies in the chart (see the sketch at the end of this comment); still, it only increases the number of replicas and never scales them down.
During my tests I also tried the two previous versions of the helm chart (corresponding to app versions 1.7.1 and 1.8.0) and the behaviour was the same.
If someone can check whether this happens with other cloud providers using similar node hardware, it would help me understand whether I should instead go to Microsoft and ask for clarification.
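For completeness, regarding the scaleUp/scaleDown policies mentioned above, the kind of behavior block I mean looks roughly like this (illustrative values only; as far as I know the chart passes controller.autoscaling.behavior straight through to the HPA). Note that no scale-down policy can help while memory utilization stays above the target, which is consistent with the replicas never scaling back down:

controller:
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 4
    behavior:
      scaleUp:
        stabilizationWindowSeconds: 0
        policies:
          - type: Pods
            value: 2
            periodSeconds: 60
      scaleDown:
        stabilizationWindowSeconds: 300
        policies:
          - type: Pods
            value: 1
            periodSeconds: 180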