kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Tolerations are not getting applied on Ingress Controllers #10758

Open tingerbell opened 6 months ago

tingerbell commented 6 months ago

Issue Overview

I am encountering an issue where tolerations are not applied as expected when installing the NGINX Ingress Controller via Helm on a Kubernetes cluster. The issue is observed when the Helm install command runs from a DevOps pipeline.

Environment Details

- Helm version: v3.12.3
- Kubernetes version (Dev): 1.27.7
- NGINX ingress Helm chart version: 4.7.1
- NGINX version: 1.21.6

Helm Install Command

The following Helm install command is used:

helm install $nginxReleaseName $nginxRepoName/$nginxChartName \
  --version $version \
  --create-namespace \
  --namespace $nginxNamespace \
  --set controller.replicaCount=2 \
  --set controller.nodeSelector."kubernetes.io/os"=linux \
  --set defaultBackend.nodeSelector."kubernetes.io/os"=linux \
  --set controller.service.annotations."service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path"=/healthz \
  --set controller.service.annotations."service.beta.kubernetes.io/azure-dns-label-name"=test \
  --set controller.service.loadBalancerIP="xxxxxx" \
  --set controller.service.annotations."service.beta.kubernetes.io/azure-load-balancer-resource-group"=$SharedResourceGroupName \
  --set controller.tolerations[0].key=ingress,controller.tolerations[0].operator=Exists,controller.tolerations[0].effect=NoSchedule

Taint and Tolerations

- Taint applied on the node pool: ingress=true:NoSchedule
- Tolerations expected to be applied: the controller.tolerations entries from the flags above (key=ingress, operator=Exists, effect=NoSchedule)

Expected Behavior

The NGINX Ingress Controller should be installed with the specified tolerations, allowing it to be scheduled on nodes with the corresponding taints.

Actual Behavior

The tolerations are not applied to the controller pods.

Assistance Needed

I am seeking guidance on resolving this issue so that the tolerations are applied correctly and the NGINX Ingress Controller pods are scheduled on the intended nodes.
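For reference, the toleration matching the ingress=true:NoSchedule taint can be expressed in a values file instead of --set. With --set, unescaped dots in map keys (e.g. kubernetes.io/os) are split into nested keys, and some CI shells mangle the [0] index brackets, so a file passed with -f sidesteps the quoting entirely. This is only a sketch (the file name values-ingress.yaml is made up, and it has not been verified against this cluster):

```yaml
# values-ingress.yaml -- hypothetical file name; sketch matching the
# node pool taint ingress=true:NoSchedule.
controller:
  replicaCount: 2
  nodeSelector:
    kubernetes.io/os: linux
  tolerations:
    - key: "ingress"
      operator: "Equal"     # "Exists" (no value) would also match, as in the --set flags above
      value: "true"
      effect: "NoSchedule"
defaultBackend:
  nodeSelector:
    kubernetes.io/os: linux
```

Rendering with helm template -f values-ingress.yaml first lets you confirm the tolerations actually appear in the Deployment before installing.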

k8s-ci-robot commented 6 months ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
tingerbell commented 6 months ago

Hi @strongjz - Good day! Could you please route this issue to the appropriate team and help us resolve it?

mamchyts commented 5 months ago

@tingerbell

A toleration only means that a pod CAN be scheduled onto a node with a matching taint; the scheduler is still free to place it on any other eligible node. To pin the pod to your master node you must also set a matching nodeSelector. In my case the node labels look like:

kubectl describe node/dev-k8s-master-01
Name:               dev-k8s-master-01
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=dev-k8s-master-01
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=

My solution:

helm --kube-context k8s-dev upgrade --install ingress-nginx ingress-nginx/ingress-nginx --version 4.9.0 \
--set controller.service.externalIPs={X.X.X.X} \
--set controller.admissionWebhooks.enabled=false \
--set controller.nodeSelector."kubernetes\.io/os"=linux \
--set controller.nodeSelector."node-role\.kubernetes\.io/control-plane"="" \
--set controller.tolerations[0].key="node-role.kubernetes.io/control-plane" \
--set controller.tolerations[0].operator=Exists \
--set controller.tolerations[0].effect=NoSchedule
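The same install can be expressed as a values file, which avoids the backslash-escaping that --set requires for dots inside map keys. A sketch of the equivalent file (externalIPs omitted since the address above is redacted; unverified):

```yaml
# Sketch of a values file equivalent to the command above.
controller:
  admissionWebhooks:
    enabled: false
  nodeSelector:
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"
```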
JCBSLMN commented 1 month ago

I'm having the same issue. I set tolerations in the values file, but the pod will not schedule onto a spot node:

  tolerations: 
   - key: "kubernetes.azure.com/scalesetpriority"
     operator: "Equals"
     value: "spot"
     effect: "NoSchedule"

error message from Nginx pod:

message: '0/2 nodes are available: 1 node(s) had untolerated taint {CriticalAddonsOnly:
      true}, 1 node(s) had untolerated taint {kubernetes.azure.com/scalesetpriority:
      spot}. preemption: 0/2 nodes are available: 2 Preemption is not helpful for
      scheduling..'

Nginx pod tolerations:

 tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
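Two things stand out in the snippet above, offered as a guess since the full values file is not visible. First, `Equals` is not a valid toleration operator: the Kubernetes API accepts only `Equal` and `Exists`, so a toleration with `operator: Equals` is rejected rather than applied, which would explain why only the default not-ready/unreachable tolerations appear on the pod. Second, in the ingress-nginx chart the block must sit under `controller:`, not at the top level of the values file. A corrected sketch:

```yaml
controller:
  tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"    # not "Equals"; valid operators are Equal and Exists
      value: "spot"
      effect: "NoSchedule"
```

Note that the scheduler message also reports an untolerated CriticalAddonsOnly taint on the other node, so with this fix the pod should land on the spot node only.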
SebSa commented 2 weeks ago

Yep, toleration value setting is not being applied to ingress-controller pods.

longwuyuan commented 2 weeks ago

What is the taint?

Almost none of the important questions asked in the new bug report template are answered in the issue description, so there is no information to analyze and comment on.

You can edit the issue description and provide answers to the questions asked in a new bug report template.

/kind support /triage needs-information

SebSa commented 2 weeks ago

"CriticalAddonsOnly=true:NoSchedule" can be applied to the default nodes in an AKS cluster, which are labelled as set aside for Kubernetes system services. I just want to be able to schedule the controllers and their associated resources, such as the admission-create jobs and webhooks, on these nodes, but the Helm chart is not consistent in its application of the tolerations set in the values.yaml file at:

controller:
  nodeSelector:
    kubernetes.azure.com/mode: system
  tolerations:
  - effect: NoSchedule
    key: CriticalAddonsOnly
    operator: Equal
    value: "true"
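If the admission-webhook patch jobs and the default backend should also land on the tainted system nodes, they read their own keys in the chart values rather than inheriting controller.tolerations. A sketch, assuming a recent 4.x chart exposes these keys (verify with helm show values for your chart version):

```yaml
controller:
  nodeSelector:
    kubernetes.azure.com/mode: system
  tolerations:
    - effect: NoSchedule
      key: CriticalAddonsOnly
      operator: Equal
      value: "true"
  # The admission-webhook patch jobs have their own toleration key.
  admissionWebhooks:
    patch:
      tolerations:
        - effect: NoSchedule
          key: CriticalAddonsOnly
          operator: Equal
          value: "true"
defaultBackend:
  tolerations:
    - effect: NoSchedule
      key: CriticalAddonsOnly
      operator: Equal
      value: "true"
```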
longwuyuan commented 2 weeks ago

Your install process and the exact commands and values file used on the live cluster are not visible. Your kubectl describe output for the pod is not visible. Your kubectl get events -A output is not visible.

So any comments have to be based on whatever you have typed here, which makes them guesses; it is much harder to comment that way than from real, live data about the state of the cluster and its resources.

Fazu15 commented 1 week ago

@SebSa @JCBSLMN Has this issue been resolved?

I am also facing the same issue when trying to deploy the ingress controller onto a particular node pool with the taint kubernetes.azure.com/scalesetpriority:spot.

I tried passing the toleration with the helm install command and also in the values.yaml file, but neither works. The pod goes into Pending state with this error:

Warning FailedScheduling 14m default-scheduler 0/4 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {kubernetes.azure.com/scalesetpriority: spot}. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..

I then pulled the same Helm chart to my local machine and hard-coded the tolerations with

 tolerations:
   - key: "kubernetes.azure.com/scalesetpriority"
     operator: "Equal"
     value: "spot"
     effect: "NoSchedule"

and installed the chart from the local copy, and it works as expected.

I need some guidance on resolving this issue. Thanks.
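Since the locally hard-coded chart works, the difference is almost certainly in how the toleration is passed on the command line: the [0] index brackets and dots in --set arguments are easily mangled by CI shells. One way to debug is to render the chart with helm template and inspect the tolerations that actually come out. A sketch (release and repo names as used earlier in this thread; render only, nothing is installed):

```shell
# Quote each --set so the shell does not interpret the [0] brackets,
# then inspect the rendered toleration in the Deployment manifest.
helm template ingress-nginx ingress-nginx/ingress-nginx \
  --set 'controller.tolerations[0].key=kubernetes.azure.com/scalesetpriority' \
  --set 'controller.tolerations[0].operator=Equal' \
  --set 'controller.tolerations[0].value=spot' \
  --set 'controller.tolerations[0].effect=NoSchedule' \
  | grep -B1 -A4 'scalesetpriority'
```

The scheduler message above also says one node didn't match the Pod's node affinity/selector, so check that any nodeSelector you set matches the labels on the spot pool as well.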