tingerbell opened this issue 11 months ago
This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Hi @strongjz - good day! Could you please route this issue to the appropriate team and help us resolve it?
@tingerbell
Tolerations only mean that the pod CAN be scheduled onto a tainted node; there may still be many other valid nodes, so you must also specify a custom nodeSelector for your master node. In my case the labels look like this:
kubectl describe node/dev-k8s-master-01
Name: dev-k8s-master-01
Roles: control-plane
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=dev-k8s-master-01
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
node.kubernetes.io/exclude-from-external-load-balancers=
My solution:
helm --kube-context k8s-dev upgrade --install ingress-nginx ingress-nginx/ingress-nginx --version 4.9.0 \
--set controller.service.externalIPs={X.X.X.X} \
--set controller.admissionWebhooks.enabled=false \
--set controller.nodeSelector."kubernetes\.io/os"=linux \
--set controller.nodeSelector."node-role\.kubernetes\.io/control-plane"="" \
--set controller.tolerations[0].key="node-role.kubernetes.io/control-plane" \
--set controller.tolerations[0].operator=Exists \
--set controller.tolerations[0].effect=NoSchedule
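For readability, the same settings can also be kept in a values file and passed with -f instead of the --set flags (an equivalent sketch; X.X.X.X remains a placeholder):

controller:
  service:
    externalIPs:
      - X.X.X.X
  admissionWebhooks:
    enabled: false
  nodeSelector:
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule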
I'm having the same issue. I set tolerations in the values file, but the pod will not schedule onto a spot node:
tolerations:
- key: "kubernetes.azure.com/scalesetpriority"
operator: "Equals"
value: "spot"
effect: "NoSchedule"
Error message from the NGINX pod:
message: '0/2 nodes are available: 1 node(s) had untolerated taint {CriticalAddonsOnly:
true}, 1 node(s) had untolerated taint {kubernetes.azure.com/scalesetpriority:
spot}. preemption: 0/2 nodes are available: 2 Preemption is not helpful for
scheduling..'
Nginx pod tolerations:
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
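One way to narrow this down is to render the chart locally and check whether the toleration reaches the Deployment manifest at all (a sketch; substitute your own values file and chart version):

helm template ingress-nginx ingress-nginx/ingress-nginx -f values.yaml | grep -n -B2 -A8 'tolerations:'

If the toleration shows up in the rendered output but not on the running pod, the problem is in how the release was applied rather than in the chart. As an aside, the valid toleration operators are Equal and Exists; an operator of "Equals" as quoted above would be rejected by the API server.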
Yep, the toleration settings are not being applied to the ingress-controller pods.
What is the taint?
Almost none of the important questions asked in the new bug report template are answered in the issue description, so there is no information to analyze and comment on.
You can edit the issue description and provide answers to the questions asked in the new bug report template.
/kind support
/triage needs-information
"CriticalAddonsOnly=true:NoSchedule" can be applied to default nodes in an AKS cluster, which are labelled as set aside for kubernetes services, I just want to be able to schedule controllers and their associated resoruces such as the admission-create jobs and webhooks on these nodes, but the helm chart is not consistent in it's application of the tolerations set in the value.yaml file at:
controller:
nodeSelector:
kubernetes.azure.com/mode: system
tolerations:
- effect: NoSchedule
key: CriticalAddonsOnly
operator: Equal
value: "true"
Your install process and the commands used, along with the values file exactly as used on the real cluster, are not visible.
Your kubectl describe output of the pod is not visible.
Your kubectl get events -A output is not visible.
So any comments will have to be based on whatever you have typed here, which makes them guesses compared to comments based on real live data from the state of the cluster and the resources on the cluster.
@SebSa @JCBSLMN Has this issue been resolved?
I am also facing the same issue when trying to deploy the ingress controller onto a particular node pool with the taint kubernetes.azure.com/scalesetpriority:spot.
I tried passing the toleration with the helm install command and also in the values.yaml file, but it's not working. The pod gets stuck in the Pending state with this error:
Warning FailedScheduling 14m default-scheduler 0/4 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {kubernetes.azure.com/scalesetpriority: spot}. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..
I tried pulling the same Helm chart to my local machine and hard-coded the tolerations with
tolerations:
- key: "kubernetes.azure.com/scalesetpriority"
operator: "Equal"
value: "spot"
effect: "NoSchedule"
and when I installed the chart from the local copy it works as expected.
I need some guidance on resolving this issue. Thanks.
I'm facing this issue too, any updates here?
The details posted here do not seem like a complete set of information needed to reproduce the issue. For example, when the error message clearly indicates that no nodes are available given the configured taints/tolerations, I don't see how the ingress-controller code causes the controller pod to fail scheduling.
Maybe a simple pod using the image httpd:alpine should be created with kubectl create deploy httpd --image httpd:alpine --port 80
and then edited to use the same scheduling that you are trying for the controller pod.
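For example, something along these lines (a sketch; the taint key and value are the Azure spot ones from the report above):

kubectl create deploy httpd --image httpd:alpine --port 80
kubectl patch deploy httpd -p '{"spec":{"template":{"spec":{"tolerations":[{"key":"kubernetes.azure.com/scalesetpriority","operator":"Equal","value":"spot","effect":"NoSchedule"}]}}}}'
kubectl get pod -l app=httpd -o wide

If this test pod also stays Pending, the problem is with the node taints/labels rather than with the ingress-nginx chart.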
Sure, basically I have the following nodes
kubectl get nodes -o json | jq '.items[].spec.taints'
[
{
"effect": "NoExecute",
"key": "CriticalAddonsOnly",
"value": "true"
}
]
[
{
"effect": "NoExecute",
"key": "Vault",
"value": "true"
}
]
[
{
"effect": "NoExecute",
"key": "Vault",
"value": "true"
}
]
[
{
"effect": "NoExecute",
"key": "Vault",
"value": "true"
}
]
[
{
"effect": "NoExecute",
"key": "CriticalAddonsOnly",
"value": "true"
}
]
[
{
"effect": "NoExecute",
"key": "Vault",
"value": "true"
}
]
[
{
"effect": "NoExecute",
"key": "Vault",
"value": "true"
}
]
I'm trying to deploy the nginx-ingress-controller using Terraform with
+ resource "helm_release" "this" {
+ atomic = false
+ chart = "ingress-nginx"
+ cleanup_on_fail = false
+ create_namespace = true
+ dependency_update = false
+ disable_crd_hooks = false
+ disable_openapi_validation = false
+ disable_webhooks = false
+ force_update = false
+ id = (known after apply)
+ lint = false
+ manifest = (known after apply)
+ max_history = 0
+ metadata = (known after apply)
+ name = "ingress-nginx"
+ namespace = "kube-system"
+ pass_credentials = false
+ recreate_pods = false
+ render_subchart_notes = true
+ replace = false
+ repository = "https://kubernetes.github.io/ingress-nginx"
+ reset_values = false
+ reuse_values = false
+ skip_crds = false
+ status = "deployed"
+ timeout = 300
+ values = [
+ <<-EOT
"controller":
"admissionWebhooks":
"path":
"tolerations":
- "effect": "NoExecute"
"key": "CriticalAddonsOnly"
"operator": "Equal"
"value": "true"
"service":
"internal":
"annotations":
"external-dns.alpha.kubernetes.io/hostname": "vault.internal.services.local."
"service.beta.kubernetes.io/aws-load-balancer-scheme": "internal"
"enabled": true
"tolerations":
- "effect": "NoExecute"
"key": "CriticalAddonsOnly"
"operator": "Equal"
"value": "true"
EOT,
]
+ verify = false
+ version = "4.11.2"
+ wait = true
+ wait_for_jobs = false
}
But apparently this pod is stuck in the Pending state due to a lack of the proper tolerations:
kubectl get po ingress-nginx-admission-create-g65dk -n kube-system -oyaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2024-09-17T15:46:41Z"
finalizers:
- batch.kubernetes.io/job-tracking
generateName: ingress-nginx-admission-create-
labels:
app.kubernetes.io/component: admission-webhook
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
app.kubernetes.io/version: 1.11.2
batch.kubernetes.io/controller-uid: 368c9af7-74c2-43ed-bebe-1330824a808f
batch.kubernetes.io/job-name: ingress-nginx-admission-create
controller-uid: 368c9af7-74c2-43ed-bebe-1330824a808f
helm.sh/chart: ingress-nginx-4.11.2
job-name: ingress-nginx-admission-create
name: ingress-nginx-admission-create-g65dk
namespace: kube-system
ownerReferences:
- apiVersion: batch/v1
blockOwnerDeletion: true
controller: true
kind: Job
name: ingress-nginx-admission-create
uid: 368c9af7-74c2-43ed-bebe-1330824a808f
resourceVersion: "2429971"
uid: ee543802-aff9-45a7-a1d4-ea7c918294cf
spec:
containers:
- args:
- create
- --host=ingress-nginx-controller-admission,ingress-nginx-controller-admission.$(POD_NAMESPACE).svc
- --namespace=$(POD_NAMESPACE)
- --secret-name=ingress-nginx-admission
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.4.3@sha256:a320a50cc91bd15fd2d6fa6de58bd98c1bd64b9a6f926ce23a600d87043455a3
imagePullPolicy: IfNotPresent
name: create
resources: {}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 65532
seccompProfile:
type: RuntimeDefault
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-hr782
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeSelector:
kubernetes.io/os: linux
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: OnFailure
schedulerName: default-scheduler
securityContext: {}
serviceAccount: ingress-nginx-admission
serviceAccountName: ingress-nginx-admission
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: kube-api-access-hr782
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2024-09-17T15:46:41Z"
message: '0/7 nodes are available: 2 node(s) had untolerated taint {CriticalAddonsOnly:
true}, 5 node(s) had untolerated taint {Vault: true}. preemption: 0/7 nodes
are available: 7 Preemption is not helpful for scheduling.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: BestEffort
After manually editing these two pods, nginx was able to come up:
kubectl edit po ingress-nginx-admission-patch-qfcpq -n kube-system
pod/ingress-nginx-admission-patch-qfcpq edited
kubectl edit po ingress-nginx-admission-create-g65dk -n kube-system
pod/ingress-nginx-admission-create-g65dk edited
I don't see any way to set up tolerations for these resources, please advise.
It's not tested in the CI, so the tests have to be done manually to know more. You could first set the tolerations on a small pod manually, like I wrote earlier, to check.
On another note, you already know which toleration you need, and you can see the tolerations of the pod whose yaml you posted; you can see clearly that the toleration may not be there, so I wonder what is really going on.
I don't see the "CriticalAddonsOnly" toleration that you say you want.
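One quick way to check whether the toleration was dropped at the Job level or only at the Pod level is to inspect the Job template directly (a sketch):

kubectl get job ingress-nginx-admission-create -n kube-system \
  -o jsonpath='{.spec.template.spec.tolerations}'

If this prints nothing, the rendered Job never received the tolerations, which points back at the values paths used for the chart.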
Yes, there are no tolerations applied to these two pods, but when I deployed using Terraform I specified these tolerations. I applied them everywhere except the defaultBackend resource. On artifacthub.io I don't see any way to specify those tolerations for the ingress-nginx-admission-patch and ingress-nginx-admission-create pods; my understanding was that these pods should respect the tolerations during the Helm chart deployment as well.
"controller":
"admissionWebhooks":
"path":
"tolerations":
- "effect": "NoExecute"
"key": "CriticalAddonsOnly"
"operator": "Equal"
"value": "true"
"service":
"internal":
"annotations":
"external-dns.alpha.kubernetes.io/hostname": "vault.internal.services.local."
"service.beta.kubernetes.io/aws-load-balancer-scheme": "internal"
"enabled": true
"tolerations":
- "effect": "NoExecute"
"key": "CriticalAddonsOnly"
"operator": "Equal"
"value": "true"
Still fails (AWS EKS) with version 4.11.3 (8 Oct 2024) of the Helm chart. My values:
controller:
hostNetwork: true
replicaCount: 1
ingressClass: xxx
ingressClassResource:
name: xxx
controllerValue: k8s.io/xxx
tolerations:
- key: xxx
value: yyy
effect: NoSchedule
Issue Overview
I am encountering an issue where tolerations are not being applied as expected when installing the NGINX Ingress Controller via Helm on a Kubernetes cluster. This issue is observed during the execution of the Helm install command from a DevOps pipeline.
Environment Details
Installed Helm version: v3.12.3
K8s version on Dev: 1.27.7
NGINX Helm chart version: 4.7.1
NGINX version: 1.21.6
Helm Install Command
The following Helm install command is used:
helm install $nginxReleaseName $nginxRepoName/$nginxChartName \
  --version $version \
  --create-namespace \
  --namespace $nginxNamespace \
  --set controller.replicaCount=2 \
  --set controller.nodeSelector."kubernetes.io/os"=linux \
  --set defaultBackend.nodeSelector."kubernetes.io/os"=linux \
  --set controller.service.annotations."service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path"=/healthz \
  --set controller.service.annotations."service.beta.kubernetes.io/azure-dns-label-name"=test \
  --set controller.service.loadBalancerIP="xxxxxx" \
  --set controller.service.annotations."service.beta.kubernetes.io/azure-load-balancer-resource-group"=$SharedResourceGroupName \
  --set controller.tolerations[0].key=ingress,controller.tolerations[0].operator=Exists,controller.tolerations[0].effect=NoSchedule
Taint and Tolerations
Taint applied on the node pool: ingress=true:NoSchedule
Tolerations expected to be applied:
tolerations:
  - key: ingress
    operator: Exists
    effect: NoSchedule
Expected Behavior
The NGINX Ingress Controller should be installed with the specified tolerations, allowing it to be scheduled on nodes with the corresponding taints.
Actual Behavior
The tolerations are not being applied.
Assistance Needed
I am seeking guidance on resolving this issue, ensuring that the tolerations are correctly applied so that the NGINX Ingress Controller pods can be scheduled on the intended nodes.
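If the bracketed --set paths are being mangled by the pipeline shell, an equivalent values file passed with -f can be easier to verify (a sketch matching the ingress=true:NoSchedule taint described above; other settings omitted):

controller:
  replicaCount: 2
  nodeSelector:
    kubernetes.io/os: linux
  tolerations:
    - key: ingress
      operator: Exists
      effect: NoSchedule
defaultBackend:
  nodeSelector:
    kubernetes.io/os: linux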