Closed: kubebn closed this issue 5 months ago
@kubebn The changes have been released. Please try running the pods on Windows nodes (you need to reinstall the stable chart).
Hi @maksim-paskal, today I was planning to test Windows spot instances. However, I am confused by the configuration.
I have these values:
image: paskalmaksim/aks-node-termination-handler:v1.0.12
# imagePullPolicy: Always
priorityClassName: system-node-critical
securityContext:
  runAsNonRoot: true
  privileged: false
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  windowsOptions:
    runAsUserName: "ContainerUser"
tolerations:
- key: "kubernetes.azure.com/scalesetpriority"
  operator: "Equal"
  value: "spot"
  effect: "NoSchedule"
- effect: NoSchedule
  key: windows
  operator: Equal
  value: "true"
I am getting:
aks-node-termination-handler-4mbxh 1/1 Running 0 21h 10.61.118.114 aks-lmd8spot1e4d-83777213-vmss00003j <none> <none>
aks-node-termination-handler-4ncjp 0/1 ErrImagePull 0 113s 10.61.66.36 aksw8s3e400001n <none> <none>
aks-node-termination-handler-4nl89 1/1 Running 0 21h 10.61.116.237 aks-lmd8spot3e4d-27803674-vmss000035 <none> <none>
aks-node-termination-handler-57rm6 0/1 ErrImagePull 0 113s 10.61.115.207 aksw8s2e400001o <none> <none>
aks-node-termination-handler-58bg4 0/1 ErrImagePull 0 114s 10.61.102.177 aksw8s3e4000021 <none> <none>
---
k describe pod aks-node-termination-handler-57rm6
...
Containers:
aks-node-termination-handler:
Container ID:
Image: paskalmaksim/aks-node-termination-handler:v1.0.12
Image ID:
Port: 17923/TCP
Host Port: 0/TCP
...
Normal Pulling 49s (x4 over 2m19s) kubelet Pulling image "paskalmaksim/aks-node-termination-handler:v1.0.12"
Warning Failed 49s (x4 over 2m19s) kubelet Error: ErrImagePull
Normal BackOff 23s (x7 over 2m19s) kubelet Back-off pulling image "paskalmaksim/aks-node-termination-handler:v1.0.12"
Is there anything else that needs to be added so it can read the Windows image manifest correctly?
@kubebn In production we don't have any Windows servers; for my tests I created a simple cluster with Windows and Linux nodes (see the README).
Your logs don't show why the image pull fails. Maybe your Windows nodes have specific network settings, or it is an instance-specific error.
Please try to create a new AKS cluster (see the README) and install aks-node-termination-handler
in that cluster with the default Helm chart settings:
helm upgrade aks-node-termination-handler \
--install \
--namespace kube-system \
aks-node-termination-handler/aks-node-termination-handler \
--set priorityClassName=system-node-critical
and then install the chart with your own values.yaml.
I tried installing it directly with the Windows image: paskalmaksim/aks-node-termination-handler:v1.0.12-windows-amd64
Got this error message:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: StartError
Message: failed to create containerd task: failed to create shim task: hcs::CreateComputeSystem 182f8aea9ccc95e5b750a40a9e3a63bf0188ed4f2bfaff499a3052b25bfe4265: The container operating system does not match the host operating system.: unknown
Exit Code: 128
Is it actually compatible with Windows 2019?
OS Image: Windows Server 2019 Datacenter
Operating System: windows
Architecture: amd64
@kubebn This is an error specific to Kubernetes on Windows (more info here). It means that a Docker image built for Windows 2022 can't start on Windows 2019, and vice versa. It can only be fixed by providing separate Docker images for each Windows version.
I see that AKS clusters use Windows 2022 by default:
Windows Server 2022 is the default operating system for Kubernetes versions 1.25.0 and higher. Windows Server 2019 is the default OS for earlier versions.
I built test images for you; you can change the image for your pods to check whether they resolve your issue:
Windows 2022: paskalmaksim/aks-node-termination-handler:test-7772698645-windows-ltsc2022-amd64
Windows 2019: paskalmaksim/aks-node-termination-handler:test-7772698645-windows-ltsc2019-amd64
Can you migrate your workloads from Windows 2019 to Windows 2022?
Which operating systems does your cluster have: Linux + Windows 2019, or Linux + Windows 2019 + Windows 2022?
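The version constraint described above can also be expressed at scheduling time, so that an image built for one Windows release is only placed on nodes running that release. The fragment below is a minimal sketch, not taken from the chart's documentation. It assumes that the chart accepts a `nodeSelector` value and that the Windows nodes carry the well-known `node.kubernetes.io/windows-build` label, where `10.0.17763` is the Windows Server 2019 build number.

```yaml
# Sketch only: pin the Windows 2019 test image to nodes with a matching OS build.
image: paskalmaksim/aks-node-termination-handler:test-7772698645-windows-ltsc2019-amd64
nodeSelector:
  kubernetes.io/os: windows
  # Assumption: this label is set by the kubelet on Windows nodes;
  # 10.0.17763 is the build number of Windows Server 2019.
  node.kubernetes.io/windows-build: "10.0.17763"
```

Any taints on the Windows node pool would still need matching tolerations in the same values file.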
Hi, yes we are aware that 2019 will be deprecated soon but unfortunately can’t migrate all of them now.
I will try those images on Monday. I guess I will just create two DaemonSets for the different versions.
We have Linux and both Windows versions.
There is a more elegant way to run the pods in your landscape (Linux + Windows 2019 + Windows 2022): you need two installations of aks-node-termination-handler.
values.yaml of the first installation (excludes Windows 2019 nodes):
priorityClassName: system-node-critical
image: paskalmaksim/aks-node-termination-handler:latest
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.azure.com/os-sku
          operator: NotIn
          values:
          - Windows2019
values.yaml of the second installation (only Windows 2019 nodes):
priorityClassName: system-node-critical
image: paskalmaksim/aks-node-termination-handler:latest-ltsc2019
nodeSelector:
  kubernetes.azure.com/os-sku: Windows2019
This is my proof of concept for the new release; I will try to implement it this week.
@kubebn Windows 2019 is now supported; see the README.
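If the Windows 2019 node pool is a tainted spot pool, as in the values posted earlier in this thread, the second installation would presumably also need the matching tolerations. The following is a minimal sketch, assuming the chart accepts `tolerations` alongside `nodeSelector` and that the taints are exactly the ones shown earlier:

```yaml
priorityClassName: system-node-critical
image: paskalmaksim/aks-node-termination-handler:latest-ltsc2019
nodeSelector:
  kubernetes.azure.com/os-sku: Windows2019
tolerations:
  # Spot node pool taint, copied from the values earlier in the thread.
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
  # Assumption: the cluster's own custom taint on its Windows nodes.
  - key: "windows"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```

The first installation would need the same tolerations if it is expected to run on tainted Linux spot or Windows 2022 nodes.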
Hello,
We've been lucky so far on AWS, where the aws-handler supports Windows nodes.
We also have some Windows node pools running in AKS, so I am wondering whether there are any plans for Windows support. Thanks