Closed silviuchiric closed 1 month ago
The current build scenario relies on the goreleaser utility. To build a Docker image, you first need to build a binary, which requires Go to be installed. After that, you can use the following command:
make build image=somedockeruser/somerepo
This will build a linux/amd64 binary and then push the Docker image to the specified repository.
Thank you Maksim, I appreciate the quick reply.
Hello Manson
We finally deployed, but the aks-node-termination-handler pods failed to start with the error "container has runAsNonRoot and image will run as root …". Please see the screenshot.
Events:
Type Reason Age From Message
Normal Scheduled 28s default-scheduler Successfully assigned kube-system/aks-node-termination-handler-265zk to aks-platform2-41202490-vmss000006
Normal Pulled 28s kubelet Successfully pulled image "poc-container-registry.xxx.net/xxx/smartservices/images/aks-node-termination-handler:1.1-snapshot" in 204ms (204ms including waiting)
Normal Pulled 28s kubelet Successfully pulled image "poc-container-registry.xxx.net/xxx/smartservices/images/aks-node-termination-handler:1.1-snapshot" in 220ms (220ms including waiting)
Normal Pulling 3s (x4 over 28s) kubelet Pulling image "poc-container-registry.xxx.net/xxx/smartservices/images/aks-node-termination-handler:1.1-snapshot"
Warning Failed 2s (x4 over 28s) kubelet Error: container has runAsNonRoot and image will run as root (pod: "aks-node-termination-handler-265zk_kube-system(f11cfa14-34fa-4be3-a754-91e646783a3d)", container: aks-node-termination-handler)
Normal Pulled 2s (x2 over 14s) kubelet Successfully pulled image "poc-container-registry.xxx.net/xxx/smartservices/images/aks-node-termination-handler:1.1-snapshot" in 180ms (180ms including waiting)
Hi, I don't see any of the screenshots you attached. The problem seems to be in your Dockerfile: it has no USER instruction, unlike the original:
https://github.com/maksim-paskal/aks-node-termination-handler/blob/7ced51db99ca3f3c9362be3f22aecbd65817d095/Dockerfile#L11
You can also customize the Helm installation to run as some other non-root user, as below:
helm upgrade aks-node-termination-handler \
--install \
--namespace kube-system \
aks-node-termination-handler/aks-node-termination-handler \
--set priorityClassName=system-node-critical \
--set securityContext.runAsUser=1000
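To see which user an image is configured to run as before deploying it, the image config can be inspected locally. The sketch below is only an illustration: the helper names image_user and is_root_user are made up for this example, and it assumes the image has already been pulled or built on the machine running Docker.

```shell
# Print the user an image is configured to run as.
# Empty output means no USER instruction was set, i.e. the image runs as root.
image_user() {
  docker image inspect --format '{{.Config.User}}' "$1"
}

# Succeed when the configured user is root: empty, "0" or "root",
# optionally followed by ":group".
is_root_user() {
  case "${1%%:*}" in
    ""|0|root) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (the image name is a placeholder):
# is_root_user "$(image_user somehost.com/some/repo:latest)" && echo "image runs as root"
```

If the check reports root, either add a USER instruction to the Dockerfile or set securityContext.runAsUser as in the Helm command above.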
Prior to helm install, shall I build from the Dockerfile and push the image to our Nexus Repository? Shall I then reference this newly built image in values.yaml, on the first line with the image: key/tag?
We cannot deploy from GitHub; all images must go to our internal repo.
I fixed it by building the Docker image, pushing it to our internal Nexus repo, and running the Helm upgrade.
I see all pods and the DaemonSet as Running now. Thanks a lot.
If you and your team are not familiar with Docker, Helm, and Kubernetes, I recommend periodically making a copy of the latest image to your private repository using Docker:
docker pull paskalmaksim/aks-node-termination-handler:latest
docker tag paskalmaksim/aks-node-termination-handler:latest somehost.com/some/repo:latest
docker push somehost.com/some/repo:latest
Then install it into your Kubernetes cluster with Helm:
helm repo add aks-node-termination-handler https://maksim-paskal.github.io/aks-node-termination-handler/
helm repo update
helm upgrade aks-node-termination-handler \
--install \
--namespace kube-system \
aks-node-termination-handler/aks-node-termination-handler \
--set priorityClassName=system-node-critical \
--set image=somehost.com/some/repo:latest
Nexus Repository can automatically mirror the paskalmaksim/aks-node-termination-handler:latest image into your internal repo with its proxy feature:
https://help.sonatype.com/en/proxy-repository-for-docker.html
One last question, Maksim, please. We want to get the events for this particular endpoint only: "2017-11-01 General Availability Added Support for Spot VM eviction EventType ‘Preempt’". That is published and documented by Microsoft; I copied and pasted the line of interest to us.
Where do we change this, and how do we redeploy or update the Helm chart for this endpoint, please?
Kind regards, Silviu Chiric
Also, the polling period now shows up as RequestTimeout 5000000000. Where are these time variables defined? We simulated an eviction for one node but did not get the eviction message in the logs.
kubectl logs pod/aks-node-termination -n kube-system
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/web/web.go:42","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/web.Start","level":"info","msg":"web.address=:17923","time":"2024-05-22T11:50:19Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/events/events.go:70","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/events.(*Reader).ReadEvents","level":"info","msg":"Start reading events {\"Method\":\"GET\",\"Endpoint\":\"http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01\",\"RequestTimeout\":5000000000,\"Period\":5000000000,\"NodeName\":\"aks-spotcompute-18556904-vmss00001h\",\"AzureResource\":\"aks-spotcompute-18556904-vmss_53\"}","time":"2024-05-22T11:50:19Z"}
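The RequestTimeout and Period values in this log line appear to be nanosecond counts, which is how Go durations serialize to JSON by default. Under that assumption, a small helper (the name ns_to_seconds is made up for this sketch) converts them to seconds:

```shell
# Convert a nanosecond count, as printed in the handler's JSON log,
# to whole seconds using integer division.
ns_to_seconds() {
  echo $(( $1 / 1000000000 ))
}

ns_to_seconds 5000000000   # prints 5
```

So 5000000000 would correspond to 5 seconds for both the request timeout and the polling period.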
Listening only for the Preempt event is not available at the moment; this tool listens to all events from Azure. Please create a new issue describing your problem and why you need to listen to only one event, and I will add this functionality.
RequestTimeout is the maximum time (5 seconds) to wait for an answer from the metadata endpoint. The tool reads the metadata endpoint every Period (5 seconds).
Thank you Maksim for the above reply.
Then how do we test that the handler is working and getting the events? We found the Spot eviction simulation in the MSFT docs and ran it, see below, but got nothing in the pod logs. Actually, the logged events are static: the same ones were recorded yesterday, with no updates since.
Is anything wrong somewhere? I have asked the MSFT architect who recommended this service and am waiting for a reply.
How do we get these messages, including Spot node evictions?
Testing: [root@xdcf5d39771rlv4 aks-node-termination-handler]# POST https://management.azure.com/subscriptions/subscriptions/cf5d8b7e-bb50-409f-b0bc-de08f76ef1a6/resourceGroups/MC_risklab-aks-new_kdcf5d39771edev8_northeurope/providers/Microsoft.Compute/virtualMachineScaleSets/aks-spotcompute-18556904-vmss/43/simulateEviction?api-version=2021-11-01
Please enter content (application/x-www-form-urlencoded) to be POSTed:
This is a test to test of events are captured in the logs of pods handler
Checking logs:
[root@xdcf5d39771rlv4 ~]# kubectl get pods -n kube-system -owide|grep -i aks-node-termination-handler
aks-node-termination-handler-2zcqz 1/1 Running 0 19h 10.244.0.65 aks-spotcompute-18556904-vmss000017
aks-node-termination-handler-489st 1/1 Running 0 19h 10.244.14.33 aks-platform3-39502874-vmss000006
aks-node-termination-handler-5jmqf 1/1 Running 0 19h 10.244.11.44 aks-compute1-29396086-vmss00003r
aks-node-termination-handler-cdzb6 1/1 Running 0 19h 10.244.7.42 aks-platform3-39502874-vmss000007
aks-node-termination-handler-dqdj6 1/1 Running 0 19h 10.244.3.216 aks-platform1-25078549-vmss00000b
aks-node-termination-handler-fhn8m 1/1 Running 0 19h 10.244.4.184 aks-platform2-41202490-vmss000006
aks-node-termination-handler-p56hz 1/1 Running 0 19h 10.244.10.135 aks-platform1-25078549-vmss00000i
aks-node-termination-handler-xcxd8 1/1 Running 0 19h 10.244.1.159 aks-compute3-30344420-vmss00005v
[root@xdcf5d39771rlv4 ~]# kubectl logs aks-node-termination-handler-2zcqz -n kube-system
{"file":"github.com/maksim-paskal/aks-node-termination-handler/cmd/main.go:55","func":"main.main","level":"info","msg":"Starting 1.0.15-74dce44-1714558462...","time":"2024-05-22T11:50:19Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/alert/alert.go:29","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/alert.Init","level":"warning","msg":"not sending Telegram message, no token","time":"2024-05-22T11:50:19Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/client/client.go:45","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/client.Init","level":"info","msg":"No kubeconfig file use incluster","time":"2024-05-22T11:50:19Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/web/web.go:42","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/web.Start","level":"info","msg":"web.address=:17923","time":"2024-05-22T11:50:19Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/events/events.go:70","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/events.(*Reader).ReadEvents","level":"info","msg":"Start reading events {\"Method\":\"GET\",\"Endpoint\":\"http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01\",\"RequestTimeout\":5000000000,\"Period\":5000000000,\"NodeName\":\"aks-spotcompute-18556904-vmss000017\",\"AzureResource\":\"aks-spotcompute-18556904-vmss_43\"}","time":"2024-05-22T11:50:19Z"}
[root@xdcf5d39771rlv4 ~]# kubectl logs aks-node-termination-handler-489st -n kube-system
{"file":"github.com/maksim-paskal/aks-node-termination-handler/cmd/main.go:55","func":"main.main","level":"info","msg":"Starting 1.0.15-74dce44-1714558462...","time":"2024-05-22T11:50:19Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/alert/alert.go:29","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/alert.Init","level":"warning","msg":"not sending Telegram message, no token","time":"2024-05-22T11:50:19Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/client/client.go:45","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/client.Init","level":"info","msg":"No kubeconfig file use incluster","time":"2024-05-22T11:50:19Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/web/web.go:42","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/web.Start","level":"info","msg":"web.address=:17923","time":"2024-05-22T11:50:19Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/events/events.go:70","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/events.(*Reader).ReadEvents","level":"info","msg":"Start reading events {\"Method\":\"GET\",\"Endpoint\":\"http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01\",\"RequestTimeout\":5000000000,\"Period\":5000000000,\"NodeName\":\"aks-platform3-39502874-vmss000006\",\"AzureResource\":\"aks-platform3-39502874-vmss_6\"}","time":"2024-05-22T11:50:19Z"}
[root@xdcf5d39771rlv4 ~]# kubectl logs aks-node-termination-handler-xcxd8 -n kube-system
{"file":"github.com/maksim-paskal/aks-node-termination-handler/cmd/main.go:55","func":"main.main","level":"info","msg":"Starting 1.0.15-74dce44-1714558462...","time":"2024-05-22T11:50:20Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/alert/alert.go:29","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/alert.Init","level":"warning","msg":"not sending Telegram message, no token","time":"2024-05-22T11:50:20Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/client/client.go:45","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/client.Init","level":"info","msg":"No kubeconfig file use incluster","time":"2024-05-22T11:50:20Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/web/web.go:42","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/web.Start","level":"info","msg":"web.address=:17923","time":"2024-05-22T11:50:20Z"}
{"file":"github.com/maksim-paskal/aks-node-termination-handler/pkg/events/events.go:70","func":"github.com/maksim-paskal/aks-node-termination-handler/pkg/events.(*Reader).ReadEvents","level":"info","msg":"Start reading events {\"Method\":\"GET\",\"Endpoint\":\"http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01\",\"RequestTimeout\":5000000000,\"Period\":5000000000,\"NodeName\":\"aks-compute3-30344420-vmss00005v\",\"AzureResource\":\"aks-compute3-30344420-vmss_211\"}","time":"2024-05-22T11:50:20Z"}
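One way to check whether Azure is publishing anything for a given node is to query the endpoint shown in these log lines directly. The sketch below assumes it is run on the node (VM) itself and that curl is installed there; the function name scheduled_events is made up for this example. The Azure metadata service requires the Metadata:true header.

```shell
# Query the Azure scheduled events endpoint that the handler polls.
# Must run on the VM itself; the metadata service is not reachable from outside.
scheduled_events() {
  curl -s --noproxy '*' -H 'Metadata:true' \
    'http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01'
}

# Example:
# scheduled_events
```

If no events show up in the response even after a simulated eviction, the handler would have nothing to log either.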
Try to simulate the node eviction with the Azure CLI.
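A minimal sketch of that, assuming the Azure CLI is installed and logged in to the right subscription. The wrapper name simulate_eviction is made up for this example; the resource group, scale set, and instance ID in the usage comment are the ones that appear in the transcript and logs above.

```shell
# Simulate a Spot eviction for one instance of a VM scale set.
# Arguments: resource group, scale set name, instance ID.
simulate_eviction() {
  az vmss simulate-eviction \
    --resource-group "$1" \
    --name "$2" \
    --instance-id "$3"
}

# Example, for instance 43 of the spot scale set:
# simulate_eviction MC_risklab-aks-new_kdcf5d39771edev8_northeurope aks-spotcompute-18556904-vmss 43
```

After running it, follow the logs of the handler pod scheduled on that node to see whether the event is picked up.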
Hello Maksim
It’s working indeed.
How can we change this notification from 5 seconds to only 1 second, please?
Kind regards, Silviu Chiric
In our production clusters the Azure endpoint sometimes can't answer this request quickly (within 1s), so 5s is recommended. If you still want to, try installing the tool with:
helm upgrade aks-node-termination-handler \
--install \
--namespace kube-system \
aks-node-termination-handler/aks-node-termination-handler \
--set priorityClassName=system-node-critical \
--set 'args[0]=-period=1s'
Hello Maksim
We tried to build from the Dockerfile, but it fails with:
lstat aks-node-termination-handler: no such file or directory