I can't reproduce this.
brandond@dev01:~$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k3s-server-1 Ready control-plane,master 6m36s v1.26.4+k3s1 172.17.0.4 <none> K3s dev 5.19.0-1019-aws containerd://1.6.19-k3s1
brandond@dev01:~$ kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
namespace/system-upgrade created
serviceaccount/system-upgrade created
clusterrolebinding.rbac.authorization.k8s.io/system-upgrade created
configmap/default-controller-env created
deployment.apps/system-upgrade-controller created
brandond@dev01:~$ kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-76d776f6f9-fhdmj 1/1 Running 0 26s
kube-system coredns-59b4f5bbd5-p4vwz 1/1 Running 0 26s
kube-system svclb-traefik-6c3a9382-qfzfc 2/2 Running 0 19s
kube-system helm-install-traefik-crd-4q2kw 0/1 Completed 0 27s
kube-system helm-install-traefik-rsx42 0/1 Completed 1 27s
kube-system traefik-56b8c5fb5c-7sdkv 1/1 Running 0 19s
kube-system metrics-server-7b67f64457-5685l 1/1 Running 0 26s
system-upgrade system-upgrade-controller-5876667756-2ppqw 1/1 Running 0 8s
brandond@dev01:~$ kubectl apply -f -
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  channel: https://update.k3s.io/v1-release/channels/stable
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: In
        values:
          - "true"
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
plan.upgrade.cattle.io/server-plan created
brandond@dev01:~/go/src/github.com/k3s-io/k3s$ kubectl get job -n system-upgrade
NAME COMPLETIONS DURATION AGE
apply-server-plan-on-k3s-server-1-with-0e4e3f4e3f8b1e811d-f6e12 0/1 48s 48s
Do you perhaps have something else deployed to your cluster that's blocking creation of the upgrade job? Have you tried increasing the verbosity of the system-upgrade-controller?
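For reference, one way to bump the controller's verbosity, assuming the stock manifest's default-controller-env ConfigMap exposes a SYSTEM_UPGRADE_CONTROLLER_DEBUG entry (names here are taken from the manifest applied above, not from this thread), is roughly:
$ kubectl -n system-upgrade edit configmap default-controller-env   # set SYSTEM_UPGRADE_CONTROLLER_DEBUG to "true"
$ kubectl -n system-upgrade rollout restart deployment system-upgrade-controller
$ kubectl -n system-upgrade logs deployment/system-upgrade-controller -f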
I do have a bunch of stuff running, but I don't know whether any of it would interfere with the upgrade. I have turned on debug-level logging, but I still do not see anything relevant there:
W0926 03:06:33.543448 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time="2023-09-26T03:06:33Z" level=info msg="Applying CRD plans.upgrade.cattle.io" func="github.com/rancher/wrangler/pkg/crd.(*Factory).createCRD" file="/go/pkg/mod/github.com/rancher/wrangler@v1.1.1-0.20230425173236-39a4707f0689/pkg/crd/init.go:543"
time="2023-09-26T03:06:33Z" level=debug msg="DesiredSet - Patch apiextensions.k8s.io/v1, Kind=CustomResourceDefinition /plans.upgrade.cattle.io for plans.upgrade.cattle.io -- [ placeholder-for-stuff-dropped ]" func=github.com/rancher/wrangler/pkg/apply.applyPatch file="/go/pkg/mod/github.com/rancher/wrangler@v1.1.1-0.20230425173236-39a4707f0689/pkg/apply/desiredset_compare.go:210"
time="2023-09-26T03:06:33Z" level=debug msg="DesiredSet - Updated apiextensions.k8s.io/v1, Kind=CustomResourceDefinition /plans.upgrade.cattle.io for plans.upgrade.cattle.io -- application/merge-patch+json {\"metadata\":{},\"spec\":{\"preserveUnknownFields\":false}}" func=github.com/rancher/wrangler/pkg/apply.applyPatch file="/go/pkg/mod/github.com/rancher/wrangler@v1.1.1-0.20230425173236-39a4707f0689/pkg/apply/desiredset_compare.go:232"
time="2023-09-26T03:06:34Z" level=info msg="Starting /v1, Kind=Node controller" func="github.com/rancher/lasso/pkg/controller.(*controller).run" file="/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20221227210133-6ea88ca2fbcc/pkg/controller/controller.go:144"
time="2023-09-26T03:06:34Z" level=info msg="Starting /v1, Kind=Secret controller" func="github.com/rancher/lasso/pkg/controller.(*controller).run" file="/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20221227210133-6ea88ca2fbcc/pkg/controller/controller.go:144"
time="2023-09-26T03:06:34Z" level=info msg="Starting batch/v1, Kind=Job controller" func="github.com/rancher/lasso/pkg/controller.(*controller).run" file="/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20221227210133-6ea88ca2fbcc/pkg/controller/controller.go:144"
time="2023-09-26T03:06:34Z" level=info msg="Starting upgrade.cattle.io/v1, Kind=Plan controller" func="github.com/rancher/lasso/pkg/controller.(*controller).run" file="/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20221227210133-6ea88ca2fbcc/pkg/controller/controller.go:144"
time="2023-09-26T03:06:34Z" level=debug msg="PLAN STATUS HANDLER: plan=system-upgrade/server-plan@62920622, status={Conditions:[{Type:Validated Status:True LastUpdateTime:2023-09-26T03:04:13Z LastTransitionTime: Reason:PlanIsValid Message:} {Type:LatestResolved Status:True LastUpdateTime:2023-09-26T02:59:00Z LastTransitionTime: Reason:Channel Message:}] LatestVersion:v1.27.6-k3s1 LatestHash:0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c Applying:[]}" func="github.com/rancher/system-upgrade-controller/pkg/upgrade.(*Controller).handlePlans.func1" file="/go/src/github.com/rancher/system-upgrade-controller/pkg/upgrade/handle_upgrade.go:30"
time="2023-09-26T03:06:34Z" level=debug msg="PLAN GENERATING HANDLER: plan=system-upgrade/server-plan@62921428, status={Conditions:[{Type:Validated Status:True LastUpdateTime:2023-09-26T03:06:34Z LastTransitionTime: Reason:PlanIsValid Message:} {Type:LatestResolved Status:True LastUpdateTime:2023-09-26T02:59:00Z LastTransitionTime: Reason:Channel Message:}] LatestVersion:v1.27.6-k3s1 LatestHash:0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c Applying:[]}" func="github.com/rancher/system-upgrade-controller/pkg/upgrade.(*Controller).handlePlans.func2" file="/go/src/github.com/rancher/system-upgrade-controller/pkg/upgrade/handle_upgrade.go:78"
time="2023-09-26T03:06:34Z" level=debug msg="PLAN STATUS HANDLER: plan=system-upgrade/server-plan@62921428, status={Conditions:[{Type:Validated Status:True LastUpdateTime:2023-09-26T03:06:34Z LastTransitionTime: Reason:PlanIsValid Message:} {Type:LatestResolved Status:True LastUpdateTime:2023-09-26T02:59:00Z LastTransitionTime: Reason:Channel Message:}] LatestVersion:v1.27.6-k3s1 LatestHash:0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c Applying:[]}" func="github.com/rancher/system-upgrade-controller/pkg/upgrade.(*Controller).handlePlans.func1" file="/go/src/github.com/rancher/system-upgrade-controller/pkg/upgrade/handle_upgrade.go:30"
time="2023-09-26T03:06:34Z" level=debug msg="PLAN GENERATING HANDLER: plan=system-upgrade/server-plan@62921428, status={Conditions:[{Type:Validated Status:True LastUpdateTime:2023-09-26T03:06:34Z LastTransitionTime: Reason:PlanIsValid Message:} {Type:LatestResolved Status:True LastUpdateTime:2023-09-26T02:59:00Z LastTransitionTime: Reason:Channel Message:}] LatestVersion:v1.27.6-k3s1 LatestHash:0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c Applying:[]}" func="github.com/rancher/system-upgrade-controller/pkg/upgrade.(*Controller).handlePlans.func2" file="/go/src/github.com/rancher/system-upgrade-controller/pkg/upgrade/handle_upgrade.go:78"
Can you show the output of kubectl get node turing-node-1 -o yaml?
It looks like for some reason the node selector isn't finding any nodes to create jobs for... https://github.com/rancher/system-upgrade-controller/blob/04a0b9ef5858657f20949cd022e58ad19de029df/pkg/upgrade/plan/plan.go#L168-L172
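As a rough approximation of that merged selector (the plan-hash requirement is an assumption based on the linked code; the hash is the LatestHash from the debug log above), something like this should show which nodes the controller still considers pending for the plan:
$ kubectl get node -l 'node-role.kubernetes.io/control-plane=true,plan.upgrade.cattle.io/server-plan!=0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c'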
I tested the selector by using it in a kubectl command to filter the nodes; you can see the command in my original post. In any case, here is the requested output:
$ kubectl get node turing-node-1 -o yaml
apiVersion: v1
kind: Node
metadata:
annotations:
etcd.k3s.cattle.io/node-address: 192.168.2.253
etcd.k3s.cattle.io/node-name: turing-node-1-15e70d0d
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"f6:51:16:0b:a0:13"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 192.168.2.253
k3s.io/hostname: turing-node-1
k3s.io/internal-ip: 192.168.2.253,2600:1700:38c2:8a10::41
k3s.io/node-args: '["server","--cluster-init"]'
k3s.io/node-config-hash: XFJS3VE5KBQGMQZO4QOV2XN233KRIPYZWEZD3YOPYBRFV6NHLY3A====
k3s.io/node-env: '{"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/4b147cafa965066cd68e04b4e3acce221078156a3b9ba635a653517ce459aa4d"}'
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-05-08T04:17:41Z"
finalizers:
- wrangler.cattle.io/managed-etcd-controller
- wrangler.cattle.io/node
labels:
beta.kubernetes.io/arch: arm64
beta.kubernetes.io/instance-type: k3s
beta.kubernetes.io/os: linux
kubernetes.io/arch: arm64
kubernetes.io/hostname: turing-node-1
kubernetes.io/os: linux
node-role.kubernetes.io/control-plane: "true"
node-role.kubernetes.io/etcd: "true"
node-role.kubernetes.io/master: "true"
node.kubernetes.io/instance-type: k3s
plan.upgrade.cattle.io/server-plan: 0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c
name: turing-node-1
resourceVersion: "63160361"
uid: df0191c4-ce5d-4755-a5d9-f5a989dfebea
spec:
podCIDR: 10.42.0.0/24
podCIDRs:
- 10.42.0.0/24
providerID: k3s://turing-node-1
status:
addresses:
- address: 192.168.2.253
type: InternalIP
- address: 2600:1700:38c2:8a10::41
type: InternalIP
- address: turing-node-1
type: Hostname
allocatable:
cpu: "4"
ephemeral-storage: "29559886006"
memory: 7999972Ki
pods: "110"
capacity:
cpu: "4"
ephemeral-storage: 30386396Ki
memory: 7999972Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2023-09-26T15:04:05Z"
lastTransitionTime: "2023-05-14T17:32:37Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2023-09-26T15:04:05Z"
lastTransitionTime: "2023-05-14T17:32:37Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2023-09-26T15:04:05Z"
lastTransitionTime: "2023-05-14T17:32:37Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2023-09-26T15:04:05Z"
lastTransitionTime: "2023-09-22T19:10:56Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- ghcr.io/home-assistant/home-assistant@sha256:0c4475289186eeadf1b987a6a3df7bbc6d3b33bed6bcb1dbc8d6aabfdaf798ed
- ghcr.io/home-assistant/home-assistant:2023.3.5
sizeBytes: 453949277
- names:
- ghcr.io/home-assistant/home-assistant@sha256:0a0ae67f5a3121d50890baf1f07baa687468fe448e635e2c34d2b95faf5086b0
- ghcr.io/home-assistant/home-assistant:2023.3.1
sizeBytes: 453770753
- names:
- ghcr.io/home-assistant/home-assistant@sha256:2c631c99d7078072126e50050b57042ec5548b721f089a87e76dfb24c1071a83
- ghcr.io/home-assistant/home-assistant:2023.5.4
sizeBytes: 451027806
- names:
- ghcr.io/home-assistant/home-assistant@sha256:d38bc4d21453d6e3e4b0af2b62cf86211b28479946e4e895d4434b3f82c4e58a
- ghcr.io/home-assistant/home-assistant:2022.12.8
sizeBytes: 446792583
- names:
- docker.io/library/nextcloud@sha256:0ab4b64883b3adf121a3076cd9b8a160a224aa1fa81f75cdb7c4bc4fdeaaa803
- docker.io/library/nextcloud:26.0.1
sizeBytes: 347153037
- names:
- docker.io/library/mariadb@sha256:37e9f7e3cea0096f7fba9d2a77cf0ac926c830e8931d1679da3bcd8fb8989d47
- docker.io/library/mariadb:10.6
sizeBytes: 118893377
- names:
- docker.io/pihole/pihole@sha256:dcd0885a3fe050da005cb544904444cc098017636d6d495ac8770a9aa523a0ef
- docker.io/pihole/pihole:2022.05
sizeBytes: 111326841
- names:
- docker.io/rancher/k3s-upgrade@sha256:6c4543ecde336df20a21f88e5e84399f923bdb3f9bbdc7e815cfdbca643ec50a
- docker.io/rancher/k3s-upgrade:v1.27.6-k3s1
sizeBytes: 53055042
- names:
- docker.io/anonsoftware28/kubernetes-secret-generator@sha256:1d5bfe7b227caf060d0e61488aecdc40e475f8c8640420fbf7ab500333dcfd60
- docker.io/anonsoftware28/kubernetes-secret-generator:latest
sizeBytes: 51735974
- names:
- docker.io/rancher/mirrored-library-traefik@sha256:0842af6afcdf4305d17e862bad4eaf379d0817c987eedabeaff334e2273459c1
- docker.io/rancher/mirrored-library-traefik:2.9.4
sizeBytes: 35650744
- names:
- docker.io/rancher/mirrored-metrics-server@sha256:16185c0d4d01f8919eca4779c69a374c184200cd9e6eded9ba53052fd73578df
- docker.io/rancher/mirrored-metrics-server:v0.6.2
sizeBytes: 26205509
- names:
- docker.io/dopingus/cert-manager-webhook-dynu@sha256:7958523006f78123305597115cb1ba7f7b448e658549ddb6a089582c4bec8628
- docker.io/dopingus/cert-manager-webhook-dynu:latest
sizeBytes: 17882163
- names:
- docker.io/dopingus/cert-manager-webhook-dynu@sha256:7618e6678a9f3210ef0ea530a0f58f5932e80aa673729a7ab223a9b24b804cd2
sizeBytes: 17882150
- names:
- registry.k8s.io/sig-storage/nfs-subdir-external-provisioner@sha256:63d5e04551ec8b5aae83b6f35938ca5ddc50a88d85492d9731810c31591fa4c9
- registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
sizeBytes: 16673053
- names:
- quay.io/jetstack/cert-manager-controller@sha256:cd9bf3d48b6b8402a2a8b11953f9dc0275ba4beec14da47e31823a0515cde7e2
- quay.io/jetstack/cert-manager-controller:v1.9.1
sizeBytes: 15265466
- names:
- docker.io/rancher/mirrored-coredns-coredns@sha256:a11fafae1f8037cbbd66c5afa40ba2423936b72b4fd50a7034a7e8b955163594
- docker.io/rancher/mirrored-coredns-coredns:1.10.1
sizeBytes: 14556850
- names:
- docker.io/rancher/local-path-provisioner@sha256:5bb33992a4ec3034c28b5e0b3c4c2ac35d3613b25b79455eb4b1a95adc82cdc0
- docker.io/rancher/local-path-provisioner:v0.0.24
sizeBytes: 13884168
- names:
- docker.io/rancher/kubectl@sha256:9be095ca0bbc74e8947a1d4a0258875304b590057d858eb9738de000f88a473e
- docker.io/rancher/kubectl:v1.25.4
sizeBytes: 13045642
- names:
- quay.io/jetstack/cert-manager-webhook@sha256:4ab2982a220e1c719473d52d8463508422ab26e92664732bfc4d96b538af6b8a
- quay.io/jetstack/cert-manager-webhook:v1.9.1
sizeBytes: 12244995
- names:
- quay.io/jetstack/cert-manager-cainjector@sha256:df7f0b5186ddb84eccb383ed4b10ec8b8e2a52e0e599ec51f98086af5f4b4938
- quay.io/jetstack/cert-manager-cainjector:v1.9.1
sizeBytes: 10909067
- names:
- docker.io/rancher/system-upgrade-controller@sha256:c730c4ec8dc914b94be13df77d9b58444277330a2bdf39fe667beb5af2b38c0b
- docker.io/rancher/system-upgrade-controller:v0.13.1
sizeBytes: 9617607
- names:
- docker.io/rancher/klipper-lb@sha256:2b963c02974155f7e9a51c54b91f09099e48b4550689aadb595e62118e045c10
- docker.io/rancher/klipper-lb:v0.4.3
sizeBytes: 4163722
- names:
- docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
- docker.io/rancher/mirrored-pause:3.6
sizeBytes: 253243
nodeInfo:
architecture: arm64
bootID: 5bcbbf33-f7d7-4058-a5f4-94a5d26e129c
containerRuntimeVersion: containerd://1.6.19-k3s1
kernelVersion: 5.15.32-v8+
kubeProxyVersion: v1.26.4+k3s1
kubeletVersion: v1.26.4+k3s1
machineID: 75a2a6365a604bc389ab0ab7c51c66c6
operatingSystem: linux
osImage: Debian GNU/Linux 11 (bullseye)
systemUUID: 75a2a6365a604bc389ab0ab7c51c66c6
I tested the selector by using it in a kubectl command to filter the nodes; you can see the command in my original post.
Yes, but your provided node selector is merged with the plan hash label selector; I linked the code where that occurs up above.
In this case your node has a label on it that indicates this plan has already run successfully on this node:
plan.upgrade.cattle.io/server-plan: 0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c
Delete the label and it should run again. You may want to keep a closer eye on it this time; it sounds like the upgrade image ran successfully and the jobs were cleaned up, despite not having successfully upgraded the version on the node.
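For example, removing the label with kubectl would look something like this (node and plan names taken from this thread):
$ kubectl label node turing-node-1 plan.upgrade.cattle.io/server-plan-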
Thanks, we are getting somewhere. After I removed the label, the job executed immediately. The bad news is that the node version is still the same, but I could not find any error. Here is where I looked:
$ kubectl get job -n system-upgrade
NAME COMPLETIONS DURATION AGE
apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-0c1d4 1/1 9s 77s
$ kubectl describe job -n system-upgrade apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-0c1d4
Name: apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-0c1d4
Namespace: system-upgrade
Selector: controller-uid=6a477e0d-3281-4af3-9470-bcda9218cd78
Labels: objectset.rio.cattle.io/hash=d661ea5d7278683dce770ce40b105bf148fce4d9
plan.upgrade.cattle.io/server-plan=0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c
upgrade.cattle.io/controller=system-upgrade-controller
upgrade.cattle.io/node=turing-node-1
upgrade.cattle.io/plan=server-plan
upgrade.cattle.io/version=v1.27.6-k3s1
Annotations: batch.kubernetes.io/job-tracking:
objectset.rio.cattle.io/applied:
H4sIAAAAAAAA/+xXUW/iOBD+Kyc/JzSBlBKke+AKe4t2C6h097RaVZVjT8CHY+dsB4oQ//1kJ9CE0m537+UeqqptHNvjzzPfNzPZoQwMpthg1N8hLIQ02DAptB3K5G8gRoNpKSZbBB...
objectset.rio.cattle.io/id: system-upgrade-controller
objectset.rio.cattle.io/owner-gvk: upgrade.cattle.io/v1, Kind=Plan
objectset.rio.cattle.io/owner-name: server-plan
objectset.rio.cattle.io/owner-namespace: system-upgrade
upgrade.cattle.io/ttl-seconds-after-finished: 900
Controlled By: Plan/server-plan
Parallelism: 1
Completions: 1
Completion Mode: NonIndexed
Start Time: Tue, 26 Sep 2023 10:36:06 -0700
Completed At: Tue, 26 Sep 2023 10:36:15 -0700
Duration: 9s
Active Deadline Seconds: 900s
Pods Statuses: 0 Active (0 Ready) / 1 Succeeded / 0 Failed
Pod Template:
Labels: controller-uid=6a477e0d-3281-4af3-9470-bcda9218cd78
job-name=apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-0c1d4
plan.upgrade.cattle.io/server-plan=0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c
upgrade.cattle.io/controller=system-upgrade-controller
upgrade.cattle.io/node=turing-node-1
upgrade.cattle.io/plan=server-plan
upgrade.cattle.io/version=v1.27.6-k3s1
Service Account: system-upgrade
Init Containers:
cordon:
Image: rancher/kubectl:v1.25.4
Port: <none>
Host Port: <none>
Args:
cordon
turing-node-1
Environment:
SYSTEM_UPGRADE_NODE_NAME: (v1:spec.nodeName)
SYSTEM_UPGRADE_POD_NAME: (v1:metadata.name)
SYSTEM_UPGRADE_POD_UID: (v1:metadata.uid)
SYSTEM_UPGRADE_PLAN_NAME: server-plan
SYSTEM_UPGRADE_PLAN_LATEST_HASH: 0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c
SYSTEM_UPGRADE_PLAN_LATEST_VERSION: v1.27.6-k3s1
Mounts:
/host from host-root (rw)
/run/system-upgrade/pod from pod-info (ro)
Containers:
upgrade:
Image: rancher/k3s-upgrade:v1.27.6-k3s1
Port: <none>
Host Port: <none>
Environment:
SYSTEM_UPGRADE_NODE_NAME: (v1:spec.nodeName)
SYSTEM_UPGRADE_POD_NAME: (v1:metadata.name)
SYSTEM_UPGRADE_POD_UID: (v1:metadata.uid)
SYSTEM_UPGRADE_PLAN_NAME: server-plan
SYSTEM_UPGRADE_PLAN_LATEST_HASH: 0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c
SYSTEM_UPGRADE_PLAN_LATEST_VERSION: v1.27.6-k3s1
Mounts:
/host from host-root (rw)
/run/system-upgrade/pod from pod-info (ro)
Volumes:
host-root:
Type: HostPath (bare host directory volume)
Path: /
HostPathType: Directory
pod-info:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
metadata.annotations -> annotations
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 2m17s job-controller Created pod: apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-48ttz
Normal Completed 2m8s job-controller Job completed
$ kubectl describe pod apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-48ttz -n system-upgrade
Name: apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-48ttz
Namespace: system-upgrade
Priority: 0
Service Account: system-upgrade
Node: turing-node-1/192.168.2.253
Start Time: Tue, 26 Sep 2023 10:36:06 -0700
Labels: controller-uid=6a477e0d-3281-4af3-9470-bcda9218cd78
job-name=apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-0c1d4
plan.upgrade.cattle.io/server-plan=0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c
upgrade.cattle.io/controller=system-upgrade-controller
upgrade.cattle.io/node=turing-node-1
upgrade.cattle.io/plan=server-plan
upgrade.cattle.io/version=v1.27.6-k3s1
Annotations: <none>
Status: Succeeded
IP: 192.168.2.253
IPs:
IP: 192.168.2.253
IP: 2600:1700:38c2:8a10::41
Controlled By: Job/apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-0c1d4
Init Containers:
cordon:
Container ID: containerd://086924d34d110d054a173ba7cd23c1a4b59f31bef24fec746f7ded4e4b525c4b
Image: rancher/kubectl:v1.25.4
Image ID: docker.io/rancher/kubectl@sha256:9be095ca0bbc74e8947a1d4a0258875304b590057d858eb9738de000f88a473e
Port: <none>
Host Port: <none>
Args:
cordon
turing-node-1
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 26 Sep 2023 10:36:08 -0700
Finished: Tue, 26 Sep 2023 10:36:08 -0700
Ready: True
Restart Count: 0
Environment:
SYSTEM_UPGRADE_NODE_NAME: (v1:spec.nodeName)
SYSTEM_UPGRADE_POD_NAME: apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-48ttz (v1:metadata.name)
SYSTEM_UPGRADE_POD_UID: (v1:metadata.uid)
SYSTEM_UPGRADE_PLAN_NAME: server-plan
SYSTEM_UPGRADE_PLAN_LATEST_HASH: 0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c
SYSTEM_UPGRADE_PLAN_LATEST_VERSION: v1.27.6-k3s1
Mounts:
/host from host-root (rw)
/run/system-upgrade/pod from pod-info (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cr9rc (ro)
Containers:
upgrade:
Container ID: containerd://7ed552599987187c5b0eee160cabcfd11f1faac15e4a32a5cfbca8711f5ccb7f
Image: rancher/k3s-upgrade:v1.27.6-k3s1
Image ID: docker.io/rancher/k3s-upgrade@sha256:6c4543ecde336df20a21f88e5e84399f923bdb3f9bbdc7e815cfdbca643ec50a
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 26 Sep 2023 10:36:10 -0700
Finished: Tue, 26 Sep 2023 10:36:12 -0700
Ready: False
Restart Count: 0
Environment:
SYSTEM_UPGRADE_NODE_NAME: (v1:spec.nodeName)
SYSTEM_UPGRADE_POD_NAME: apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-48ttz (v1:metadata.name)
SYSTEM_UPGRADE_POD_UID: (v1:metadata.uid)
SYSTEM_UPGRADE_PLAN_NAME: server-plan
SYSTEM_UPGRADE_PLAN_LATEST_HASH: 0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c
SYSTEM_UPGRADE_PLAN_LATEST_VERSION: v1.27.6-k3s1
Mounts:
/host from host-root (rw)
/run/system-upgrade/pod from pod-info (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cr9rc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
host-root:
Type: HostPath (bare host directory volume)
Path: /
HostPathType: Directory
pod-info:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
metadata.annotations -> annotations
kube-api-access-cr9rc:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m27s default-scheduler Successfully assigned system-upgrade/apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-48ttz to turing-node-1
Normal Pulling 5m27s kubelet Pulling image "rancher/kubectl:v1.25.4"
Normal Pulled 5m27s kubelet Successfully pulled image "rancher/kubectl:v1.25.4" in 749.280135ms (749.302747ms including waiting)
Normal Created 5m27s kubelet Created container cordon
Normal Started 5m26s kubelet Started container cordon
Normal Pulling 5m25s kubelet Pulling image "rancher/k3s-upgrade:v1.27.6-k3s1"
Normal Pulled 5m24s kubelet Successfully pulled image "rancher/k3s-upgrade:v1.27.6-k3s1" in 657.589334ms (657.610557ms including waiting)
Normal Created 5m24s kubelet Created container upgrade
Normal Started 5m24s kubelet Started container upgrade
$ kubectl logs apply-server-plan-on-turing-node-1-with-0e4e3f4e3f8b1e811-48ttz -n system-upgrade
Defaulted container "upgrade" out of: upgrade, cordon (init)
+ upgrade
+ get_k3s_process_info
+ ps -ef
+ grep -E -v '(init|grep|channelserver|supervise-daemon)'
+ grep -E '( |/)k3s .*(server|agent)'
+ awk '{print $2}'
+ K3S_PID=18680
+ echo 18680
+ wc -w
+ '[' 1 '!=' 1 ]
+ '[' -z 18680 ]
+ echo 18680
+ wc -w
+ '[' 1 '!=' 1 ]
+ ps -p 18680 -o 'ppid='
+ awk '{print $1}'
+ K3S_PPID=1
+ info 'K3S binary is running with pid 18680, parent pid 1'
+ echo '[INFO] ' 'K3S binary is running with pid 18680, parent pid 1'
+ '[' 1 '!=' 1 ]
+ '[' 18680 '=' 1 ]
[INFO] K3S binary is running with pid 18680, parent pid 1
+ awk 'NR==1 {print $1}' /host/proc/18680/cmdline
+ K3S_BIN_PATH=/usr/local/bin/k3s
+ '[' -z /usr/local/bin/k3s ]
+ '[' '!' -e /host/usr/local/bin/k3s ]
+ return
+ replace_binary
+ NEW_BINARY=/opt/k3s
+ FULL_BIN_PATH=/host/usr/local/bin/k3s
+ '[' '!' -f /opt/k3s ]
[INFO] Comparing old and new binaries
+ info 'Comparing old and new binaries'
+ echo '[INFO] ' 'Comparing old and new binaries'
+ sha256sum /opt/k3s /host/usr/local/bin/k3s
+ BIN_CHECKSUMS='04be543be1c9fbdda30722c5d169099a6972459ea1b1e5df701c42ef54a11f44 /opt/k3s
04be543be1c9fbdda30722c5d169099a6972459ea1b1e5df701c42ef54a11f44 /host/usr/local/bin/k3s'
+ '[' 0 '!=' 0 ]
+ echo '04be543be1c9fbdda30722c5d169099a6972459ea1b1e5df701c42ef54a11f44 /opt/k3s
04be543be1c9fbdda30722c5d169099a6972459ea1b1e5df701c42ef54a11f44 /host/usr/local/bin/k3s'
+ awk '{print $1}'
+ uniq
+ wc -l
+ BIN_COUNT=1
+ '[' 1 '=' 1 ]
+ info 'Binary already been replaced'
+ echo '[INFO] ' 'Binary already been replaced'
+ exit 0
[INFO] Binary already been replaced
$ kubectl get node turing-node-1 -o yaml
apiVersion: v1
kind: Node
metadata:
annotations:
etcd.k3s.cattle.io/node-address: 192.168.2.253
etcd.k3s.cattle.io/node-name: turing-node-1-15e70d0d
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"f6:51:16:0b:a0:13"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 192.168.2.253
k3s.io/hostname: turing-node-1
k3s.io/internal-ip: 192.168.2.253,2600:1700:38c2:8a10::41
k3s.io/node-args: '["server","--cluster-init"]'
k3s.io/node-config-hash: XFJS3VE5KBQGMQZO4QOV2XN233KRIPYZWEZD3YOPYBRFV6NHLY3A====
k3s.io/node-env: '{"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/4b147cafa965066cd68e04b4e3acce221078156a3b9ba635a653517ce459aa4d"}'
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-05-08T04:17:41Z"
finalizers:
- wrangler.cattle.io/managed-etcd-controller
- wrangler.cattle.io/node
labels:
beta.kubernetes.io/arch: arm64
beta.kubernetes.io/instance-type: k3s
beta.kubernetes.io/os: linux
kubernetes.io/arch: arm64
kubernetes.io/hostname: turing-node-1
kubernetes.io/os: linux
node-role.kubernetes.io/control-plane: "true"
node-role.kubernetes.io/etcd: "true"
node-role.kubernetes.io/master: "true"
node.kubernetes.io/instance-type: k3s
plan.upgrade.cattle.io/server-plan: 0e4e3f4e3f8b1e811d841099cb49e4712b93833bee0604115b9a141c
name: turing-node-1
resourceVersion: "63216486"
uid: df0191c4-ce5d-4755-a5d9-f5a989dfebea
spec:
podCIDR: 10.42.0.0/24
podCIDRs:
- 10.42.0.0/24
providerID: k3s://turing-node-1
status:
addresses:
- address: 192.168.2.253
type: InternalIP
- address: 2600:1700:38c2:8a10::41
type: InternalIP
- address: turing-node-1
type: Hostname
allocatable:
cpu: "4"
ephemeral-storage: "29559886006"
memory: 7999972Ki
pods: "110"
capacity:
cpu: "4"
ephemeral-storage: 30386396Ki
memory: 7999972Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2023-09-26T17:52:31Z"
lastTransitionTime: "2023-05-14T17:32:37Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2023-09-26T17:52:31Z"
lastTransitionTime: "2023-05-14T17:32:37Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2023-09-26T17:52:31Z"
lastTransitionTime: "2023-05-14T17:32:37Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2023-09-26T17:52:31Z"
lastTransitionTime: "2023-09-22T19:10:56Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- ghcr.io/home-assistant/home-assistant@sha256:0c4475289186eeadf1b987a6a3df7bbc6d3b33bed6bcb1dbc8d6aabfdaf798ed
- ghcr.io/home-assistant/home-assistant:2023.3.5
sizeBytes: 453949277
- names:
- ghcr.io/home-assistant/home-assistant@sha256:0a0ae67f5a3121d50890baf1f07baa687468fe448e635e2c34d2b95faf5086b0
- ghcr.io/home-assistant/home-assistant:2023.3.1
sizeBytes: 453770753
- names:
- ghcr.io/home-assistant/home-assistant@sha256:2c631c99d7078072126e50050b57042ec5548b721f089a87e76dfb24c1071a83
- ghcr.io/home-assistant/home-assistant:2023.5.4
sizeBytes: 451027806
- names:
- ghcr.io/home-assistant/home-assistant@sha256:d38bc4d21453d6e3e4b0af2b62cf86211b28479946e4e895d4434b3f82c4e58a
- ghcr.io/home-assistant/home-assistant:2022.12.8
sizeBytes: 446792583
- names:
- docker.io/library/nextcloud@sha256:0ab4b64883b3adf121a3076cd9b8a160a224aa1fa81f75cdb7c4bc4fdeaaa803
- docker.io/library/nextcloud:26.0.1
sizeBytes: 347153037
- names:
- docker.io/library/mariadb@sha256:37e9f7e3cea0096f7fba9d2a77cf0ac926c830e8931d1679da3bcd8fb8989d47
- docker.io/library/mariadb:10.6
sizeBytes: 118893377
- names:
- docker.io/pihole/pihole@sha256:dcd0885a3fe050da005cb544904444cc098017636d6d495ac8770a9aa523a0ef
- docker.io/pihole/pihole:2022.05
sizeBytes: 111326841
- names:
- docker.io/rancher/k3s-upgrade@sha256:6c4543ecde336df20a21f88e5e84399f923bdb3f9bbdc7e815cfdbca643ec50a
- docker.io/rancher/k3s-upgrade:v1.27.6-k3s1
sizeBytes: 53055042
- names:
- docker.io/anonsoftware28/kubernetes-secret-generator@sha256:1d5bfe7b227caf060d0e61488aecdc40e475f8c8640420fbf7ab500333dcfd60
- docker.io/anonsoftware28/kubernetes-secret-generator:latest
sizeBytes: 51735974
- names:
- docker.io/rancher/mirrored-library-traefik@sha256:0842af6afcdf4305d17e862bad4eaf379d0817c987eedabeaff334e2273459c1
- docker.io/rancher/mirrored-library-traefik:2.9.4
sizeBytes: 35650744
- names:
- docker.io/rancher/mirrored-metrics-server@sha256:16185c0d4d01f8919eca4779c69a374c184200cd9e6eded9ba53052fd73578df
- docker.io/rancher/mirrored-metrics-server:v0.6.2
sizeBytes: 26205509
- names:
- docker.io/dopingus/cert-manager-webhook-dynu@sha256:7958523006f78123305597115cb1ba7f7b448e658549ddb6a089582c4bec8628
- docker.io/dopingus/cert-manager-webhook-dynu:latest
sizeBytes: 17882163
- names:
- docker.io/dopingus/cert-manager-webhook-dynu@sha256:7618e6678a9f3210ef0ea530a0f58f5932e80aa673729a7ab223a9b24b804cd2
sizeBytes: 17882150
- names:
- registry.k8s.io/sig-storage/nfs-subdir-external-provisioner@sha256:63d5e04551ec8b5aae83b6f35938ca5ddc50a88d85492d9731810c31591fa4c9
- registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
sizeBytes: 16673053
- names:
- quay.io/jetstack/cert-manager-controller@sha256:cd9bf3d48b6b8402a2a8b11953f9dc0275ba4beec14da47e31823a0515cde7e2
- quay.io/jetstack/cert-manager-controller:v1.9.1
sizeBytes: 15265466
- names:
- docker.io/rancher/mirrored-coredns-coredns@sha256:a11fafae1f8037cbbd66c5afa40ba2423936b72b4fd50a7034a7e8b955163594
- docker.io/rancher/mirrored-coredns-coredns:1.10.1
sizeBytes: 14556850
- names:
- docker.io/rancher/local-path-provisioner@sha256:5bb33992a4ec3034c28b5e0b3c4c2ac35d3613b25b79455eb4b1a95adc82cdc0
- docker.io/rancher/local-path-provisioner:v0.0.24
sizeBytes: 13884168
- names:
- docker.io/rancher/kubectl@sha256:9be095ca0bbc74e8947a1d4a0258875304b590057d858eb9738de000f88a473e
- docker.io/rancher/kubectl:v1.25.4
sizeBytes: 13045642
- names:
- quay.io/jetstack/cert-manager-webhook@sha256:4ab2982a220e1c719473d52d8463508422ab26e92664732bfc4d96b538af6b8a
- quay.io/jetstack/cert-manager-webhook:v1.9.1
sizeBytes: 12244995
- names:
- quay.io/jetstack/cert-manager-cainjector@sha256:df7f0b5186ddb84eccb383ed4b10ec8b8e2a52e0e599ec51f98086af5f4b4938
- quay.io/jetstack/cert-manager-cainjector:v1.9.1
sizeBytes: 10909067
- names:
- docker.io/rancher/system-upgrade-controller@sha256:c730c4ec8dc914b94be13df77d9b58444277330a2bdf39fe667beb5af2b38c0b
- docker.io/rancher/system-upgrade-controller:v0.13.1
sizeBytes: 9617607
- names:
- docker.io/rancher/klipper-lb@sha256:2b963c02974155f7e9a51c54b91f09099e48b4550689aadb595e62118e045c10
- docker.io/rancher/klipper-lb:v0.4.3
sizeBytes: 4163722
- names:
- docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
- docker.io/rancher/mirrored-pause:3.6
sizeBytes: 253243
nodeInfo:
architecture: arm64
bootID: 5bcbbf33-f7d7-4058-a5f4-94a5d26e129c
containerRuntimeVersion: containerd://1.6.19-k3s1
kernelVersion: 5.15.32-v8+
kubeProxyVersion: v1.26.4+k3s1
kubeletVersion: v1.26.4+k3s1
machineID: 75a2a6365a604bc389ab0ab7c51c66c6
operatingSystem: linux
osImage: Debian GNU/Linux 11 (bullseye)
systemUUID: 75a2a6365a604bc389ab0ab7c51c66c6
+ sha256sum /opt/k3s /host/usr/local/bin/k3s
+ BIN_CHECKSUMS='04be543be1c9fbdda30722c5d169099a6972459ea1b1e5df701c42ef54a11f44 /opt/k3s
04be543be1c9fbdda30722c5d169099a6972459ea1b1e5df701c42ef54a11f44 /host/usr/local/bin/k3s'
The binary has already been replaced - the checksums match. The upgrade image just checks to see that the binaries have been replaced; it doesn't actually look at what version is currently running.
I suspect that it ran into some sort of problem killing the k3s process to trigger a restart of the service into the new version. Without logs from the original successful upgrade, it's impossible to say why. You might target a different version or channel with your plan and check the upgrade pod logs afterwards.
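For example (paths taken from the upgrade log above; the jsonpath query is just one way to read the reported version), you could compare the binary on disk with what the kubelet is actually reporting:
$ /usr/local/bin/k3s --version                      # run on the node itself
$ sha256sum /usr/local/bin/k3s
$ kubectl get node turing-node-1 -o jsonpath='{.status.nodeInfo.kubeletVersion}'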
The upgrade pods hang around for quite a while after they run, even when successful. How long did you wait after applying the plan, before you went looking to see if it'd actually upgraded or not?
It must have been at least an hour, but I do not remember exactly. I actually had a plan for the worker nodes too, just as advised in the link, and noticed that its job was stuck. I assumed it was waiting for the control-plane node upgrade to finish, which led me to clean everything up, start over with only the master node upgrade, and then post this question.
Anyway, after your explanation I rebooted the cluster, and the master node now shows the correct version. I shall now retry the agent upgrade plan.
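For reference, the agent plan in the k3s automated-upgrades docs looks roughly like the following; the prepare step is what makes it wait for the server plan to finish, which would explain the "stuck" worker job mentioned above (this is a sketch from the docs, not the exact manifest used in this cluster):
$ kubectl apply -f - <<'EOF'
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  channel: https://update.k3s.io/v1-release/channels/stable
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  prepare:
    args:
      - prepare
      - server-plan
    image: rancher/k3s-upgrade
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
EOF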
Version
k3s v1.26.4+k3s1, system-upgrade-controller v0.13.1 (as reported in the node status output above)
Platform/Architecture
linux/arm64, Debian GNU/Linux 11 (bullseye)
Describe the bug
I followed the instructions at https://docs.k3s.io/upgrades/automated to set up automated upgrades. For the initial configuration I specified only the control-plane node upgrade plan. However, no action appears to have been taken.
To Reproduce
The content of control.yml can be seen below. The controller and the plan have been loaded at this point, but nothing has happened, as you can see from the node versions.
Expected behavior
Based on the plan and the current version, I expected to see a job that would upgrade the control-plane node to version 1.27. Or, if there is something wrong with my environment (for example, it has occurred to me that I might need more than one control-plane node to stagger an upgrade), I would expect an appropriate message in the log, but there is nothing of interest in it:
Actual behavior
Nothing happened.
Additional context
I found a similar bug report, https://github.com/rancher/system-upgrade-controller/issues/90, but in that case the node selector in the plan was wrong. Here is what the plan looks like in this case:
And the selector verification follows: