siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

`talosctl upgrade --image some:image` does not re-pull the image #5750

Closed utkuozdemir closed 2 months ago

utkuozdemir commented 2 years ago

Bug Report

Description

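`talosctl upgrade --image some:image` does not re-pull the image when that tag has already been pulled on the node, so a new image pushed under the same tag is not picked up. We could introduce a flag on the upgrade command, such as `--force-pull`, to force the image to be pulled again.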

Logs

172.20.0.2: [talos] upgrade request received: preserve true, staged false, force false
172.20.0.2: [talos] validating "ghcr.io/utkuozdemir/talos-installer:test-break"
172.20.0.2: machined Unknown [/machine.MachineService/Upgrade] 2.473929476s unary error validating installer image "ghcr.io/utkuozdemir/talos-installer:test-break": failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/bin/installer": stat /bin/installer: no such file or directory: unknown (:authority=localhost;content-type=application/grpc;proxyfrom=172.20.0.2,172.20.0.3,172.20.0.4;talos-role=os:admin;user-agent=grpc-go/1.47.0)
....
....
....
172.20.0.2: [talos] upgrade request received: preserve true, staged false, force false
172.20.0.2: [talos] validating "ghcr.io/utkuozdemir/talos-installer:test-break"
172.20.0.2: machined Unknown [/machine.MachineService/Upgrade] 63.348966ms unary error validating installer image "ghcr.io/utkuozdemir/talos-installer:test-break": failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/bin/installer": stat /bin/installer: no such file or directory: unknown (:authority=localhost;content-type=application/grpc;proxyfrom=172.20.0.2,172.20.0.3,172.20.0.4;talos-role=os:admin;user-agent=grpc-go/1.47.0)

smira commented 2 years ago

The root cause is that the image is pulled once and then cached by the system containerd in memory (in tmpfs).

Because tmpfs contents do not survive a reboot, rebooting the node clears the cached image, so a reboot is enough as a workaround.

The proper fix is to always pull the image while processing the upgrade API request, but to use the cached image when running the actual upgrade.
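The two-phase behaviour described above could look roughly like the following. This is only a minimal sketch, not the Talos implementation: `ImageStore`, `fakeStore`, `HandleUpgradeRequest`, and `RunUpgrade` are hypothetical names invented for illustration, and the in-memory fake stands in for the system containerd.

```go
package main

import (
	"errors"
	"fmt"
)

// ImageStore is a hypothetical stand-in for the node's image cache
// (the system containerd in this issue). It is not a Talos or containerd API.
type ImageStore interface {
	Pull(ref string) error  // contacts the registry and refreshes the cache
	Cached(ref string) bool // reports whether ref is present in the cache
}

// fakeStore is an in-memory fake used only to make the sketch runnable.
type fakeStore struct {
	cached map[string]bool
	pulls  int // number of successful pulls, to show that every request pulls
}

func (s *fakeStore) Pull(ref string) error {
	if ref == "" {
		return errors.New("empty image reference")
	}
	s.pulls++
	s.cached[ref] = true
	return nil
}

func (s *fakeStore) Cached(ref string) bool { return s.cached[ref] }

// HandleUpgradeRequest models the API-request phase: it always pulls,
// even if the reference is already cached, so a re-pushed tag is refreshed.
func HandleUpgradeRequest(s ImageStore, ref string) error {
	if err := s.Pull(ref); err != nil {
		return fmt.Errorf("pulling installer image %q: %w", ref, err)
	}
	return nil
}

// RunUpgrade models the actual upgrade phase: it never pulls and relies
// only on what the request phase left in the cache.
func RunUpgrade(s ImageStore, ref string) error {
	if !s.Cached(ref) {
		return fmt.Errorf("installer image %q is not cached", ref)
	}
	return nil
}

func main() {
	s := &fakeStore{cached: map[string]bool{}}
	ref := "ghcr.io/utkuozdemir/talos-installer:test-break"

	// Two upgrade requests for the same tag: each one triggers a pull.
	for i := 0; i < 2; i++ {
		if err := HandleUpgradeRequest(s, ref); err != nil {
			fmt.Println(err)
			return
		}
		if err := RunUpgrade(s, ref); err != nil {
			fmt.Println(err)
			return
		}
	}
	fmt.Println("pulls:", s.pulls) // prints "pulls: 2"
}
```

Keeping the pull in the request phase means a stale or broken image under the same tag is replaced before validation, while the upgrade phase stays independent of registry availability.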

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 2 months ago

This issue was closed because it has been stalled for 7 days with no activity.