rancher / k3os

Purpose-built OS for Kubernetes, fully managed by Kubernetes.
https://k3os.io
Apache License 2.0

Can't get latest version deployed by system-upgrade-controller #821

Closed: kooskaspers closed this issue 2 years ago

kooskaspers commented 2 years ago

Version (k3OS / kernel) k3os version v0.19.15-k3s2r0

Architecture x86_64

Describe the bug I'm trying to upgrade from v0.19.15-k3s2r0 to "latest" since I have the feeling I'm running a bit behind with v0.19. As mentioned in the readme, I need to make sure my node has the label k3os.io/upgrade with value latest. It does:

$ kubectl describe node | grep upgrade
k3os.io/upgrade=enabled
plan.upgrade.cattle.io/k3os-latest=9e8285352a97daf8b2f89bea317d06378573fcae3ef678a01401b70b

The plan seems fine (to me) as well: kubectl describe plan -n k3os-system

Name:         k3os-latest
Namespace:    k3os-system
Labels:       <none>
Annotations:  <none>
API Version:  upgrade.cattle.io/v1
Kind:         Plan
Metadata:
  Creation Timestamp:  2021-10-09T11:28:10Z
  Generation:          1
    Manager:         system-upgrade-controller
    Operation:       Update
    Time:            2021-10-09T11:28:17Z
  Resource Version:  113935261
  Self Link:         /apis/upgrade.cattle.io/v1/namespaces/k3os-system/plans/k3os-latest
  UID:               cb0a8c46-632f-4db5-ad4a-b5fa557f004f
Spec:
  Channel:      https://github.com/rancher/k3os/releases/latest
  Concurrency:  1
  Cordon:       false
  Drain:
    Disable Eviction:              false
    Force:                         true
    Skip Wait For Delete Timeout:  0
  Node Selector:
    Match Expressions:
      Key:       k3os.io/upgrade
      Operator:  In
      Values:
        latest
      Key:       k3os.io/mode
      Operator:  Exists
      Key:       k3os.io/mode
      Operator:  NotIn
      Values:
        live
  Prepare:
    Args:
    Command:
      k3os
      --version
    Image:               rancher/k3os
  Service Account Name:  k3os-upgrade
  Tolerations:
    Key:       CriticalAddonsOnly
    Operator:  Exists
    Effect:    NoSchedule
    Key:       node-role.kubernetes.io/master
    Operator:  Exists
    Effect:    NoSchedule
    Key:       kubernetes.io/arch
    Operator:  Equal
    Value:     amd64
    Effect:    NoSchedule
    Key:       kubernetes.io/arch
    Operator:  Equal
    Value:     arm64
    Effect:    NoSchedule
    Key:       kubernetes.io/arch
    Operator:  Equal
    Value:     arm
  Upgrade:
    Args:
      upgrade
      --kernel
      --rootfs
      --remount
      --sync
      --reboot
      --lock-file=/host/run/k3os/upgrade.lock
      --source=/k3os/system
      --destination=/host/k3os/system
    Command:
      k3os
      --debug
    Image:  rancher/k3os
Status:
  Conditions:
    Last Update Time:  2021-11-26T10:32:27Z
    Reason:            Channel
    Status:            True
    Type:              LatestResolved
  Latest Hash:         ce4bf863b919f2c2fb925ad3dfca3c818d4a9f6275f58625d7a2725c
  Latest Version:      v0.21.5-k3s2r1
Events:                <none>

In the 'Status' section, the Latest Version resolves to 'v0.21.5-k3s2r1'. That looks good, provided the system-upgrade-controller actually installs that version. But that doesn't seem to be happening at all.

I had a look if the lock file is present:

cat /host/run/k3os/upgrade.lock
cat: /host/run/k3os/upgrade.lock: No such file or directory

That doesn't seem to be the problem.

Labels and taints don't seem to be the problem either: kubectl describe node

Name:               kubernetes
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=k3s
                    beta.kubernetes.io/os=linux
                    k3os.io/mode=local
                    k3os.io/upgrade=enabled
                    k3os.io/version=v0.19.15-k3s2r0
                    k3s.io/hostname=kubernetes
                    k3s.io/internal-ip=192.168.1.51
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=kubernetes
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/master=true
                    node.kubernetes.io/instance-type=k3s
                    plan.upgrade.cattle.io/k3os-latest=9e8285352a97daf8b2f89bea317d06378573fcae3ef678a01401b70b
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"be:04:53:34:41:53"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.1.51
                    k3s.io/node-args: ["server","--no-deploy","traefik","--node-label","k3os.io/mode=local","--node-label","k3os.io/version=v0.19.15-k3s2r0"]
                    k3s.io/node-config-hash: 4LHW7ACOHKXJ2IHYHIE344TQETEAY5HZYHTPAD7444LOKF5HHDNQ====
                    k3s.io/node-env:
                      {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/cba07c8500bccabd42d9215a6af6b01181cb6ca5755d12ae1e4e02b27b50bafa","K3S_KUBECONFIG_MODE":"0644"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 08 Dec 2019 10:22:07 +0100
Taints:             <none>
Unschedulable:      false

What should be the next thing to have a look at?

To Reproduce

Expected behavior Version v0.21.5-k3s2r1 being installed by system-upgrade-controller.

Actual behavior I'm still stuck on k3os version v0.19.15-k3s2r0.

Additional context

system-upgrade-controller deployment:

Name:                   system-upgrade-controller
Namespace:              k3os-system
CreationTimestamp:      Sat, 09 Oct 2021 13:28:16 +0200
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               upgrade.cattle.io/controller=system-upgrade-controller
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           upgrade.cattle.io/controller=system-upgrade-controller
  Service Account:  k3os-upgrade
  Containers:
   system-upgrade-controller:
    Image:      rancher/system-upgrade-controller:v0.7.7
    Port:       <none>
    Host Port:  <none>
    Environment Variables from:
      default-controller-env  ConfigMap  Optional: false
    Environment:
      SYSTEM_UPGRADE_CONTROLLER_NAME:        (v1:metadata.labels['upgrade.cattle.io/controller'])
      SYSTEM_UPGRADE_CONTROLLER_NAMESPACE:   (v1:metadata.namespace)
    Mounts:
      /etc/ssl from etc-ssl (rw)
      /tmp from tmp (rw)
  Volumes:
   etc-ssl:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl
    HostPathType:  Directory
   tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   system-upgrade-controller-5b574bf4d6 (1/1 replicas created)
Events:          <none>

yngveh commented 2 years ago

Try changing the node label k3os.io/upgrade to "latest" instead of "enabled".

The documentation was recently changed in the following commit: https://github.com/rancher/k3os/commit/a9c866997db4f7fcf004e0dedf0aa5cc0dd37d80

kooskaspers commented 2 years ago

Great mate. That's the fix!
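
For anyone hitting the same thing, here is a rough sketch of why the old label value never matched. It mimics how Kubernetes match expressions (In, NotIn, Exists) are evaluated, using the plan's Node Selector and the node labels posted above. The function name `matches` and the dictionary layout are made up for illustration; this is not the system-upgrade-controller's actual code.

```python
# Minimal sketch of Kubernetes-style label selector matching.
# Illustration only: `matches` is a hypothetical helper, not controller code.

def matches(labels, expressions):
    """Return True only if every match expression is satisfied by the labels."""
    for expr in expressions:
        key = expr["key"]
        op = expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            # Key must be present and its value must be one of the listed values.
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            # Satisfied when the key is absent or its value is not listed.
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


# Match expressions copied from the plan's Node Selector in this issue.
PLAN_SELECTOR = [
    {"key": "k3os.io/upgrade", "operator": "In", "values": ["latest"]},
    {"key": "k3os.io/mode", "operator": "Exists"},
    {"key": "k3os.io/mode", "operator": "NotIn", "values": ["live"]},
]

# The two relevant labels from the `kubectl describe node` output in this issue.
node_labels = {"k3os.io/mode": "local", "k3os.io/upgrade": "enabled"}

# Same node after changing the label value as suggested.
relabeled = {**node_labels, "k3os.io/upgrade": "latest"}

print(matches(node_labels, PLAN_SELECTOR))  # False: "enabled" is not in ["latest"]
print(matches(relabeled, PLAN_SELECTOR))    # True: all three expressions hold
```

With the value "enabled", the first expression fails, so the plan selects no node and nothing is upgraded even though Latest Version resolves correctly. On the cluster itself, the relabel would presumably be something like kubectl label node kubernetes k3os.io/upgrade=latest --overwrite, assuming the node name "kubernetes" shown in the describe output.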