pacoxu / kubeadm-operator

Test work on the design of kubeadm operator. Also you can try https://github.com/chendave/kubeadm-operator
Apache License 2.0

Failed operation is re-applied to cluster after another operation failed #64

Closed pacoxu closed 2 years ago

pacoxu commented 2 years ago

At first, the upgrade to v1.24.1 failed.

[root@paco ~]# cat upgrade-v1.24.1.yaml
apiVersion: operator.kubeadm.x-k8s.io/v1alpha1
kind: Operation
metadata:
  creationTimestamp: "2022-06-01T14:36:11Z"
  generation: 1
  labels:
    operator.kubeadm.x-k8s.io/operation: upgrade-v1.24.1
    operator.kubeadm.x-k8s.io/uid: 379509e0-e1b8-11ec-9e60-3ad99ac2b969
  name: upgrade-v1.24.1
  resourceVersion: "7989559"
  uid: 621d4fea-6445-43dc-ada4-573bb1325a1f
spec:
  executionMode: Auto
  upgrade:
    kubernetesVersion: v1.24.1
    local: false

Then I applied a second operation, this time an upgrade to v1.24.0:

apiVersion: operator.kubeadm.x-k8s.io/v1alpha1
kind: Operation
metadata:
  creationTimestamp: "2022-06-02T08:02:57Z"
  generation: 1
  labels:
    operator.kubeadm.x-k8s.io/operation: upgrade-v1.24.0
    operator.kubeadm.x-k8s.io/uid: 6a20bd87-e24a-11ec-9e68-be2291df9fb8
  name: upgrade-v1.24.0
  resourceVersion: "8167811"
  uid: 7b0d137e-7063-4384-9b32-b267fab7cb64
spec:
  executionMode: Auto
  upgrade:
    kubernetesVersion: v1.24.0
    local: false

The status right after applying the second operation:

[root@paco ~]# ./kubectl get operations,runtimetask,runtimetaskgroup
NAME                                                            PHASE       GROUPS   SUCCEEDED   FAILED
operation.operator.kubeadm.x-k8s.io/upgrade-v1.24.0             Running     2
operation.operator.kubeadm.x-k8s.io/upgrade-v1.24.1             Failed      2

NAME                                                                                    PHASE       STARTTIME   COMMAND   COMPLETIONTIME
runtimetask.operator.kubeadm.x-k8s.io/upgrade-v1.24.0-01-upgrade-apply-paco             Running     4m4s        3/3
runtimetask.operator.kubeadm.x-k8s.io/upgrade-v1.24.1-01-upgrade-apply-paco             Succeeded   25m         3/3       15m

NAME                                                                                PHASE       NODES   SUCCEEDED   FAILED
runtimetaskgroup.operator.kubeadm.x-k8s.io/upgrade-v1.24.0-01-upgrade-apply         Running     1
runtimetaskgroup.operator.kubeadm.x-k8s.io/upgrade-v1.24.1-01-upgrade-apply         Succeeded   1       1
pacoxu commented 2 years ago

The failed v1.24.1 operation was then re-applied:

[root@paco ~]# ./kubectl get operations,runtimetask,runtimetaskgroup
NAME                                                            PHASE       GROUPS   SUCCEEDED   FAILED
operation.operator.kubeadm.x-k8s.io/upgrade-v1.24.0             Failed      2
operation.operator.kubeadm.x-k8s.io/upgrade-v1.24.1             Running     2        1

NAME                                                                                    PHASE       STARTTIME   COMMAND   COMPLETIONTIME
runtimetask.operator.kubeadm.x-k8s.io/upgrade-v1.24.0-01-upgrade-apply-paco             Succeeded   7m16s       3/3       2m46s
runtimetask.operator.kubeadm.x-k8s.io/upgrade-v1.24.1-01-upgrade-apply-paco             Succeeded   29m         3/3       18m
runtimetask.operator.kubeadm.x-k8s.io/upgrade-v1.24.1-04-upgrade-w-daocloud             Running     2m34s       3/5

NAME                                                                                PHASE       NODES   SUCCEEDED   FAILED
runtimetaskgroup.operator.kubeadm.x-k8s.io/upgrade-v1.24.0-01-upgrade-apply         Succeeded   1       1
runtimetaskgroup.operator.kubeadm.x-k8s.io/upgrade-v1.24.1-01-upgrade-apply         Succeeded   1       1
runtimetaskgroup.operator.kubeadm.x-k8s.io/upgrade-v1.24.1-04-upgrade-w             Running     1
pacoxu commented 2 years ago

In the end, the control plane (server) is at v1.24.0 while the worker node is at v1.24.1.

[root@paco ~]# ./kubectl get operations,runtimetask,runtimetaskgroup
NAME                                                            PHASE       GROUPS   SUCCEEDED   FAILED
operation.operator.kubeadm.x-k8s.io/upgrade-v1.24.0             Failed      2
operation.operator.kubeadm.x-k8s.io/upgrade-v1.24.1             Succeeded   2        2

NAME                                                                                    PHASE       STARTTIME   COMMAND   COMPLETIONTIME
runtimetask.operator.kubeadm.x-k8s.io/upgrade-v1.24.0-01-upgrade-apply-paco             Succeeded   9m26s       3/3       4m56s
runtimetask.operator.kubeadm.x-k8s.io/upgrade-v1.24.1-01-upgrade-apply-paco             Succeeded   31m         3/3       20m
runtimetask.operator.kubeadm.x-k8s.io/upgrade-v1.24.1-04-upgrade-w-daocloud             Succeeded   4m44s       5/5       92s

NAME                                                                                PHASE       NODES   SUCCEEDED   FAILED
runtimetaskgroup.operator.kubeadm.x-k8s.io/upgrade-v1.24.0-01-upgrade-apply         Succeeded   1       1
runtimetaskgroup.operator.kubeadm.x-k8s.io/upgrade-v1.24.1-01-upgrade-apply         Succeeded   1       1
runtimetaskgroup.operator.kubeadm.x-k8s.io/upgrade-v1.24.1-04-upgrade-w             Succeeded   1       1

alias version='kubelet --version;kubeadm version; kubectl version; kubectl get node '

[root@paco ~]# version
Kubernetes v1.24.0
kubeadm version: &version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:44:24Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:46:05Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:38:19Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
NAME       STATUS   ROLES           AGE   VERSION
daocloud   Ready    <none>          10d   v1.24.1
paco       Ready    control-plane   43d   v1.24.0
pacoxu commented 2 years ago

The bug was introduced when I added checks before creating runtime task groups for operations.
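
A rough sketch of the kind of guard that would avoid this, assuming an operation in a terminal phase (Failed or Succeeded) should never get new runtime task groups. This is not the operator's actual code: the `Operation` struct, the `Phase` type and the `shouldCreateTaskGroups` helper are hypothetical names used only for illustration, and only the phase strings come from the kubectl output above.

```go
package main

import "fmt"

// Phase mirrors the PHASE column in the kubectl output above.
type Phase string

const (
	PhaseRunning   Phase = "Running"
	PhaseSucceeded Phase = "Succeeded"
	PhaseFailed    Phase = "Failed"
)

// Operation is a hypothetical, trimmed-down stand-in for the operator's
// Operation object; it is not the real API type.
type Operation struct {
	Name  string
	Phase Phase
}

// isTerminal reports whether the phase is final (Succeeded or Failed).
func isTerminal(p Phase) bool {
	return p == PhaseSucceeded || p == PhaseFailed
}

// shouldCreateTaskGroups is the assumed guard: only operations that have
// not reached a terminal phase may get new runtime task groups.
func shouldCreateTaskGroups(op Operation) bool {
	return !isTerminal(op.Phase)
}

func main() {
	ops := []Operation{
		{Name: "upgrade-v1.24.1", Phase: PhaseFailed},
		{Name: "upgrade-v1.24.0", Phase: PhaseRunning},
	}
	for _, op := range ops {
		fmt.Printf("%s: create task groups = %v\n", op.Name, shouldCreateTaskGroups(op))
	}
}
```

With a guard like this, the failed upgrade-v1.24.1 operation would be skipped once upgrade-v1.24.0 finishes, instead of continuing with the worker upgrade.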