kubean-io / kubean

:seedling: Product ready cluster lifecycle management toolchains based on kubespray and other cluster LCM engine.
https://kubean-io.github.io/kubean/
Apache License 2.0

Cluster reset operation fails (both pod and job report an error) although the Ansible playbook executes successfully #22

Closed tukwila closed 2 years ago

tukwila commented 2 years ago

Describe the version

  1. kubean: v0.0.0
  2. kubespray: master
  3. kubernetes: not related
  4. CNI and its version: not related

Describe the bug
Ran a reset operation, then checked the pod and job logs. The Ansible playbook itself completed without failures:

PLAY RECAP *****
localhost : ok=3 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
node1 : ok=118 changed=16 unreachable=0 failed=0 skipped=98 rescued=0 ignored=0

Friday 22 July 2022 05:58:28 +0000 (0:00:04.684) 0:03:25.978 ***

reset : reset | delete some files and directories ---------------------- 68.33s
/kubespray/roles/reset/tasks/main.yml:242 -------------------------------------
reset : reset | remove containerd binary files -------------------------- 9.71s
/kubespray/roles/reset/tasks/main.yml:324 -------------------------------------
reset : reset | remove services ----------------------------------------- 7.60s
/kubespray/roles/reset/tasks/main.yml:14 --------------------------------------
kubernetes/preinstall : Install packages requirements ------------------- 6.10s
/kubespray/roles/kubernetes/preinstall/tasks/0070-system-packages.yml:58 ------
network_plugin/calico : Get current calico cluster version -------------- 5.96s
/kubespray/roles/network_plugin/calico/tasks/check.yml:37 ---------------------
kubernetes/preinstall : Ensure kube-bench parameters are set ------------ 5.71s
/kubespray/roles/kubernetes/preinstall/tasks/0080-system-configurations.yml:108
kubernetes/preinstall : Create kubernetes directories ------------------- 4.93s
/kubespray/roles/kubernetes/preinstall/tasks/0050-create_directories.yml:2 ----
reset : reset | Restart network ----------------------------------------- 4.68s
/kubespray/roles/reset/tasks/main.yml:375 -------------------------------------
reset : flush iptables -------------------------------------------------- 4.25s
/kubespray/roles/reset/tasks/main.yml:195 -------------------------------------
reset : reset | stop services ------------------------------------------- 3.67s
/kubespray/roles/reset/tasks/main.yml:2 ---------------------------------------
kubernetes/preinstall : Create cni directories -------------------------- 3.00s
/kubespray/roles/kubernetes/preinstall/tasks/0050-create_directories.yml:70 ---
reset : reset | stop etcd services -------------------------------------- 2.44s
/kubespray/roles/reset/tasks/main.yml:132 -------------------------------------
kubernetes/preinstall : Remove swapfile from /etc/fstab ----------------- 2.25s
/kubespray/roles/kubernetes/preinstall/tasks/0010-swapoff.yml:2 ---------------
kubernetes/preinstall : NetworkManager | Prevent NetworkManager from managing Calico interfaces (cali/tunl/vxlan.calico) --- 2.10s
/kubespray/roles/kubernetes/preinstall/tasks/0062-networkmanager-unmanaged-devices.yml:8
kubernetes/preinstall : Clean previously used sysctl file locations ----- 2.06s
/kubespray/roles/kubernetes/preinstall/tasks/0080-system-configurations.yml:40
kubernetes/preinstall : Hosts | Update (if necessary) hosts file -------- 1.96s
/kubespray/roles/kubernetes/preinstall/tasks/0090-etchosts.yml:63 -------------
Gather necessary facts (hardware) --------------------------------------- 1.96s
/kubespray/facts.yml:24 -------------------------------------------------------
reset : reset | remove etcd services ------------------------------------ 1.87s
/kubespray/roles/reset/tasks/main.yml:143 -------------------------------------
reset : reset | remove dns settings from dhclient.conf ------------------ 1.86s
/kubespray/roles/reset/tasks/main.yml:344 -------------------------------------
kubernetes/preinstall : Preinstall | reload NetworkManager -------------- 1.84s
/kubespray/roles/kubernetes/preinstall/handlers/main.yml:32 -------------------

[actions-runner@debug kubean_functions_e2e]$ kubectl get pod/kubean-e2e-cluster2-reset-2-job-6pqfc -n kubean-system --kubeconfig=$MAIN_KUBECONFIG
NAME                                    READY   STATUS   RESTARTS   AGE
kubean-e2e-cluster2-reset-2-job-6pqfc   0/1     Error    0          46m

[actions-runner@debug kubean_functions_e2e]$ kubectl get job.batch/kubean-e2e-cluster2-reset-2-job -n kubean-system --kubeconfig=$MAIN_KUBECONFIG -o yaml
status:
  conditions:
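The mismatch reported here is that the PLAY RECAP shows `failed=0` and `unreachable=0` for every host while the pod still ends in `Error`. As a rough way to confirm the first half of that claim from a saved log, the sketch below scans recap lines on standard input. `recap_ok` is a hypothetical helper name, not part of kubean or kubespray, and it assumes the recap lines keep the `key=value` layout shown above.

```shell
# Hypothetical helper (not part of kubean or kubespray).
# Reads Ansible output on stdin and returns 0 only if at least one recap line
# was seen and every such line reports failed=0 and unreachable=0.
recap_ok() {
  awk '
    /failed=[0-9]+/ {
      seen = 1
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "failed" || kv[1] == "unreachable") && kv[2] + 0 > 0) {
          bad = 1
        }
      }
    }
    END {
      if (seen && !bad) exit 0
      exit 1
    }
  '
}

# Example with the two recap lines from this report.
printf '%s\n' \
  'localhost : ok=3 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0' \
  'node1 : ok=118 changed=16 unreachable=0 failed=0 skipped=98 rescued=0 ignored=0' \
  | recap_ok && echo "recap clean"
```

A clean recap only says that the playbook tasks succeeded. The container can still exit non-zero if a command run outside the playbook fails, which would be consistent with the pod status shown above.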

How To Reproduce
Steps to reproduce the issue: run a cluster reset operation.

Expected behavior
The reset operation should pass.
Attachment: e2e-install-cluster.zip


ErikJiang commented 2 years ago


Could not reproduce.

From the error log above:

/bin/sh: kubectl: command not found 
non-zero return code

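I suspect this is related to the image set in kubeanClusterOps spec.image. A normal image includes kubectl, but the error reports that kubectl does not exist, so the spray image is evidently faulty.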
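One way to check this suspicion is to look for the binary inside a container started from the image in question. The sketch below is a minimal pre-flight check using the POSIX `command -v` builtin. `missing_tools` is a hypothetical helper name, and the tool list is only an assumption based on the `kubectl: command not found` error above.

```shell
# Hypothetical pre-flight check (not part of kubean): prints the name of every
# requested tool that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: run inside the container under suspicion.
missing="$(missing_tools kubectl)"
if [ -n "$missing" ]; then
  echo "missing from image: $missing"
else
  echo "all required tools present"
fi
```

If `kubectl` is listed as missing, any step that shells out to it would fail with a non-zero return code even after the playbook recap has reported success.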