hystax / optscale

FinOps and cloud cost optimization tool. Supports AWS, Azure, GCP, Alibaba Cloud and Kubernetes.
https://hystax.com
Apache License 2.0
1.11k stars 156 forks source link

TASK [k8s-init : Initialize kubernetes master] FAILED! #328

Closed binary64 closed 7 hours ago

binary64 commented 2 weeks ago

Describe the bug I'm trying to install on a single digitalocean droplet VPS:

TASK [k8s-init : Initialize kubernetes master] *********************************
fatal: [10.131.180.54]: FAILED! => {"changed": true, "cmd": "kubeadm init --config /tmp/kubeadm-init.conf --upload-certs > kube_init.log", "delta": "0:00:11.744977", "end": "2024-07-03 06:14:19.851095", "msg": "non-zero return code", "rc": 1, "start": "2024-07-03 06:14:08.106118", "stderr": "W0703 06:14:08.172751   21346 validation.go:28] Cannot validate kube-proxy config - no validator is available\nW0703 06:14:08.172813   21346 validation.go:28] Cannot validate kubelet config - no validator is available\n\t[WARNING IsDockerSystemdCheck]: detected \"cgroupfs\" as the Docker cgroup driver. The recommended driver is \"systemd\". Please follow the guide at https://kubernetes.io/docs/setup/cri/\nerror execution phase preflight: [preflight] Some fatal errors occurred:\n\t[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.17.3: output: Error response from daemon: Get https://k8s.gcr.io/v2/kube-controller-manager/manifests/v1.17.3: unable to decode token response: invalid character '<' looking for beginning of value\n, error: exit status 1\n\t[ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.4.3-0: output: Error response from daemon: Get https://k8s.gcr.io/v2/etcd/manifests/3.4.3-0: unable to decode token response: invalid character '<' looking for beginning of value\n, error: exit status 1\n[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["W0703 06:14:08.172751   21346 validation.go:28] Cannot validate kube-proxy config - no validator is available", "W0703 06:14:08.172813   21346 validation.go:28] Cannot validate kubelet config - no validator is available", "\t[WARNING IsDockerSystemdCheck]: detected \"cgroupfs\" as the Docker cgroup driver. The recommended driver is \"systemd\". Please follow the guide at https://kubernetes.io/docs/setup/cri/", "error execution phase preflight: [preflight] Some fatal errors occurred:", "\t[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.17.3: output: Error response from daemon: Get https://k8s.gcr.io/v2/kube-controller-manager/manifests/v1.17.3: unable to decode token response: invalid character '<' looking for beginning of value", ", error: exit status 1", "\t[ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.4.3-0: output: Error response from daemon: Get https://k8s.gcr.io/v2/etcd/manifests/3.4.3-0: unable to decode token response: invalid character '<' looking for beginning of value", ", error: exit status 1", "[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "", "stdout_lines": []}

PLAY RECAP *********************************************************************
10.131.180.54              : ok=39   changed=22   unreachable=0    failed=1    skipped=16   rescued=0    ignored=0

To Reproduce Steps to reproduce the behavior:

  1. create a new Ubuntu 20 droplet
  2. apt update, apt upgrade, reboot now
  3. follow instructions on README
  4. See error

Expected behavior To not fail

lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.6 LTS Release: 20.04 Codename: focal

stanfra commented 2 weeks ago

Hi @binary64, in trace I saw that fail appears because of pulling image problem: failed to pull image k8s.gcr.io/etcd:3.4.3-0: output: Error response from daemon: Get https://k8s.gcr.io/v2/etcd/manifests/3.4.3-0: unable to decode token response: invalid character '<' looking for beginning of value Please perform next commands to reset deployment: sudo kubeadm reset rm -rf ~/.kube sudo rm -rf /etc/kubernetes sudo rm -rf /var/lib/etcd/ echo "KUBELET_EXTRA_ARGS=" | sudo tee /etc/default/kubelet and try to execute ansible script again, if problem with pulling images reproduced please try to pull image manually and send us traceback from pull. Also, please, attach command which you use to execute ansible script.