jupyterhub / action-k3s-helm

A GitHub action to install K3S, Calico, and Helm.

Gather information and handle intermittent error #30

Closed: consideRatio closed this issue 1 year ago

consideRatio commented 2 years ago

I've seen this error show up quite a few times, and I've decided it's worth tracking down at this point. The logs are copy-pasted below, since they will be removed from GitHub after a while.

The gist is the following: after a 5 minute timeout, the error below can show up. I doubt the timeout itself is too short; something else is wrong. We should provide better diagnostic information when this happens, so we can figure out how to handle it.

Waiting for deployment "calico-kube-controllers" rollout to finish: 0 of 1 updated replicas are available...
error: timed out waiting for the condition
Error: Process completed with exit code 1.
Run jupyterhub/action-k3s-helm@v1
  with:
    k3s-channel: v1.19
    metrics-enabled: false
    traefik-enabled: false
    docker-enabled: true

---

[INFO]  Finding release for channel v1.19
[INFO]  Using v1.19.14+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.19.14+k3s1/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.19.14+k3s1/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping /usr/local/bin/kubectl symlink to k3s, already exists
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Skipping /usr/local/bin/ctr symlink to k3s, command exists in PATH at /usr/bin/ctr
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
poddisruptionbudget.policy/calico-kube-controllers created
Helm v3.6.3 is already latest
Waiting for daemon set spec update to be observed...
Waiting for daemon set "calico-node" rollout to finish: 0 of 1 updated pods are available...
daemon set "calico-node" successfully rolled out
Waiting for deployment "calico-kube-controllers" rollout to finish: 0 of 1 updated replicas are available...
error: timed out waiting for the condition
Error: Process completed with exit code 1.
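
One way to gather more information is to wrap the wait so that a timeout dumps cluster state before failing the step. A minimal sketch, assuming a bash step with kubectl on PATH; the label selector assumes the default Calico manifest:

if ! kubectl rollout status deployment/calico-kube-controllers \
    --namespace kube-system --timeout=5m; then
  # dump diagnostics before failing the step
  kubectl get pods --namespace kube-system --output wide
  kubectl describe deployment calico-kube-controllers --namespace kube-system
  kubectl describe pods --namespace kube-system \
    --selector k8s-app=calico-kube-controllers
  kubectl get events --namespace kube-system --sort-by .lastTimestamp
  exit 1
fi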
consideRatio commented 1 year ago

Sometimes, pulling the Docker images can hiccup. Perhaps this is rate limiting from the Docker container registry? I've seen that this failure coincides with other failures when running jobs in parallel; if there is one failure, it's often more than one.

[screenshot]
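
If Docker Hub rate limiting is the suspect, the remaining quota can be checked directly from the runner with Docker Hub's documented rate-limit endpoint (a sketch; requires curl and jq, and note that anonymous pulls are counted per source IP, which shared CI runners can exhaust quickly):

# fetch an anonymous token for Docker Hub's rate-limit test image, then
# read the ratelimit-* headers from a HEAD request against its manifest
TOKEN=$(curl --silent "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq --raw-output .token)
curl --silent --head \
  --header "Authorization: Bearer $TOKEN" \
  https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest \
  | grep --ignore-case ratelimit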

kubectl describe pod output for a pod of ds/calico-node:

[screenshot]
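
For reference, that output can be reproduced with a label selector, assuming the default Calico manifest where the daemonset pods are labeled k8s-app=calico-node in kube-system:

kubectl describe pods --namespace kube-system --selector k8s-app=calico-node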