Closed sameh-farouk closed 1 year ago
root@MR113e9c8e:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-84bb864455-dbnmk 1/1 Running 0 28m
kube-system coredns-96cc4f57d-ds7mv 1/1 Running 0 28m
kube-system metrics-server-ff9dbcb6c-svqzb 1/1 Running 0 28m
kube-system helm-install-traefik--1-fq57z 0/1 CrashLoopBackOff 10 (19s ago) 28m
The helm-install-traefik pod keeps restarting; the reason is the error at the end of the pod logs:
root@MR113e9c8e:~# kubectl logs -n kube-system helm-install-traefik--1-fq57z
CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
set +v -x
+ [[ '' != \t\r\u\e ]]
+ + export tiller HELM_HOST=127.0.0.1:44134--listen=127.0.0.1:44134
--storage=secret
+ HELM_HOST=127.0.0.1:44134
+ helm_v2 init --skip-refresh --client-only --stable-repo-url https://charts.helm.sh/stable/
[main] 2023/01/04 16:07:49 Starting Tiller v2.17.0 (tls=false)
[main] 2023/01/04 16:07:49 GRPC listening on 127.0.0.1:44134
[main] 2023/01/04 16:07:49 Probes listening on :44135
[main] 2023/01/04 16:07:49 Storage driver is Secret
[main] 2023/01/04 16:07:49 Max history per release is 0
Creating /home/klipper-helm/.helm
Creating /home/klipper-helm/.helm/repository
Creating /home/klipper-helm/.helm/repository/cache
Creating /home/klipper-helm/.helm/repository/local
Creating /home/klipper-helm/.helm/plugins
Creating /home/klipper-helm/.helm/starters
Creating /home/klipper-helm/.helm/cache/archive
Creating /home/klipper-helm/.helm/repository/repositories.yaml
Adding stable repo with URL: https://charts.helm.sh/stable/
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /home/klipper-helm/.helm.
Not installing Tiller due to 'client-only' flag having been set
++ jq -r '.Releases | length'
++ timeout -s KILL 30 helm_v2 ls --all '^traefik$' --output json
[storage] 2023/01/04 16:07:50 listing all releases with filter
+ V2_CHART_EXISTS=
+ [[ '' == \1 ]]
+ [[ '' == \v\2 ]]
+ [[ -n '' ]]
+ shopt -s nullglob
+ helm_content_decode
+ set -e
+ ENC_CHART_PATH=/chart/traefik.tgz.base64
+ CHART_PATH=/tmp/traefik.tgz
+ [[ ! -f /chart/traefik.tgz.base64 ]]
+ return
+ [[ install != \d\e\l\e\t\e ]]
+ helm_repo_init
+ grep -q -e 'https\?://'
+ [[ helm_v3 == \h\e\l\m\_\v\3 ]]
+ [[ traefik == stable/* ]]
+ [[ -n https://helm.traefik.io/traefik ]]
+ helm_v3 repo add traefik https://helm.traefik.io/traefik
"traefik" has been added to your repositories
+ helm_v3 repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "traefik" chart repository
Update Complete. ⎈Happy Helming!⎈
+ helm_update install --repo https://helm.traefik.io/traefik
+ [[ helm_v3 == \h\e\l\m\_\v\3 ]]
++ helm_v3 ls ++ tr '[:upper:]' '[:lower:]'
--all -f '^traefik$' --namespace kube-system --output json
++ jq -r '"\(.[0].app_version),\(.[0].status)"'
+ LINE=null,null
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ for VALUES_FILE in /config/*.yaml
+ VALUES=' --values /config/values-01_HelmChart.yaml'
+ [[ install = \d\e\l\e\t\e ]]
+ [[ null =~ ^(|null)$ ]]
+ [[ null =~ ^(|null)$ ]]
+ helm_v3 install --repo https://helm.traefik.io/traefik traefik traefik --values /config/values-01_HelmChart.yaml
Error: execution error at (traefik/templates/deployment.yaml:3:8): ERROR: Helm >= 3.9.0 is required
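The final error comes from a version guard in the Traefik chart: it refuses to render unless the Helm client is at least 3.9.0. As a rough sketch of that kind of comparison (not the chart's own code), the shell helpers below compare dotted version strings. The names `version_ge` and `normalize_version` are hypothetical, and the approach assumes a `sort` that supports `-V`.

```sh
#!/bin/sh
# Sketch only: compare dotted version strings.
# Assumption: `sort -V` is available (GNU coreutils, recent BusyBox builds).

# version_ge A B: succeeds when version A is greater than or equal to B.
# The smaller of the two sorts first, so A >= B exactly when B comes out on top.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# normalize_version V: strip a leading "v" and any "+build" suffix,
# so that "v3.7.1+g1d11fcb" becomes "3.7.1".
normalize_version() {
    printf '%s\n' "$1" | sed -e 's/^v//' -e 's/+.*$//'
}
```

For example, `version_ge "$(normalize_version "$(helm_v3 version --short)")" 3.9.0` would fail inside a job image that ships an older Helm client. That `helm_v3 version --short` prints a `vX.Y.Z+g<commit>` string in this image is an assumption.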
I know Traefik is integrated into k3s by default, so I wanted to see how we deploy k3s. I checked the flist Dockerfile and the zinit services and found that:
- We start the server with `--no-deploy traefik` to prevent it from deploying the packaged/embedded Traefik component. This option is marked as deprecated, by the way.
- We place our own Traefik `HelmChart` manifest in `/var/lib/rancher/k3s/server/manifests`. It is installed at runtime by the rancher/helm-controller.

We should use a `HelmChartConfig` manifest instead of the `HelmChart` manifest we provide. We shouldn't use `--no-deploy traefik` on server start, since we don't want a different ingress controller than Traefik; we just need to apply our config on top of the k3s embedded one.

I tried this fix on the fly to make sure that my findings are correct. Here are my steps:
- I edited `/scripts/entrypoint.sh` and removed `--no-deploy traefik` from the server start args. This should allow k3s to deploy the default packaged Traefik at runtime.
    echo "KUBECONFIG=/etc/rancher/k3s/k3s.yaml" >> /etc/environment

    if [ -z "${K3S_DATA_DIR}" ]; then
        K3S_DATA_DIR=""
    else
        cp -r /var/lib/rancher/k3s/* $K3S_DATA_DIR
        K3S_DATA_DIR="--data-dir $K3S_DATA_DIR --kubelet-arg=root-dir=$K3S_DATA_DIR/kubelet"
    fi

    if [ -z "${K3S_FLANNEL_IFACE}" ]; then
        K3S_FLANNEL_IFACE=eth0
    fi

    if [ "$K3S_URL" = "" ]; then
        k3s server --flannel-iface $K3S_FLANNEL_IFACE $K3S_DATA_DIR >> /var/log/k3s-service.log 2>&1
    else
        k3s agent --flannel-iface $K3S_FLANNEL_IFACE $K3S_DATA_DIR >> /var/log/k3s-service.log 2>&1
    fi
- I moved `/var/lib/rancher/k3s/server/manifests/traefik.yaml` to `/tmp` and created `traefik-config.yaml` in its place:
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    additionalArguments:
      - "--certificatesresolvers.default.acme.tlschallenge"
      - "--certificatesresolvers.default.acme.email=dsafsdajfksdhfkjadsfoo@you.com"
      - "--certificatesresolvers.default.acme.storage=/data/acme.json"
      - "--certificatesresolvers.default.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
      - "--certificatesresolvers.default.acme.httpchallenge.entrypoint=web"
      - "--certificatesresolvers.gridca.acme.tlschallenge"
      - "--certificatesresolvers.gridca.acme.email=dsafsdajfksdhfkjadsfoo@you.com"
      - "--certificatesresolvers.gridca.acme.storage=/data/acme1.json"
      - "--certificatesresolvers.gridca.acme.caserver=https://ca1.grid.tf"
      - "--certificatesresolvers.gridca.acme.httpchallenge.entrypoint=web"
      - "--certificatesresolvers.le.acme.tlschallenge"
      - "--certificatesresolvers.le.acme.email=dsafsdajfksdhfkjadsfoo@you.com"
      - "--certificatesresolvers.le.acme.storage=/data/acme2.json"
      - "--certificatesresolvers.le.acme.caserver=https://acme-v02.api.letsencrypt.org/directory"
      - "--certificatesresolvers.le.acme.httpchallenge.entrypoint=web"
    ports:
      web:
        redirectTo: websecure
      websecure:
        tls:
          enabled: true
```
- Then I ran `kubectl -n kube-system delete helmcharts.helm.cattle.io traefik`.
- I restarted the `init` service with `zinit stop init` followed by `zinit start init`.
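Since the fix depends on the server no longer being told to skip Traefik, a quick check of the edited entrypoint before restarting can catch a leftover flag. This is a minimal sketch: the function name `disables_traefik` is hypothetical, and it also looks for `--disable traefik`, which is the non-deprecated spelling of the same option in k3s.

```sh
#!/bin/sh
# Sketch only: detect whether a script still disables the packaged Traefik.

# disables_traefik FILE: succeeds when FILE passes `--no-deploy traefik`
# or `--disable traefik` (space or "=" between flag and value).
disables_traefik() {
    grep -Eq -e '--(no-deploy|disable)[ =]traefik' -- "$1"
}
```

Running `disables_traefik /scripts/entrypoint.sh && echo "Traefik is still disabled"` after the edit should print nothing if the flag was removed.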
After that, I checked whether Traefik had been installed successfully. The `helm-install-traefik` pod ran successfully; however, `helm-install-traefik-crd` exited with an error.
I checked the logs of that pod, which showed this error:
Error: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "ingressroutes.traefik.containo.us" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "traefik-crd"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "kube-system"
These CRDs were left over from the previously failed install, so I deleted them:
kubectl get crds --no-headers=true | awk '/traefik/{print $1}'| xargs kubectl delete crds
Then I restarted the pod:
kubectl get pod/helm-install-traefik-crd--1-cx7bb -n kube-system -o yaml | kubectl replace --force -f -
After that, everything went fine:
root@MR113e9c8e:~# kubectl get all -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/local-path-provisioner-84bb864455-dbnmk 1/1 Running 0 3h29m
kube-system pod/coredns-96cc4f57d-ds7mv 1/1 Running 0 3h29m
kube-system pod/metrics-server-ff9dbcb6c-svqzb 1/1 Running 0 3h29m
kube-system pod/svclb-traefik-hcprh 2/2 Running 0 169m
kube-system pod/helm-install-traefik--1-f29hk 0/1 Completed 0 169m
kube-system pod/svclb-traefik-ms9p4 2/2 Running 0 169m
kube-system pod/traefik-f75f5998-zj4n8 1/1 Running 0 169m
kube-system pod/helm-install-traefik-crd--1-r4drp 0/1 Completed 0 91m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 3h30m
kube-system service/kube-dns ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP,9153/TCP 3h30m
kube-system service/metrics-server ClusterIP 10.43.162.26 <none> 443/TCP 3h30m
kube-system service/traefik LoadBalancer 10.43.235.87 10.20.2.2,10.20.2.3 80:31346/TCP,443:30163/TCP 169m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/svclb-traefik 2 2 2 2 2 <none> 169m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/local-path-provisioner 1/1 1 1 3h30m
kube-system deployment.apps/coredns 1/1 1 1 3h30m
kube-system deployment.apps/metrics-server 1/1 1 1 3h30m
kube-system deployment.apps/traefik 1/1 1 1 169m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/local-path-provisioner-84bb864455 1 1 1 3h29m
kube-system replicaset.apps/coredns-96cc4f57d 1 1 1 3h29m
kube-system replicaset.apps/metrics-server-ff9dbcb6c 1 1 1 3h29m
kube-system replicaset.apps/traefik-f75f5998 1 1 1 169m
NAMESPACE NAME COMPLETIONS DURATION AGE
kube-system job.batch/helm-install-traefik 1/1 10s 169m
kube-system job.batch/helm-install-traefik-crd 1/1 77m 169m
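The CRD cleanup above matches any line containing `traefik`. A slightly stricter variant, sketched here, keeps only names in the `traefik.containo.us` API group (the group named in the error message) so the list can be reviewed before anything is deleted. The function name `traefik_crds` is hypothetical.

```sh
#!/bin/sh
# Sketch only: select Traefik CRD names from `kubectl get crds --no-headers`
# output read on stdin.

# traefik_crds: print the first column of every line whose CRD name ends
# in ".traefik.containo.us".
traefik_crds() {
    awk '$1 ~ /\.traefik\.containo\.us$/ { print $1 }'
}
```

A preview would be `kubectl get crds --no-headers | traefik_crds`; once the list looks right, the same pipeline can be fed into `xargs kubectl delete crds` as in the original command.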
Should I just fix the current k3s image (1.22.7), or also upgrade the k3s version (to 1.26.0), @xmonader?
Please do the upgrade as well. Thank you!
Update: PR ready for review https://github.com/threefoldtech/tf-images/pull/122
@maxux, please cross-link or promote this flist to tf-official-apps:
samehabouelsaad.3bot/abouelsaad-k3s_1.26.0-latest.flist -> tf-official-apps/threefoldtech-k3s-latest.flist
Update: a new flist is now available with the latest Kubernetes release (1.26) and a few fixes and improvements. It includes a fix for this issue, as we now use the embedded Traefik component instead of overriding the prepackaged manifest. https://github.com/threefoldtech/tf-images/pull/122 The update will take effect as soon as the new flist is promoted to the official apps repo.
The original issue occurs on mainnet and testnet: https://github.com/threefoldtech/tf_support/issues/412#issuecomment-1369799997
I was able to reproduce it on Devnet as well.
I will share the debugging session's findings ASAP. We will need to update the k3s image.