k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

[Release-1.27] - klipper-helm: Reinstalling job of a failed chart fails. (PR linked) #9623

Closed: brandond closed this issue 7 months ago

brandond commented 8 months ago

Backport fix for klipper-helm: Reinstalling job of a failed chart fails. (PR linked)

endawkins commented 7 months ago

Validated on the release-1.27 branch at commit 78ad575 (version 1.27).

Environment Details

Infrastructure

Node(s) CPU architecture, OS, and Version:

```
Linux i-0f477e039ffb149b9 6.5.0-1014-aws #14~22.04.1-Ubuntu SMP Thu Feb 15 15:27:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
```

Cluster Configuration:

1 IPv6-only server

`config.yaml`:

```yaml
write-kubeconfig-mode: 644
token: test
node-ip: [redacted]
cluster-cidr: 2001:cafe:42:0::/56
service-cidr: 2001:cafe:42:1::/112
disable-network-policy: true
flannel-ipv6-masq: true
node_external_ip: [redacted]
```

Additional files

`helmchartconfig.yaml` (the "bad" config: it sets a traefik image tag, `2.9876.10`, that does not exist):

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    image:
      name: traefik
      tag: 2.9876.10
    ports:
      web:
        forwardedHeaders:
          trustedIPs:
            - 10.0.0.0/8
```

`helmchartconfig1.yaml`:

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    ports:
      web:
        forwardedHeaders:
          trustedIPs:
            - 10.0.0.0/8
```

Testing Steps

1. Launch a dual-stack instance on AWS.
2. Launch an IPv6-only instance on AWS.
3. Copy the `.pem` key file from your local machine to the dual-stack instance.
4. SSH into the IPv6-only instance:
   ```
   $ ssh -i "<.pem_file>" user@<IPv6_ADDRESS>
   ```
5. Configure networking on the IPv6-only instance by editing `/etc/netplan/50-cloud-init.yaml` (you may need to point the nameservers at NAT64/DNS64-capable resolvers):
   ```yaml
   network:
     ethernets:
       ens5:
         dhcp4: true
         dhcp6: true
         match:
           macaddress: [redacted]
         set-name: ens5
         nameservers:
           addresses: ["[redacted]", "[redacted]", "[redacted]"]
     version: 2
   ```
6. Apply the network configuration:
   ```
   $ sudo netplan apply
   ```
7. Update the `/etc/hosts` file:
   ```
   127.0.0.1 localhost

   # The following lines are desirable for IPv6 capable hosts
   ::1 ip6-localhost ip6-loopback i-
   ```
8. Stop `systemd-resolved`:
   ```
   $ sudo systemctl stop systemd-resolved.service
   ```
9. Update `/etc/resolv.conf`:
   ```
   nameserver
   options edns0 trust-ad
   search [redacted]
   ```

10. Copy `config.yaml` into place:
    ```
    $ sudo mkdir -p /etc/rancher/k3s/ && sudo cp config.yaml /etc/rancher/k3s/ && cat /etc/rancher/k3s/config.yaml
    ```
11. Install k3s.
12. Apply the bad HelmChartConfig:
    ```
    $ kubectl apply -f
    ```

13. Mark the traefik chart as failed:
    ```
    $ kubectl run helm-test --rm --stdin --tty --command --namespace kube-system --overrides='{"spec":{"serviceAccount":"helm-traefik"}}' --image=docker.io/rancher/klipper-helm:v0.8.2-build20230815 sh -
    If you don't see a command prompt, try pressing enter.
    ~ $ helm_v3 set-status traefik failed
    ~ $ helm_v3 ls --all
    ~ $ exit
    Session ended, resume using 'kubectl attach helm-test -c helm-test -i -t' command when the pod is running
    pod "helm-test" deleted
    ```
14. Apply another HelmChartConfig to trigger the reinstallation:
    ```
    $ kubectl apply -f helmchartconfig_new.yaml
    ```
15. Verify that the reinstallation was successful.
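Part of the final verification step can be scripted. The sketch below is a hypothetical helper (`check_pods` is not part of k3s or kubectl) that reads `kubectl get pods -n kube-system --no-headers` output on stdin, skips pods in `Completed` state (such as finished `helm-install-traefik-*` job pods), and reports every other pod that is not `Running` with all of its containers ready. It assumes the default single-namespace column order (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helper, not part of k3s or kubectl.
# Reads `kubectl get pods -n <namespace> --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE ...) and prints "NAME READY STATUS"
# for every pod that is neither Completed nor Running with all containers ready.
# Exits 0 when every pod looks healthy, 1 otherwise.
check_pods() {
  awk '
    BEGIN { bad = 0 }
    # Finished job pods (e.g. helm-install-traefik-*) are expected.
    $3 == "Completed" { next }
    {
      # READY is "<ready>/<total>", e.g. "0/1".
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1, $2, $3
        bad = 1
      }
    }
    END { exit bad }
  '
}
```

Usage: `kubectl get pods -n kube-system --no-headers | check_pods && echo "kube-system pods look healthy"`. With `-A`, kubectl prepends a NAMESPACE column and shifts every field by one, so the sketch is meant for a single namespace.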

**Replication Results:**
- k3s version used for replication:

N/A



N/A

This issue is hard to reproduce, so a reproduction could not be captured.

**Validation Results:**
- k3s version used for validation:

```
$ k3s -v
k3s version v1.27.12-rc1+k3s1 (78ad5756)
go version go1.21.8
```



```
~ $ helm_v3 set-status traefik failed
2024/03/25 22:27:48 release traefik status updated
~ $ helm_v3 ls --all
NAME          NAMESPACE     REVISION   UPDATED                                   STATUS     CHART                         APP VERSION
traefik       kube-system   2          2024-03-25 22:27:48.03672866 +0000 UTC    failed     traefik-25.0.2+up25.0.0       v2.10.5
traefik-crd   kube-system   1          2024-03-25 22:12:09.184066937 +0000 UTC   deployed   traefik-crd-25.0.2+up25.0.0   v2.10.5
~ $ exit
Session ended, resume using 'kubectl attach helm-test -c helm-test -i -t' command when the pod is running
pod "helm-test" deleted
```


**Additional context / logs:**

After applying the bad HelmChartConfig (the new traefik pod is stuck in `ImagePullBackOff`):

```
$ kubectl get nodes,pods -A -o wide
NAME                       STATUS   ROLES                  AGE   VERSION             INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
node/i-0f477e039ffb149b9   Ready    control-plane,master   11m   v1.27.12-rc1+k3s1   [redacted]                  Ubuntu 22.04.4 LTS   6.5.0-1014-aws   containerd://1.7.11-k3s2.27

NAMESPACE     NAME                                          READY   STATUS             RESTARTS   AGE    IP                NODE                  NOMINATED NODE   READINESS GATES
kube-system   pod/local-path-provisioner-79ffd768b5-rm8v2   1/1     Running            0          11m    2001:cafe:42::4   i-0f477e039ffb149b9
kube-system   pod/coredns-77ccd57875-sgscj                  1/1     Running            0          11m    2001:cafe:42::2   i-0f477e039ffb149b9
kube-system   pod/helm-install-traefik-crd-zfdwk            0/1     Completed          0          11m    2001:cafe:42::3   i-0f477e039ffb149b9
kube-system   pod/svclb-traefik-ac7dafdc-w4hmg              2/2     Running            0          10m    2001:cafe:42::7   i-0f477e039ffb149b9
kube-system   pod/traefik-768bdcdcdd-6pxhk                  1/1     Running            0          10m    2001:cafe:42::8   i-0f477e039ffb149b9
kube-system   pod/metrics-server-c44988498-clzvf            1/1     Running            0          11m    2001:cafe:42::6   i-0f477e039ffb149b9
kube-system   pod/helm-install-traefik-xpj85                0/1     Completed          0          2m8s   2001:cafe:42::9   i-0f477e039ffb149b9
kube-system   pod/traefik-7cd9458f4d-hf79l                  0/1     ImagePullBackOff   0          2m5s   2001:cafe:42::a   i-0f477e039ffb149b9
```

After applying the new HelmChartConfig (a new `helm-install-traefik` job pod is running and the existing traefik pods are terminating):

```
NAME                       STATUS   ROLES                  AGE   VERSION             INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
node/i-0f477e039ffb149b9   Ready    control-plane,master   19m   v1.27.12-rc1+k3s1   [redacted]                  Ubuntu 22.04.4 LTS   6.5.0-1014-aws   containerd://1.7.11-k3s2.27

NAMESPACE     NAME                                          READY   STATUS        RESTARTS   AGE   IP                NODE                  NOMINATED NODE   READINESS GATES
kube-system   pod/local-path-provisioner-79ffd768b5-rm8v2   1/1     Running       0          19m   2001:cafe:42::4   i-0f477e039ffb149b9
kube-system   pod/coredns-77ccd57875-sgscj                  1/1     Running       0          19m   2001:cafe:42::2   i-0f477e039ffb149b9
kube-system   pod/helm-install-traefik-crd-zfdwk            0/1     Completed     0          19m   2001:cafe:42::3   i-0f477e039ffb149b9
kube-system   pod/metrics-server-c44988498-clzvf            1/1     Running       0          19m   2001:cafe:42::6   i-0f477e039ffb149b9
kube-system   pod/helm-install-traefik-95kgr                1/1     Running       0          3s    2001:cafe:42::c   i-0f477e039ffb149b9
kube-system   pod/traefik-768bdcdcdd-6pxhk                  1/1     Terminating   0          18m   2001:cafe:42::8   i-0f477e039ffb149b9
kube-system   pod/svclb-traefik-ac7dafdc-w4hmg              0/2     Terminating   0          18m                     i-0f477e039ffb149b9
```

After the reinstallation job completed (traefik is running again):

```
NAME                       STATUS   ROLES                  AGE   VERSION             INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
node/i-0f477e039ffb149b9   Ready    control-plane,master   21m   v1.27.12-rc1+k3s1   [redacted]                  Ubuntu 22.04.4 LTS   6.5.0-1014-aws   containerd://1.7.11-k3s2.27

NAMESPACE     NAME                                          READY   STATUS      RESTARTS   AGE    IP                NODE                  NOMINATED NODE   READINESS GATES
kube-system   pod/local-path-provisioner-79ffd768b5-rm8v2   1/1     Running     0          21m    2001:cafe:42::4   i-0f477e039ffb149b9
kube-system   pod/coredns-77ccd57875-sgscj                  1/1     Running     0          21m    2001:cafe:42::2   i-0f477e039ffb149b9
kube-system   pod/helm-install-traefik-crd-zfdwk            0/1     Completed   0          21m    2001:cafe:42::3   i-0f477e039ffb149b9
kube-system   pod/metrics-server-c44988498-clzvf            1/1     Running     0          21m    2001:cafe:42::6   i-0f477e039ffb149b9
kube-system   pod/svclb-traefik-7d49c1f2-ktz88              2/2     Running     0          2m3s   2001:cafe:42::d   i-0f477e039ffb149b9
kube-system   pod/helm-install-traefik-95kgr                0/1     Completed   0          2m6s   2001:cafe:42::c   i-0f477e039ffb149b9
kube-system   pod/traefik-54dfd465df-wpbbm                  1/1     Running     0          2m3s   2001:cafe:42::e   i-0f477e039ffb149b9
```