k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

klipper-helm: Reinstalling job of a failed chart fails. (PR linked) #9499

Closed frederictobiasc closed 6 months ago

frederictobiasc commented 7 months ago

Environmental Info: K3s Version: v1.26.6+k3s1 (3b1919b0) go version go1.20.8

Node(s) CPU architecture, OS, and Version: Linux 6.1.59 #1-NixOS SMP PREEMPT_DYNAMIC Thu Oct 19 21:08:58 UTC 2023 x86_64 GNU/Linux

Cluster Configuration: single-node test

Describe the bug: Reinstalling a failed chart triggers a bug in klipper-helm.

Steps To Reproduce: On an IPv6 cluster, install a chart that fails to apply correctly, so that klipper-helm's retry (reinstall) is triggered.

Expected behavior: klipper-helm reinstalls the chart successfully.

Actual behavior: klipper-helm fails because of an unescaped IPv6 address literal.

Additional context / logs: See the (already approved) PR: https://github.com/k3s-io/klipper-helm/pull/71

This contribution adds an escape to the reinstall logic that was missed by the original PR introducing IPv6 support (https://github.com/k3s-io/klipper-helm/pull/43). With it, the script also works as expected in IPv6 environments when reinstalling a chart.
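The issue does not quote the affected lines of the klipper-helm script, so the following is only a minimal sketch of the kind of escaping involved. It assumes the address is combined with a port into a `host:port` string (for example an API server URL), where RFC 3986 requires an IPv6 literal to be wrapped in square brackets. `format_host_port` is a hypothetical helper, not the code from PR #71.

```sh
# Hypothetical helper (not taken from klipper-helm): join a host and a port,
# bracketing the host when it is an unbracketed IPv6 literal. Without the
# brackets, "2001:cafe:43::1:443" is ambiguous, because the port cannot be
# told apart from the last group of the address.
format_host_port() {
  host=$1
  port=$2
  case $host in
    \[*\]) ;;                # already bracketed: use as-is
    *:*) host="[$host]" ;;   # contains a colon: treat as IPv6, add brackets
  esac                       # anything else (IPv4, hostname) is left alone
  printf '%s:%s\n' "$host" "$port"
}

# Example: Kubernetes normally sets these two variables in every pod.
#   format_host_port "$KUBERNETES_SERVICE_HOST" "$KUBERNETES_SERVICE_PORT"
```

For example, `format_host_port 2001:cafe:43::1 443` prints `[2001:cafe:43::1]:443`, while an IPv4 host such as `10.43.0.1` is passed through unchanged as `10.43.0.1:443`.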

brandond commented 6 months ago

To test this fix:

  1. install k3s in ipv6-only mode, for example: k3s server --cluster-cidr=2001:cafe:42::/56 --service-cidr=2001:cafe:43::/112 --node-ip=fd7c:53a5:aef5::242:ac11:7
  2. mark the traefik chart release as failed:

    brandond@dev01:~$ kubectl run helm-test --rm --stdin --tty --command --namespace kube-system --overrides='{"spec":{"serviceAccount":"helm-traefik"}}' --image=docker.io/rancher/klipper-helm:v0.8.2-build20230815 sh -
    If you don't see a command prompt, try pressing enter.
    ~ $ helm_v3 set-status traefik failed
    2024/03/25 19:19:58 release traefik status updated
    ~ $ helm_v3 ls --all
    NAME          NAMESPACE     REVISION   UPDATED                                   STATUS     CHART                         APP VERSION
    traefik       kube-system   1          2024-03-25 19:19:58.782157792 +0000 UTC   failed     traefik-25.0.2+up25.0.0       v2.10.5
    traefik-crd   kube-system   1          2024-03-25 18:50:44.237543558 +0000 UTC   deployed   traefik-crd-25.0.2+up25.0.0   v2.10.5
    ~ $ exit
    Session ended, resume using 'kubectl attach helm-test -c helm-test -i -t' command when the pod is running
    pod "helm-test" deleted

  3. Apply a HelmChartConfig resource to trigger a reinstallation of the failed chart:
```yaml
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    ports:
      web:
        forwardedHeaders:
          trustedIPs:
            - 10.0.0.0/8
```
  4. Note that the reinstall succeeds and the chart is deployed with the requested values.
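Steps 2 and 4 can also be checked without reading the table by eye. A minimal sketch, assuming the default `helm_v3 ls --all` table output shown in step 2 and that `awk` is available where it runs: the UPDATED column spans four whitespace-separated tokens (date, time, offset, zone), so STATUS is the 8th field. `release_status` is a hypothetical helper name.

```sh
# Hypothetical helper: read `helm_v3 ls --all` output on stdin and print the
# STATUS of the release named in $1 (prints nothing if it is not listed).
# Field 8 is STATUS because UPDATED, e.g.
# "2024-03-25 19:19:58.782157792 +0000 UTC", occupies fields 4-7.
release_status() {
  awk -v release="$1" 'NR > 1 && $1 == release { print $8 }'
}

# Usage, inside the helm-test pod:
#   helm_v3 ls --all | release_status traefik
```

Following the steps above, this should print `failed` after step 2 and `deployed` once the reinstall in step 4 has succeeded. Parsing the table by position is fragile; `helm_v3 ls --all -o json` gives machine-readable output if a JSON tool is available.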
endawkins commented 6 months ago

Validated on branch master with commit 8aecc26 / version 1.29

Environment Details

Infrastructure

Node(s) CPU architecture, OS, and Version:

Linux i-0f477e039ffb149b9 6.5.0-1014-aws #14~22.04.1-Ubuntu SMP Thu Feb 15 15:27:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Cluster Configuration:

1 IPv6 Only Server

Config.yaml:

write-kubeconfig-mode: 644
token: test
node-ip: [redacted]
cluster-cidr: 2001:cafe:42:0::/56
service-cidr: 2001:cafe:42:1::/112
disable-network-policy: true
flannel-ipv6-masq: true
node_external_ip: [redacted]

Additional files

helmchartconfig.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    image:
      name: traefik
      tag: 2.9876.10
    ports:
      web:
        forwardedHeaders:
          trustedIPs:
            - 10.0.0.0/8
helmchartconfig1.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    ports:
      web:
        forwardedHeaders:
          trustedIPs:
            - 10.0.0.0/8

Testing Steps

  1. Launch a dual-stack instance in AWS
  2. Launch an IPv6-only instance in AWS
  3. Copy the .pem file from your local machine to the dual-stack instance
  4. From the dual-stack instance, ssh to the IPv6-only instance:
    $ ssh -i "<.pem_file>" user@<IPv6_ADDRESS>
  5. Configure the IPv6-only instance:
    • configure /etc/netplan/50-cloud-init.yaml (you may need to update your nameservers to use NAT64)
      network:
          ethernets:
              ens5:
                  dhcp4: true
                  dhcp6: true
                  match:
                      macaddress: [redacted]
                  set-name: ens5
                  nameservers:
                      addresses: ["[redacted]", "[redacted]", "[redacted]"]
          version: 2
  6. sudo netplan apply
  7. Update the /etc/hosts file:

    127.0.0.1 localhost

    # The following lines are desirable for IPv6 capable hosts
    ::1 ip6-localhost ip6-loopback i-

  8. sudo systemctl stop systemd-resolved.service
  9. Update /etc/resolv.conf:

    nameserver
    options edns0 trust-ad
    search [redacted]

  10. Copy config.yaml into place:

    $ sudo mkdir -p /etc/rancher/k3s/ && sudo cp config.yaml /etc/rancher/k3s/ && cat /etc/rancher/k3s/config.yaml

  11. Install k3s
  12. Apply the bad HelmChartConfig:

    $ kubectl apply -f

  13. Mark the traefik chart as failed:

    $ kubectl run helm-test --rm --stdin --tty --command --namespace kube-system --overrides='{"spec":{"serviceAccount":"helm-traefik"}}' --image=docker.io/rancher/klipper-helm:v0.8.2-build20230815 sh -
    If you don't see a command prompt, try pressing enter.
    ~ $ helm_v3 set-status traefik failed
    ~ $ helm_v3 ls --all
    ~ $ exit
    Session ended, resume using 'kubectl attach helm-test -c helm-test -i -t' command when the pod is running
    pod "helm-test" deleted

  14. Apply another HelmChartConfig to trigger the reinstallation:

    $ kubectl apply -f helmchartconfig_new.yaml

  15. Verify that the reinstallation was successful
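Step 15 can be scripted as well. A minimal sketch, assuming `kubectl get pods -A --no-headers` output, whose first four columns are NAMESPACE, NAME, READY and STATUS (an optional `pod/` prefix on the name is tolerated). The hypothetical helper `traefik_install_completed` succeeds only if at least one `helm-install-traefik-*` job pod is listed and all of them report `Completed`; the separate `helm-install-traefik-crd-*` pods are ignored.

```sh
# Hypothetical helper: read `kubectl get pods -A --no-headers` output on stdin.
# Succeeds (exit 0) only if at least one helm-install-traefik-* pod is listed
# and every such pod has STATUS "Completed"; helm-install-traefik-crd-* pods
# belong to the CRD chart and are skipped.
traefik_install_completed() {
  awk '
    # Columns: NAMESPACE NAME READY STATUS ... ; an optional pod/ prefix is allowed.
    $2 ~ /^(pod\/)?helm-install-traefik-/ && $2 !~ /helm-install-traefik-crd-/ {
      seen = 1
      if ($4 != "Completed") bad = 1
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}

# Usage:
#   kubectl get pods -A --no-headers | traefik_install_completed && echo "reinstall completed"
```

Because older job pods that are still listed are checked too, a leftover pod in `Error` state makes the check fail; a stricter variant could look only at the newest pod.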

**Replication Results:**
- k3s version used for replication: N/A
- Observations: N/A

This issue is hard to reproduce, so a reproduction could not be captured.

**Validation Results:**
- k3s version used for validation:

    $ k3s -v
    k3s version v1.29.3-rc1+k3s1 (8aecc26b)
    go version go1.21.8

- Observations:

    ~ $ helm_v3 set-status traefik failed
    2024/03/25 21:37:02 release traefik status updated
    ~ $ helm_v3 ls --all
    NAME          NAMESPACE     REVISION   UPDATED                                   STATUS     CHART                         APP VERSION
    traefik       kube-system   2          2024-03-25 21:37:02.01189159 +0000 UTC    failed     traefik-25.0.2+up25.0.0       v2.10.5
    traefik-crd   kube-system   1          2024-03-25 21:17:48.926277577 +0000 UTC   deployed   traefik-crd-25.0.2+up25.0.0   v2.10.5
    Session ended, resume using 'kubectl attach helm-test -c helm-test -i -t' command when the pod is running
    pod "helm-test" deleted


**Additional context / logs:**

    $ kubectl get nodes,pods -A -o wide
    NAME                       STATUS   ROLES                  AGE    VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
    node/i-0f477e039ffb149b9   Ready    control-plane,master   102s   v1.29.3-rc1+k3s1   [redacted]    <none>        Ubuntu 22.04.4 LTS   6.5.0-1014-aws   containerd://1.7.11-k3s2

    NAMESPACE     NAME                                          READY   STATUS              RESTARTS   AGE    IP                NODE                  NOMINATED NODE   READINESS GATES
    kube-system   pod/coredns-6799fbcd5-7pj8z                   1/1     Running             0          86s    2001:cafe:42::3   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/local-path-provisioner-6c86858495-6z2rs   1/1     Running             0          86s    2001:cafe:42::2   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/helm-install-traefik-crd-p2rjg            0/1     Completed           0          87s    2001:cafe:42::5   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/helm-install-traefik-95t5b                0/1     Completed           1          87s    2001:cafe:42::6   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/svclb-traefik-c52582ad-22qnx              2/2     Running             0          68s    2001:cafe:42::7   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/traefik-f4564c4f4-ngx7b                   1/1     Running             0          68s    2001:cafe:42::8   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/metrics-server-54fd9b65b-qxjfp            1/1     Running             0          86s    2001:cafe:42::4   i-0f477e039ffb149b9   <none>           <none>

    $ kubectl get nodes,pods -A -o wide
    NAME                       STATUS   ROLES                  AGE    VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
    node/i-0f477e039ffb149b9   Ready    control-plane,master   2m10s  v1.29.3-rc1+k3s1   [redacted]    <none>        Ubuntu 22.04.4 LTS   6.5.0-1014-aws   containerd://1.7.11-k3s2

    NAMESPACE     NAME                                          READY   STATUS              RESTARTS   AGE    IP                NODE                  NOMINATED NODE   READINESS GATES
    kube-system   pod/coredns-6799fbcd5-7pj8z                   1/1     Running             0          114s   2001:cafe:42::3   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/local-path-provisioner-6c86858495-6z2rs   1/1     Running             0          114s   2001:cafe:42::2   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/helm-install-traefik-crd-p2rjg            0/1     Completed           0          115s   2001:cafe:42::5   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/svclb-traefik-c52582ad-22qnx              2/2     Running             0          96s    2001:cafe:42::7   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/traefik-f4564c4f4-ngx7b                   1/1     Running             0          96s    2001:cafe:42::8   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/metrics-server-54fd9b65b-qxjfp            1/1     Running             0          114s   2001:cafe:42::4   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/traefik-575c9785d-wvs95                   0/1     ContainerCreating   0          1s     <none>            i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/helm-install-traefik-kbdhj                0/1     Completed           0          4s     2001:cafe:42::9   i-0f477e039ffb149b9   <none>           <none>

    $ kubectl get nodes,pods -A -o wide
    NAME                       STATUS   ROLES                  AGE    VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
    node/i-0f477e039ffb149b9   Ready    control-plane,master   2m11s  v1.29.3-rc1+k3s1   [redacted]    <none>        Ubuntu 22.04.4 LTS   6.5.0-1014-aws   containerd://1.7.11-k3s2

    NAMESPACE     NAME                                          READY   STATUS              RESTARTS   AGE    IP                NODE                  NOMINATED NODE   READINESS GATES
    kube-system   pod/coredns-6799fbcd5-7pj8z                   1/1     Running             0          115s   2001:cafe:42::3   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/local-path-provisioner-6c86858495-6z2rs   1/1     Running             0          115s   2001:cafe:42::2   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/helm-install-traefik-crd-p2rjg            0/1     Completed           0          116s   2001:cafe:42::5   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/svclb-traefik-c52582ad-22qnx              2/2     Running             0          97s    2001:cafe:42::7   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/traefik-f4564c4f4-ngx7b                   1/1     Running             0          97s    2001:cafe:42::8   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/metrics-server-54fd9b65b-qxjfp            1/1     Running             0          115s   2001:cafe:42::4   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/traefik-575c9785d-wvs95                   0/1     ErrImagePull        0          2s     2001:cafe:42::a   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/helm-install-traefik-kbdhj                0/1     Completed           0          5s     <none>            i-0f477e039ffb149b9   <none>           <none>

    NAME                       STATUS   ROLES                  AGE    VERSION            INTERNAL-IP                              EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
    node/i-0f477e039ffb149b9   Ready    control-plane,master   21m    v1.29.3-rc1+k3s1   2600:1f1c:ab4:ee32:f5fd:5c17:66c2:a1f5   <none>        Ubuntu 22.04.4 LTS   6.5.0-1014-aws   containerd://1.7.11-k3s2

    NAMESPACE     NAME                                          READY   STATUS              RESTARTS   AGE    IP                NODE                  NOMINATED NODE   READINESS GATES
    kube-system   pod/local-path-provisioner-6c86858495-kk9l5   1/1     Running             0          21m    2001:cafe:42::4   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/coredns-6799fbcd5-pqjdl                   1/1     Running             0          21m    2001:cafe:42::3   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/helm-install-traefik-crd-x9h7p            0/1     Completed           0          21m    2001:cafe:42::2   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/metrics-server-54fd9b65b-jjbkh            1/1     Running             0          21m    2001:cafe:42::5   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/helm-install-traefik-ql9fl                1/1     Running             0          5s     2001:cafe:42::c   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/traefik-5ccfc5bbc9-jqcws                  0/1     ContainerCreating   0          2s     <none>            i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/svclb-traefik-27450f51-94qz6              0/2     ContainerCreating   0          1s     <none>            i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/traefik-f4564c4f4-hd2gd                   0/1     Terminating         0          20m    <none>            i-0f477e039ffb149b9   <none>           <none>

    NAME                       STATUS   ROLES                  AGE    VERSION            INTERNAL-IP                              EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
    node/i-0f477e039ffb149b9   Ready    control-plane,master   21m    v1.29.3-rc1+k3s1   2600:1f1c:ab4:ee32:f5fd:5c17:66c2:a1f5   <none>        Ubuntu 22.04.4 LTS   6.5.0-1014-aws   containerd://1.7.11-k3s2

    NAMESPACE     NAME                                          READY   STATUS              RESTARTS   AGE    IP                NODE                  NOMINATED NODE   READINESS GATES
    kube-system   pod/local-path-provisioner-6c86858495-kk9l5   1/1     Running             0          21m    2001:cafe:42::4   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/coredns-6799fbcd5-pqjdl                   1/1     Running             0          21m    2001:cafe:42::3   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/helm-install-traefik-crd-x9h7p            0/1     Completed           0          21m    2001:cafe:42::2   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/metrics-server-54fd9b65b-jjbkh            1/1     Running             0          21m    2001:cafe:42::5   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/helm-install-traefik-ql9fl                0/1     Completed           0          6s     2001:cafe:42::c   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/svclb-traefik-27450f51-94qz6              2/2     Running             0          2s     2001:cafe:42::d   i-0f477e039ffb149b9   <none>           <none>
    kube-system   pod/traefik-5ccfc5bbc9-jqcws                  0/1     Running             0          3s     2001:cafe:42::e   i-0f477e039ffb149b9   <none>           <none>
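The wide listings also show the cluster running single-stack IPv6: every pod that has an address got one from 2001:cafe:42::/56. A minimal sketch of that check, assuming `kubectl get pods -A -o wide --no-headers` output (the header row must be suppressed), where IP is the 7th column and kubectl prints `<none>` for pods without an address yet. `all_pod_ips_ipv6` is a hypothetical helper that simply treats any address containing a colon as IPv6.

```sh
# Hypothetical helper: read `kubectl get pods -A -o wide --no-headers` output
# on stdin and fail if any assigned pod IP (column 7) is not IPv6. A colon in
# the address is used as the IPv6 test; "<none>" (no IP yet) is skipped.
all_pod_ips_ipv6() {
  awk '
    $7 != "<none>" && $7 !~ /:/ {
      bad = 1
      print "non-IPv6 pod IP: " $1 "/" $2 " " $7
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage:
#   kubectl get pods -A -o wide --no-headers | all_pod_ips_ipv6 && echo "all pod IPs are IPv6"
```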