rancher / rke2

https://docs.rke2.io/
Apache License 2.0

[Release-1.28] - RKE2 failing to start: fatal, Failed to apply network policy default-network-ingress-webhook-policy to namespace kube-system #5972

Closed brandond closed 4 months ago

brandond commented 5 months ago

Backport fix for RKE2 failing to start: fatal, Failed to apply network policy default-network-ingress-webhook-policy to namespace kube-system

fmoral2 commented 4 months ago

Validated on Version:

rke2 version v1.28.11-rc5+rke2r1 (4f54a7a4ff785853bfae557a3657af91870338da)

Environment Details

Infrastructure: Cloud EC2 instance

Node(s) CPU architecture, OS, and Version: AMD, Ubuntu

Cluster Configuration: 3 server nodes, 1 agent node

Steps to validate the fix

  1. Install RKE2
  2. Install the Rancher webhooks with Helm
  3. Join a new node running an upgraded version
  4. Validate that RKE2 is up and running
  5. Validate that no webhook errors appear in the logs
  6. Validate that the pods are healthy
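
Steps 4 to 6 lend themselves to scripting. The following is a minimal sketch, not part of the original validation: the function names `count_webhook_failures` and `all_pods_healthy` are hypothetical, and it assumes `kubectl` is configured for the cluster and that the service runs as the `rke2-server` systemd unit, as in the transcripts in this issue.

```shell
#!/bin/sh
# Hypothetical helper: count log lines on stdin that report a failed webhook call.
count_webhook_failures() {
    # grep -c prints the match count; it exits 1 when the count is 0,
    # so the exit status is masked to keep the helper usable under `set -e`.
    grep -c "failed to call webhook" || true
}

# Hypothetical helper: read `kubectl get pods -n <namespace>` output on stdin
# (header row included) and succeed only if every pod is Running or Completed.
all_pods_healthy() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { bad = 1 }
         END { exit (bad ? 1 : 0) }'
}

# Usage sketch (requires a live node and cluster):
#   sudo journalctl -u rke2-server --no-pager | count_webhook_failures
#   kubectl get pods -n cattle-system | all_pods_healthy && echo "pods OK"
```

The helpers read from stdin, so they can be tried against saved log or `kubectl` output before being pointed at a live cluster. Note that `all_pods_healthy` expects the STATUS value in the third column, which holds for a single-namespace listing but not for `kubectl get pods -A`.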

Reproduction Issue:

```
rke2 version v1.27.2+rke2r1 (300a06dabe679c779970112a9cb48b289c17536c)

helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.yourdomain.com

kubectl create namespace kyverno
helm repo add kyverno https://kyverno.github.io/kyverno/
helm install kyverno kyverno/kyverno --namespace kyverno

kubectl get validatingwebhookconfigurations
NAME                               WEBHOOKS   AGE
cert-manager-webhook               1          3m12s
rke2-ingress-nginx-admission       1          21m
rke2-snapshot-validation-webhook   1          21m
validating-webhook-configuration   12         86s

:~> kubectl get mutatingwebhookconfigurations
NAME                             WEBHOOKS   AGE
cert-manager-webhook             1          3m21s
mutating-webhook-configuration   9          95s

# On a new node joining the cluster upgrading version.
sudo curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_VERSION=v1.28.8+rke2r1 sh -

sudo journalctl -u rke2-server -f | grep "failed to call webhook"
Jun 21 12:01:31 rke2[2060]: time="2024-06-21T12:01:31Z" level=warning msg="Failed to create Kubernetes secret: Internal error occurred: failed calling webhook \"rancher.cattle.io.secrets\": failed to call webhook: Post \"https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/secrets?timeout=15s\": context deadline exceeded"
```

## **Validation Results:**
```
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.yourdomain.com

kubectl create namespace kyverno
helm repo add kyverno https://kyverno.github.io/kyverno/
helm install kyverno kyverno/kyverno --namespace kyverno

kubectl get validatingwebhookconfigurations
NAME                               WEBHOOKS   AGE
cert-manager-webhook               1          3m12s
rke2-ingress-nginx-admission       1          21m
rke2-snapshot-validation-webhook   1          21m
validating-webhook-configuration   12         86s

:~> kubectl get mutatingwebhookconfigurations
NAME                             WEBHOOKS   AGE
cert-manager-webhook             1          3m21s
mutating-webhook-configuration   9          95s

# On a new node joining the cluster upgrading version.
sudo curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_VERSION=v1.29.2+rke2r1 sh -

sudo journalctl -u rke2-server -f | grep "failed to call webhook"
<>

$ kubectl get pods -n cattle-system -l app=rancher-webhook
NAME                             READY   STATUS    RESTARTS   AGE
rancher-webhook-dbdbf746-5fmdq   1/1     Running   0          6m1s

$ kubectl get pods -n cattle-system
NAME                       READY   STATUS      RESTARTS   AGE
helm-operation-8lwhn       2/2     Running     0          74s
helm-operation-bmgxz       0/2     Completed   0          2m25s
helm-operation-d22bd       2/2     Running     0          9s
helm-operation-wdklt       2/2     Running     0          3m32s
rancher-5875cfdb5f-97kvb   1/1     Running     0          5m40s
rancher-5875cfdb5f-bvdhp   1/1     Running     0          5m40s
rancher-5875cfdb5f-zgm8v   1/1     Running     0          5m40s

~$ kubectl get mutatingwebhookconfigurations
NAME                                    WEBHOOKS   AGE
cert-manager-webhook                    1          6m45s
kyverno-policy-mutating-webhook-cfg     1          3m5s
kyverno-resource-mutating-webhook-cfg   0          3m5s
kyverno-verify-mutating-webhook-cfg     1          3m4s
mutating-webhook-configuration          9          5m12s

:~$ kubectl get validatingwebhookconfigurations
NAME                                            WEBHOOKS   AGE
cert-manager-webhook                            1          6m51s
kyverno-cleanup-validating-webhook-cfg          1          5m59s
kyverno-exception-validating-webhook-cfg        1          3m11s
kyverno-global-context-validating-webhook-cfg   1          3m11s
kyverno-policy-validating-webhook-cfg           1          3m11s
kyverno-resource-validating-webhook-cfg         0          3m11s
kyverno-ttl-validating-webhook-cfg              1          5m59s
rke2-ingress-nginx-admission                    1          38m
rke2-snapshot-validation-webhook                1          38m
validating-webhook-configuration                12         5m18s
```
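
When comparing a reproduction run against a validation run, it can help to reduce each webhook listing to a single number. The following is a small sketch under the assumption that the input is the default table output of `kubectl get validatingwebhookconfigurations` or `kubectl get mutatingwebhookconfigurations` (header row first, WEBHOOKS in the second column); the function name `sum_webhooks` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sum the WEBHOOKS column of a webhook configuration
# listing read from stdin, skipping the header row.
sum_webhooks() {
    awk 'NR > 1 { total += $2 } END { print total + 0 }'
}

# Usage sketch (requires a live cluster):
#   kubectl get validatingwebhookconfigurations | sum_webhooks
#   kubectl get mutatingwebhookconfigurations | sum_webhooks
```

The `total + 0` in the `END` block makes the helper print `0` rather than an empty line when the listing has no rows.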