k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Network policies blocking probes #10030

Closed: kyrofa closed this issue 2 weeks ago

kyrofa commented 2 weeks ago

Environmental Info: K3s Version: v1.28.8+k3s1

Node(s) CPU architecture, OS, and Version: Linux s3 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux

Cluster Configuration: 3 servers, dual stack, ipv6 primary. Using default flannel.

Describe the bug: Network policy blocks probes.

Steps To Reproduce: Deploy these test manifests:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        livenessProbe: &probe
          httpGet:
            path: /
            port: 80
        readinessProbe:
          <<: *probe
        startupProbe:
          <<: *probe
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - port: 80
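
Assuming the manifests above are saved to a single file (the name netpol-test.yaml is just an example), they can be applied in one go:

$ kubectl apply -f netpol-test.yaml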

You should fairly quickly see the nginx pod's startup probe fail:

$ kubectl describe pods
<snip>
  Warning  Unhealthy  7m48s (x7 over 9m8s)    kubelet            Startup probe failed: Get "http://[fda5:1111:c:0:1::43f]:80/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    7m48s (x3 over 8m48s)   kubelet            Container nginx failed startup probe, will be restarted

The nginx log looks fine, and indeed no probes are reaching it:

$ kubectl logs -f deployment/nginx
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Sourcing /docker-entrypoint.d/15-local-resolvers.envsh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2024/04/26 21:31:24 [notice] 1#1: using the "epoll" event method
2024/04/26 21:31:24 [notice] 1#1: nginx/1.25.5
2024/04/26 21:31:24 [notice] 1#1: built by gcc 12.2.0 (Debian 12.2.0-14) 
2024/04/26 21:31:24 [notice] 1#1: OS: Linux 6.1.0-20-amd64
2024/04/26 21:31:24 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2024/04/26 21:31:24 [notice] 1#1: start worker processes
<snip>
2024/04/26 21:31:24 [notice] 1#1: start worker process 108
2024/04/26 21:31:24 [notice] 1#1: start worker process 109

Now delete that network policy:

$ kubectl delete networkpolicy default-deny
networkpolicy.networking.k8s.io "default-deny" deleted

You'll immediately start seeing the probes hit nginx:

$ kubectl logs -f deployment/nginx
<snip>
fda5:1111:c:0:1::1 - - [26/Apr/2024:21:37:57 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.28" "-"
fda5:1111:c:0:1::1 - - [26/Apr/2024:21:38:07 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.28" "-"
fda5:1111:c:0:1::1 - - [26/Apr/2024:21:38:07 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.28"

I believe the probes should be treated as internal to the pod's network namespace, and thus shouldn't be affected by network policies at all, but that doesn't appear to be the case here. Any idea what's going on?
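
In the meantime, assuming the probes really do originate from the node (the fda5:1111:c:0:1::1 source address in the access log above suggests they do), a policy along these lines should let them through while keeping the default deny; the name and CIDRs below are placeholders and would need to match this cluster's actual node/pod ranges:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-probes-from-node
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            # Placeholder: replace with this cluster's IPv6 node/pod range
            cidr: "fda5:1111:c::/64"
        - ipBlock:
            # Placeholder: IPv4 node or cluster CIDR, if IPv4 probes are also in play
            cidr: 10.42.0.0/16

Since NetworkPolicies are additive, this can sit alongside default-deny without modifying it; traffic allowed by either policy gets through.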

brandond commented 2 weeks ago

Possible duplicate of

rbrtbnfgl commented 2 weeks ago

I tried with k3s v1.28.9 and it's been fixed.

brandond commented 2 weeks ago

Closing as duplicate/symptom of #9925

kyrofa commented 2 weeks ago

Confirmed: I upgraded the cluster to v1.29.4+k3s1 this morning and the issue no longer happens.
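
For reference, a quick way to confirm the upgrade reached all three servers is to check the version each node reports:

$ kubectl get nodes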