zalando / skipper

An HTTP router and reverse proxy for service composition, including use cases like Kubernetes Ingress
https://opensource.zalando.com/skipper/

Bug: Skipper does not filter EndpointSlices without ports, causing backend connections to TCP port 0 #3151

Closed rtomadpg closed 3 months ago

rtomadpg commented 3 months ago

Describe the bug

Skipper v0.21.146, using EndpointSlices as the backend source (with -enable-kubernetes-endpointslices=true), will attempt to connect to backends on TCP port 0 when an Ingress Service has an EndpointSlice with ports: null.

Skipper will complain like this:

level=error msg="Failed to retry backend request: dialing failed true: failed to do backend roundtrip to 100.64.65.173:0: dial tcp 100.64.65.173:0: connect: connection refused"

I believe https://github.com/zalando/skipper/blob/b7825a336ab70409963fea03b74136632d49469e/dataclients/kubernetes/clusterclient.go#L551 should be conditional, because an EndpointSlice with ports: null is a rare, but valid, state.
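
For illustration, a minimal sketch of the guard I have in mind, using the upstream discovery/v1 types; the helper name and signature are assumptions for this sketch, not Skipper's actual code:

// Hypothetical sketch, not Skipper's actual code: skip EndpointSlices whose
// ports field is null before collecting backend addresses, so no endpoint is
// ever registered with port 0.
package sketch

import (
	discoveryv1 "k8s.io/api/discovery/v1"
)

// collectAddresses is an illustrative helper; the name and signature are
// assumptions made for this sketch.
func collectAddresses(slices []*discoveryv1.EndpointSlice) []string {
	var addrs []string
	for _, es := range slices {
		// A slice with ports: null carries no usable target port; including
		// its endpoints would produce backends like "100.64.65.173:0".
		if len(es.Ports) == 0 {
			continue
		}
		for _, ep := range es.Endpoints {
			addrs = append(addrs, ep.Addresses...)
		}
	}
	return addrs
}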

When Skipper is using Endpoints instead of EndpointSlices, this does not happen. I'm not sure why, but based on some tests, Endpoints and EndpointSlices are clearly updated differently when pods change phase.

I found this issue when I turned on the EndpointSlice feature and an application became 100% unavailable. The root cause was a Service label selector matching two Deployments, combined with a named targetPort that was only used in one of the Deployments. Obviously the Service label selector should be fixed to match only a single Deployment. Easy. But IMHO Skipper should check EndpointSlices for set ports.

To Reproduce

The YAML manifests needed to reproduce:

apiVersion: v1
kind: Namespace
metadata:
  name: ports-null-repro
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: foo
    component: app
  name: foo-app
  namespace: ports-null-repro
spec:
  replicas: 1
  selector:
    matchLabels:
      app: foo
      component: app
  strategy: {}
  template:
    metadata:
      labels:
        app: foo
        component: app
    spec:
      containers:
      - command:
        - sleep
        - "99999"
        image: busybox
        name: busybox
        ports:
        - containerPort: 8000
          name: http-app
        resources: {}
      terminationGracePeriodSeconds: 0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: foo
    component: worker
  name: foo-worker
  namespace: ports-null-repro
spec:
  replicas: 1
  selector:
    matchLabels:
      app: foo
      component: worker
  strategy: {}
  template:
    metadata:
      labels:
        app: foo
        component: worker
    spec:
      containers:
      - command:
        - sleep
        - "99999"
        image: busybox
        name: busybox
        ports:
        - containerPort: 8000
          name: http-worker
        resources: {}
      terminationGracePeriodSeconds: 0
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: foo
    component: app
  name: foo-app
  namespace: ports-null-repro
spec:
  ports:
  - name: "80"
    port: 80
    protocol: TCP
    targetPort: http-app                         # only used in the foo-app Deployment
  selector:
    app: foo                       # matching both foo-app and foo-worker Deployments
  type: ClusterIP

Applying these resources will result in a state like:

$ k -n ports-null-repro get pods,svc,endpoints,endpointslice
NAME                              READY   STATUS    RESTARTS   AGE
pod/foo-app-74769df7fb-kkkzz      1/1     Running   0          12m
pod/foo-worker-6656b88c64-mmd4r   1/1     Running   0          12m

NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/foo-app   ClusterIP   172.20.104.92   <none>        80/TCP    12m

NAME                ENDPOINTS             AGE
endpoints/foo-app   100.64.157.194:8000   12m

NAME                                           ADDRESSTYPE   PORTS     ENDPOINTS        AGE
endpointslice.discovery.k8s.io/foo-app-2lf8r   IPv4          <unset>   100.64.171.115   12m
endpointslice.discovery.k8s.io/foo-app-cwr5h   IPv4          8000      100.64.157.194   12m

See that EndpointSlice with <unset> ports? Inspecting it, you'll see ports: null:

$ k -n ports-null-repro get endpointslice.discovery.k8s.io/foo-app-2lf8r -oyaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 100.64.171.115
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: ip-100-64-177-153.eu-west-1.compute.internal
  targetRef:
    kind: Pod
    name: foo-worker-6656b88c64-mmd4r
    namespace: ports-null-repro
    uid: b01bc27c-9452-443b-b9fd-15f7465436c5
  zone: eu-west-1c
kind: EndpointSlice
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2024-07-12T13:28:58Z"
  creationTimestamp: "2024-07-12T13:28:52Z"
  generateName: foo-app-
  generation: 2
  labels:
    app: foo
    component: app
    endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
    kubernetes.io/service-name: foo-app
  name: foo-app-2lf8r
  namespace: ports-null-repro
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Service
    name: foo-app
    uid: 85b53edc-1507-4f04-a8b4-c263eba7cadd
  resourceVersion: "407757934"
  uid: 91df20d4-315b-47d4-a920-865bdcc42ed4
ports: null                                                                         # <------

Expected behavior

Skipper skips all endpoints from an EndpointSlice with ports: null.

Observed behavior

It does not skip endpoints from an EndpointSlice with ports: null and will attempt to connect to the backend on TCP port 0.

szuecs commented 3 months ago

Thanks for the bug report. If we follow the expected behavior, we would likely drop the EndpointSlice with ports: null and shortcut the route to status code 502 with body "no endpoints", because we cannot just assume that the other EndpointSlice matches the port condition by name (if there is a name, we have to use a by-name lookup).
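
A minimal sketch of that by-name lookup against the discovery/v1 types (the helper name is an assumption, not our actual code):

// Hypothetical sketch: when the Service's targetPort is a name, only an
// EndpointSlice port carrying that name can be used; a slice with
// ports: null can never match.
package sketch

import (
	discoveryv1 "k8s.io/api/discovery/v1"
)

// portByName returns the numeric port for the given port name, or false if
// the slice has no matching port (including the ports: null case).
func portByName(es *discoveryv1.EndpointSlice, name string) (int32, bool) {
	for _, p := range es.Ports {
		if p.Name != nil && *p.Name == name && p.Port != nil {
			return *p.Port, true
		}
	}
	return 0, false
}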

Filtering ports: null is a good idea anyway, but it may not have the result you expect.

Can you please also show your Ingress configuration? The service: object in particular would be important to understand your configuration.

rtomadpg commented 3 months ago

The (redacted) ingress looks like:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
spec:
  ingressClassName: skipper-private
  rules:
  - host: app.example.com
    http:
      paths:
      - backend:
          service:
            name: app
            port:
              name: http
        path: /
        pathType: ImplementationSpecific

That Ingress spec.rules[0].http.paths[0].backend.service.port name matches the Service spec.ports[0].name.

Is this what you were interested in? If not, let me know what information you'd like to review.

szuecs commented 3 months ago

yes exactly!

szuecs commented 3 months ago

I created a reproducer to check what kube-proxy's result would be.

Setup

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: foo
    component: app
  name: foo-app
  namespace: skipper-3151
spec:
  replicas: 1
  selector:
    matchLabels:
      app: foo
      component: app
  strategy: {}
  template:
    metadata:
      labels:
        app: foo
        component: app
    spec:
      containers:
      - command:
        - sleep
        - "99999"
        image: busybox
        name: busybox
        ports:
        - containerPort: 8000
          name: http-app
        resources:
          limits:
            memory: 100Mi
      terminationGracePeriodSeconds: 0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: foo
    component: worker
  name: foo-worker
  namespace: skipper-3151
spec:
  replicas: 1
  selector:
    matchLabels:
      app: foo
      component: worker
  strategy: {}
  template:
    metadata:
      labels:
        app: foo
        component: worker
    spec:
      containers:
      - command:
        - sleep
        - "99999"
        image: busybox
        name: busybox
        ports:
        - containerPort: 8000
          name: http-worker
        resources:
          limits:
            memory: 100Mi
      terminationGracePeriodSeconds: 0
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: foo
    component: app
  name: foo-app
  namespace: skipper-3151
spec:
  ports:
  - name: "http"  # changed from "80", because of https://github.com/zalando/skipper/issues/3151#issuecomment-2227940798 and because "80" would violate Kubernetes ingress jsonSchema and we would get error: `The Ingress "foo-named" is invalid: spec.rules[0].http.paths[0].backend.service.port.name: Invalid value: "80": must contain at least one letter (a-z)`
    port: 80
    protocol: TCP
    targetPort: http-app                         # only used in the foo-app Deployment
  selector:
    app: foo                       # matching both foo-app and foo-worker Deployments
  type: ClusterIP

Confirmation that we see ports: unset:

% kubectl -n skipper-3151 get endpointslices
NAME            ADDRESSTYPE   PORTS     ENDPOINTS      AGE
foo-app-82p9m   IPv4          <unset>   10.2.222.205   34m
foo-app-f826z   IPv4          8000      10.2.222.204   32m

What kube-proxy figures out:

root@ip-10-149-68-50:~# iptables-save | grep -i foo-app
-A KUBE-SEP-PSIMTRPQ2UROLPWX -s 10.2.222.204/32 -m comment --comment "skipper-3151/foo-app:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-PSIMTRPQ2UROLPWX -p tcp -m comment --comment "skipper-3151/foo-app:http" -m tcp -j DNAT --to-destination 10.2.222.204:8000
-A KUBE-SERVICES -d 10.5.217.248/32 -p tcp -m comment --comment "skipper-3151/foo-app:http cluster IP" -j KUBE-SVC-FVRQ5DEUCONLHUWF
-A KUBE-SVC-FVRQ5DEUCONLHUWF -m comment --comment "skipper-3151/foo-app:http -> 10.2.222.204:8000" -j KUBE-SEP-PSIMTRPQ2UROLPWX

# grouped by svc/endpoints rules
# svc
root@ip-10-149-68-50:~# iptables-save | grep KUBE-SVC-FVRQ5DEUCONLHUWF
:KUBE-SVC-FVRQ5DEUCONLHUWF - [0:0]
-A KUBE-SERVICES -d 10.5.217.248/32 -p tcp -m comment --comment "skipper-3151/foo-app:http cluster IP" -j KUBE-SVC-FVRQ5DEUCONLHUWF
-A KUBE-SVC-FVRQ5DEUCONLHUWF -m comment --comment "skipper-3151/foo-app:http -> 10.2.222.204:8000" -j KUBE-SEP-PSIMTRPQ2UROLPWX
# endpoints
root@ip-10-149-68-50:~# iptables-save | grep KUBE-SEP-PSIMTRPQ2UROLPWX
:KUBE-SEP-PSIMTRPQ2UROLPWX - [0:0]
-A KUBE-SEP-PSIMTRPQ2UROLPWX -s 10.2.222.204/32 -m comment --comment "skipper-3151/foo-app:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-PSIMTRPQ2UROLPWX -p tcp -m comment --comment "skipper-3151/foo-app:http" -m tcp -j DNAT --to-destination 10.2.222.204:8000
-A KUBE-SVC-FVRQ5DEUCONLHUWF -m comment --comment "skipper-3151/foo-app:http -> 10.2.222.204:8000" -j KUBE-SEP-PSIMTRPQ2UROLPWX

What do we get back from the Service and from the endpoint directly:

# svc
root@ip-10-149-88-110:~# curl http://10.5.217.248
curl: (7) Failed to connect to 10.5.217.248 port 80 after 1 ms: Connection refused
# endpoint
root@ip-10-149-68-50:~# curl http://10.2.222.204:8000
curl: (7) Failed to connect to 10.2.222.204 port 8000 after 1 ms: Connection refused

So let's check skipper by creating 3 different ingress configurations targeting this service:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  labels:
    app: foo
  name: foo-named
  namespace: skipper-3151
spec:
  rules:
  - host: foo-named.example
    http:
      paths:
      - backend:
          service:
            name: foo-app
            port:
              name: "http"
        path: /
        pathType: ImplementationSpecific
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  labels:
    app: foo
  name: foo-targeted-name
  namespace: skipper-3151
spec:
  rules:
  - host: foo-targeted-name.example
    http:
      paths:
      - backend:
          service:
            name: foo-app
            port:
              name: "http-app"
        path: /
        pathType: ImplementationSpecific
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  labels:
    app: foo
  name: foo-numbered
  namespace: skipper-3151
spec:
  rules:
  - host: foo-numbered.example
    http:
      paths:
      - backend:
          service:
            name: foo-app
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific

Running ingresses are:

% kubectl -n skipper-3151 get ing
NAME                CLASS    HOSTS                       ADDRESS                                 PORTS   AGE
foo-named           <none>   foo-named.example           kube-ingr-lb-8zjiiue....amazonaws.com   80      2m38s
foo-numbered        <none>   foo-numbered.example        kube-ingr-lb-8zjiiue....amazonaws.com   80      13m
foo-targeted-name   <none>   foo-targeted-name.example   kube-ingr-lb-8zjiiue....amazonaws.com    80      13m

Routing table output:

root@ip-10-149-68-50:~# curl -s http://localhost:9911/routes | grep kube_skipper_3151__ -A 8
kube_skipper_3151__foo_named__foo_named_example_____foo_app: Host(/^(foo-named[.]example[.]?(:[0-9]+)?)$/) && PathSubtree("/")
  -> <powerOfRandomNChoices, "http://10.2.222.204:8000", "http://10.2.222.205:8000">;

kube_skipper_3151__foo_named__foo_named_example_____foo_app_https_redirect: Host(/^(foo-named[.]example[.]?(:[0-9]+)?)$/) && Header("X-Forwarded-Proto", "http") && Weight(1000) && PathSubtree("/")
  -> redirectTo(308, "https:")
  -> <shunt>;

kube_skipper_3151__foo_numbered__foo_numbered_example_____foo_app: Host(/^(foo-numbered[.]example[.]?(:[0-9]+)?)$/) && PathSubtree("/")
  -> <powerOfRandomNChoices, "http://10.2.222.204:8000", "http://10.2.222.205:8000">;

kube_skipper_3151__foo_numbered__foo_numbered_example_____foo_app_https_redirect: Host(/^(foo-numbered[.]example[.]?(:[0-9]+)?)$/) && Header("X-Forwarded-Proto", "http") && Weight(1000) && PathSubtree("/")
  -> redirectTo(308, "https:")
  -> <shunt>;

kube_skipper_3151__foo_targeted_name__foo_targeted_name_example_____foo_app: Host(/^(foo-targeted-name[.]example[.]?(:[0-9]+)?)$/) && PathSubtree("/")
  -> status(502)
  -> inlineContent("no endpoints")
  -> <shunt>;

kube_skipper_3151__foo_targeted_name__foo_targeted_name_example_____foo_app_https_redirect: Host(/^(foo-targeted-name[.]example[.]?(:[0-9]+)?)$/) && Header("X-Forwarded-Proto", "http") && Weight(1000) && PathSubtree("/")
  -> redirectTo(308, "https:")
  -> <shunt>;

Testing the response from the endpoints seen in skipper's routing table:

root@ip-10-149-88-110:~# curl http://10.2.222.204:8000
curl: (7) Failed to connect to 10.2.222.204 port 8000 after 1 ms: Connection refused
root@ip-10-149-88-110:~# curl http://10.2.222.205:8000
curl: (7) Failed to connect to 10.2.222.205 port 8000 after 1 ms: Connection refused

What do we get from skipper in the 3 different ingress configurations:

% curl https://foo-named.example
Bad Gateway

% curl https://foo-targeted-name.example
no endpoints

% curl https://foo-numbered.example
Bad Gateway

szuecs commented 3 months ago

In general I think it's fair to assume we should drop EndpointSlices with ports unset, and I will create a PR later today.
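
For illustration, a rough sketch of the shape such a check could take (the helper and test names are assumptions, not the actual PR):

// Hypothetical sketch of a regression check: slices whose ports field is
// null are removed before endpoints are resolved.
package sketch

import (
	"testing"

	discoveryv1 "k8s.io/api/discovery/v1"
)

// dropNilPortSlices is an illustrative stand-in for the planned filtering.
func dropNilPortSlices(slices []*discoveryv1.EndpointSlice) []*discoveryv1.EndpointSlice {
	var out []*discoveryv1.EndpointSlice
	for _, es := range slices {
		if len(es.Ports) == 0 {
			continue
		}
		out = append(out, es)
	}
	return out
}

func TestNilPortSlicesAreDropped(t *testing.T) {
	port := int32(8000)
	slices := []*discoveryv1.EndpointSlice{
		{Ports: nil},                                       // ports: null -> dropped
		{Ports: []discoveryv1.EndpointPort{{Port: &port}}}, // has a port -> kept
	}
	if got := dropNilPortSlices(slices); len(got) != 1 {
		t.Fatalf("expected 1 slice to remain, got %d", len(got))
	}
}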

szuecs commented 3 months ago

Thanks for the great bug report! The fix is released in https://github.com/zalando/skipper/releases/tag/v0.21.150