traefik / traefik-helm-chart

Traefik Proxy Helm Chart
https://traefik.io
Apache License 2.0
1.07k stars 752 forks source link

Helm updates can lose dual HTTP3 port #982

Open ananace opened 9 months ago

ananace commented 9 months ago

Welcome!

What version of the Traefik's Helm Chart are you using?

26.0.0

What version of Traefik are you using?

2.10.6

What did you do?

Installed the Helm chart with;

ports:
  websecure:
    http3:
      enabled: true
      advertisedPort: 443

This results in a Deployment configured as;

# ...
         ports:
# ...
         - containerPort: 8443
           name: websecure
           protocol: TCP
         - containerPort: 8443
           name: websecure-http3
           protocol: UDP

And a service like;

# ...
  ports:
# ...
  - name: websecure
    port: 443
    protocol: TCP
    targetPort: websecure
  - name: websecure-http3
    port: 443
    protocol: UDP
    targetPort: websecure-http3

What did you see instead?

This configuration unfortunately seems to run into https://github.com/kubernetes/kubernetes/issues/105610 on helm upgrades, causing the two ports to merge into one - with the first websecure winning and removing the UDP port from existence.

The resulting deployment looks like;

# ...
        ports:
# ...
        - containerPort: 8000
          name: web
          protocol: TCP
        - containerPort: 8443
          name: websecure
          protocol: TCP

And the service ends up as;

# ...
  ports:
# ...
  - name: websecure
    port: 443
    protocol: TCP
    targetPort: websecure
  - name: websecure-http3    
    port: 443
    protocol: UDP
    targetPort: websecure-http3

This causes the incoming service traffic to fail as there's no websecure-http3 port to reference

What is your environment & configuration?

Bare-metal Kubernetes. Traefik installed from Helm chart using FluxCD.

helm-controller: v0.31.1
kustomize-controller: v0.35.0
notification-controller: v0.33.0
source-controller: v0.36.0

Additional Information

To keep the deployment working for now, I've added a post-renderer which replaces the service definition for the HTTP3 port to point directly to the port number instead of referring to it by name.

As a note, the template that the Helm Chart renders is correct. The issue is that Kubernetes API server unfortunately messes up the port list when applying upgrades - for at least certain setups, and since the service object the Helm chart renders refers to the ports by name this means that one of the HTTPS/HTTP3 ports ends up unable to be routed to. (Depending on which one gets lost by the API server merge)

mloiseleur commented 9 months ago

Hello @ananace,

Helm is not "breaking" HTTP/3. It's a limitation on Kubernetes side.

It's documented here in the values and here in the readme.

Is that more clear for you now ?

ananace commented 9 months ago

@mloiseleur The issue is entirely unrelated to either of those links. The problem here is that the Kubernetes API server munges the Helm-generated patch request (as described in the linked issue), merging the port definitions in the deployment for websecure/websecure-http3 into a single one - which causes either the websecure or websecure-http3 port definition to disappear.

The result on that is the the Service object can't route any HTTPS/HTTP3 traffic anymore, since it ends up referring to a port by a name that doesn't exist on the container definition in the Pod.
This will happen regardless if it's a single service or split service. (But it fails much harder on the single service, as both protocols stop routing in that case)

Edit:

As a side note; due to when/how the issue happens, it'll likely only appear if you perform certain helm upgrades which change the list of ports. It's possible that using a JSON patch with a kubectl patch call could also let you re-add the port without having it merged out of existence, but Helm doesn't work that way.

Edit 2:

To try to better explain what's going on, here's an example of how the update fails - here with the TCP port disappearing into the ether after the merge;

$ kubectl --namespace traefik get deploy traefik-2 -o jsonpath='{.spec.template.spec.containers[0].ports}' | jq
[
  {
    "containerPort": 8000,
    "name": "web",
    "protocol": "TCP"
  },
  {
    "containerPort": 8443,
    "name": "websecure",
    "protocol": "TCP"
  }
]
$ cat /tmp/example-patch.yaml 
spec:
  template:
    spec:
      containers:
      - name: traefik-2
        ports: 
        - containerPort: 8000
          name: web
          protocol: TCP
        - containerPort: 8443
          name: websecure
          protocol: TCP
        - containerPort: 8443
          name: websecure-http3
          protocol: UDP
$ kubectl --namespace traefik patch deploy traefik-2 --patch-file /tmp/example-patch.yaml 
deployment.apps/traefik-2 patched
$ kubectl --namespace traefik get deploy traefik-2 -o jsonpath='{.spec.template.spec.containers[0].ports}' | jq
[
  {
    "containerPort": 8000,
    "name": "web",
    "protocol": "TCP"
  },
  {
    "containerPort": 8443,
    "name": "websecure-http3",
    "protocol": "UDP"
  }
]
mloiseleur commented 9 months ago

\o I get it. There are two issues with TCP + UDP on same port , one (already documented) on the Service and this one on the Deployment.

The one on the Deployment will happen only when trying to patch. Aouch.

ananace commented 9 months ago

It seems to be somewhat related to the Kubernetes environment, Helm, and probably configuration as well.

My thought is that just offering the option to render the Service with numeric target port definitions should work - maybe with a values flag for choosing it.

jnoordsij commented 9 months ago

Ah yeah, this seems likes exactly the kind of problems I encountered while working on #656 and tried to avoid there by not introducing any additional named ports with the same numbers. Basically this only happens when using patch (i.e. helm upgrade) and not on initial install, which makes it even harder to track, due to the aforementioned issues with the merging behavior of kubectl.

(There is some kind of choice to be made here, as it is possible to go back to ensuring each number is used in a single pors only in all places, to avoid these kind of bugs. Altough this looks a bit ugly, as the UDP version is not declared, it does work as the container port list declaration is considered to be primarily informational; i.e. if a service on the container listens to a port, it is exposed regardless if it is listed or not. However this is quite confusing to users. However, I do agree this is semantically weird and not preferable.)

As for how to work around this particular issue in practice, I do the following: when using the --server-side flag with kubectl, the patching issue does not occur; therefore you can obtain the "correct" resulting manifest in your cluster by:

Also worth mentioning: there is an open proposal for Helm to add Server Side Apply support (see https://github.com/helm/community/pull/312). If that is ever implemented, this issue would be solved by Helm itself.

jonsch318 commented 6 months ago

The helm template and kubectl apply --server-side workaround worked for me.