knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Queue Proxy health checks incompatible with non-HTTP/2 applications #15432

Open braunsonm opened 1 month ago

braunsonm commented 1 month ago

/area networking

What version of Knative?

1.15.0

Expected Behavior

Legacy applications may have undefined behavior when HTTP/2 upgrade requests are made. Knative should gracefully handle those errors and downgrade the health check attempt to HTTP/1 or HTTP/1.1.

Actual Behavior

Applications that do not support HTTP/2 will not handle the upgrade request properly. In our case, a legacy application returns a 500 when an OPTIONS request is sent to upgrade the connection. Knative fails the entire health check because of this, even though the same check over HTTP/1 or HTTP/1.1 would return a 200.

Steps to Reproduce the Problem

  1. Create an application which does not support HTTP/2 or returns a 500 on the OPTIONS request
  2. Notice that Knative will start failing the health checks and the pod will be killed

Additional Context

It is not within the Kubernetes spec that an application must support HTTP/2 or that it should expect an OPTIONS call to its health/liveness probes. Only GET is part of the contract, which the Queue Proxy does not follow.

I believe the logic is flawed in the queue proxy's HTTP probes here. https://github.com/knative/serving/blob/873602a410ce54db05d9fb6caab121e5824dbe41/pkg/queue/health/probe.go#L155

When an error occurs during the upgrade, maxProto should be set to 1 so that Knative stops trying to make HTTP/2 requests. Currently, because of this line, the HTTP/2 upgrade is retried indefinitely and HTTP/1 is never attempted.

dprotaso commented 1 month ago

I'm confused: what's making HTTP/2 requests? Knative health checks are HTTP/1.

braunsonm commented 1 month ago

@dprotaso I can see requests being made from the queue-proxy to the user-container and attempting to upgrade to HTTP/2 during the readiness probes.

I believe the code I linked above is the logic the queue-proxy uses to perform the HTTP/2 upgrade for these probes. This happens when the feature gate for auto-detecting HTTP2 is set to true.

dprotaso commented 1 month ago

Oh, interesting. I didn't realize this was added. The h2c upgrade is deprecated: https://datatracker.ietf.org/doc/html/rfc9113#section-3.1

We should probably always use HTTP/1 unless the user has specified h2c, or we change the detection to use h2c prior knowledge.

dprotaso commented 1 month ago

Do you have an example app where this breaks?

braunsonm commented 1 month ago

I agree that probes should have always been HTTP/1 to match what would be expected from Kubernetes. But if you want this to remain so you can tell if an app supports HTTP/2 or not, then I would suggest at least gracefully failing if the HTTP/2 check fails (fallback to HTTP/1).

Unfortunately, I don't have a sample that I could share, but I think it should be reproducible with an app that returns a 500 whenever an OPTIONS request is made (i.e., the upgrade request).

skonto commented 1 month ago

Hi @braunsonm, thanks for reporting this.

> This happens when the feature gate for auto-detecting HTTP2 is set to true

Would it work if you turn this off for now or is this something that fails in other scenarios?

braunsonm commented 1 month ago

> Would it work if you turn this off for now or is this something that fails in other scenarios?

It does work if it is set to false, but that does mean other applications deployed on Knative can no longer benefit from HTTP/2 which is unfortunate.

skonto commented 1 month ago

> but that does mean other applications deployed on Knative can no longer benefit from HTTP/2 which is unfortunate.

That autodetect feature was never completed. So if the app is using HTTP/2, do you mean that the queue-proxy (QP) is not going to use it with autodetect off? What do you mean apps on Knative can't benefit from HTTP/2, could you elaborate?

braunsonm commented 1 month ago

> What do you mean apps on Knative can't benefit from HTTP/2, could you elaborate?

I was under the impression that the HTTP2 autodetection feature was required for HTTP2 to be used between the activator and ksvcs. Is that not true?

skonto commented 1 month ago

This has to do with probes here. We do support HTTP/2 without setting that auto-detect property, which by the way is not finished as a feature (check our gRPC tests for example). Also see what happens when you turn that on: https://github.com/knative/serving/blob/main/pkg/queue/readiness/probe.go#L233-L242. We only try the upgrade if maxProto = 0, see https://github.com/knative/serving/blob/main/pkg/queue/health/probe.go#L163. cc @dprotaso, who may have more to add on the background of this feature.

dprotaso commented 1 month ago

Right now, supporting HTTP2 requires people to set the containerPort name to h2c.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: grpc-ping
  namespace: default
spec:
  template:
    spec:
      containers:
      - image: docker.io/{username}/grpc-ping-go
        ports:
          - name: h2c
            containerPort: 8080

The feature is tracked in this issue: https://github.com/knative/serving/issues/4283. The idea is to detect the protocol without the labelling.

braunsonm commented 1 month ago

I see. We use func, which doesn't support naming the port, so that's why the autodetection was going to be required for us.