knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Websockets do not work with DomainMappings by default #13083

Open georgyo opened 2 years ago

georgyo commented 2 years ago

What version of Knative?

1.5

Expected Behavior

Websockets should work out of the box when using domain mappings, especially since they already work with the default endpoints that are created.

Actual Behavior

Websockets ONLY work with the default auto-generated endpoint; requests through a domain mapping return a 503.

Steps to Reproduce the Problem

This is very similar to https://github.com/knative/serving/issues/7933; however, that issue focuses on GKE. The problem seems to exist for all installations. In my case, I am using RKE2 on Vultr.

The exact service config I am deploying is here: https://gist.github.com/georgyo/846e72c94ed20b4d2988a2f164f31c4b/64742e75d9ea78b63a3af261784beb6e48e2ccaa

This creates two endpoints.

They point to the same service, so you would expect either both to work or neither to work.

nak3 commented 2 years ago

Thank you for the report.

@georgyo Are you using net-kourier? If so, you can use DomainMapping with websockets by adding the annotation described in these docs: https://github.com/knative-sandbox/net-kourier#tips (Please note that it still does not work with NodePort; see https://github.com/knative-sandbox/net-kourier/issues/821.)
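As a rough illustration of the Kourier tip above, a DomainMapping carrying that annotation might look like the sketch below. The annotation key `kourier.knative.dev/disable-http2` is recalled from the linked net-kourier tips and should be checked against that README; the `v1beta1` API version, the `default` namespace, and the `wrtc-star` service name are assumptions inferred from the hostnames in this thread.

```yaml
# Sketch only: verify the annotation key against the net-kourier README.
apiVersion: serving.knative.dev/v1beta1   # assumed; older installs may serve v1alpha1
kind: DomainMapping
metadata:
  name: wrtc-star.scalable.io             # the mapped domain from this thread
  namespace: default                      # assumed namespace
  annotations:
    # Assumed key: stops Kourier from forcing HTTP/2 for this mapping,
    # so the websocket upgrade can go over HTTP/1.1.
    kourier.knative.dev/disable-http2: "true"
spec:
  ref:
    name: wrtc-star                       # assumed Knative Service name
    kind: Service
    apiVersion: serving.knative.dev/v1
```

This only applies to net-kourier; it is not expected to have any effect with net-istio or net-contour.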

For other net-* plugins, such as net-istio and net-contour, I think it does not work yet.

(Note, this issue was tracked by https://github.com/knative/serving/issues/12601 but it was automatically closed.)

georgyo commented 2 years ago

I am using net-istio, and I can confirm that the snippet in #12601 resolved the issue.

However, I am having trouble grasping why this helps, and specifically why domain mappings break by default. That is, how and why does the domain mapping change how the request is routed?

In the case of wrtc-star.default.k.fu.io and https://wrtc-star.scalable.io/, both go to the same Istio ingress, backed by the same Knative service.

The Envoy filter allows HTTP/2 connections, but why were they only getting blocked when using a mapped domain?

dprotaso commented 2 years ago

/triage accepted

dprotaso commented 2 years ago

/area networking

mbaynton commented 1 year ago

Voicing support for this issue because I think it's reasonable to expect websockets to work when using net-istio and DomainMappings. That is all that's needed to reproduce.

@georgyo asked:

"IE, how/why does the domain mapping change how the request is routed?"

Through experimentation and by inspecting the Istio VirtualServices, I have found that using a DomainMapping unfortunately adds an additional proxy hop to the data path. I don't know the reason for this; it would certainly be nice if the mapped domain were just another hostname on the same VirtualService that handles the default domains. But right now DomainMapping is implemented with the proxy backend being the cluster-local service. Requests for mapped domains get their Host header rewritten to [svc-name].svc.cluster.local, and then Envoy forwards them to Envoy again. :/
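To make the extra hop concrete, here is a hand-written sketch of roughly what the VirtualService behind a DomainMapping could look like. It is not copied from a live cluster: the resource name, the gateway name, the rewritten authority, and the local-gateway host are assumptions based on common net-istio defaults and the hostnames in this thread, so compare it with the output of `kubectl get virtualservice -A -o yaml` on your own installation.

```yaml
# Illustrative sketch only; every name below is an assumption.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: wrtc-star.scalable.io-ingress     # hypothetical name
  namespace: default
spec:
  hosts:
  - wrtc-star.scalable.io
  gateways:
  - knative-serving/knative-ingress-gateway   # assumed default external gateway
  http:
  - match:
    - authority:
        prefix: wrtc-star.scalable.io
    # The Host header is rewritten to the cluster-local service name...
    rewrite:
      authority: wrtc-star.default.svc.cluster.local
    # ...and the request is sent to a second Envoy listener (the local
    # gateway) instead of directly to the revision. This is the extra hop.
    route:
    - destination:
        host: knative-local-gateway.istio-system.svc.cluster.local   # assumed
        port:
          number: 80
```

If the real resource has this shape, the second hop between the two Envoy listeners would be where the websocket upgrade has to be allowed explicitly, which would be consistent with the EnvoyFilter workaround only being needed for mapped domains.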

dao-duc-tung commented 1 year ago

For anyone who needs a solution on AWS EKS using:

Below is the EnvoyFilter you need to apply:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: allowconnect-istio-ingressgateway
  namespace: istio-system
spec:
  # Apply the patch only to the Istio ingress gateway pods.
  workloadSelector:
    labels:
      app: istio-ingressgateway
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      listener:
        # Listener port to patch; adjust if your gateway uses a different port.
        portNumber: 8081
        filterChain:
          filter:
            name: "envoy.filters.network.http_connection_manager"
    patch:
      operation: MERGE
      value:
        typed_config:
          "@type": "type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager"
          # Allow websocket and other upgrades to be proxied over HTTP/2 CONNECT.
          http2_protocol_options:
            allow_connect: true

The snippet above is similar to the one in https://github.com/knative/serving/issues/7933#issuecomment-786139169. I updated the filter name and typed config to the latest version.

This solution works for both ws and wss when I use cert-manager v1.10.0 to enable auto-TLS.