Duplicate of #90
Unfortunately, you're affected by the same problem as in #90: cross-namespace Route bindings are not correctly implemented in STUNner (this is documented here). This was a simplification that made our life easier for the first couple of releases, but it is time to remove this limitation. We are actively working on fixing this; the progress is tracked here.
Until then, it's best to just put all Gateways and UDPRoutes into the same namespace and everything should work fine.
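A minimal sketch of that workaround (resource names are illustrative; everything lives in the default namespace):

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: udp-gateway
  namespace: default
spec:
  gatewayClassName: stunner-gatewayclass
  listeners:
    - name: udp-listener
      port: 3478
      protocol: UDP
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: UDPRoute
metadata:
  name: iperf-server
  namespace: default          # same namespace as the Gateway
spec:
  parentRefs:
    - name: udp-gateway       # namespace omitted: defaults to the route's own namespace
  rules:
    - backendRefs:
        - name: iperf-server  # backend Service, also in the default namespace here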
To actually answer your question: the way you're trying to do it is exactly how it should be done, but until now this has not been possible due to a STUNner limitation that prevented UDPRoutes from attaching to Gateways across a namespace boundary.
This should now be fixed as of e770d05 in the gateway-operator repo. Currently this change is only available on the dev release channel, via the stunner/stunner-gateway-operator-dev chart:
helm install stunner-gateway-operator stunner/stunner-gateway-operator-dev --create-namespace --namespace=stunner-system
The below worked for me perfectly:
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: udp-gateway
  namespace: default
spec:
  gatewayClassName: stunner-gatewayclass
  listeners:
    - name: udp-listener
      port: 3478
      protocol: UDP
      allowedRoutes:
        namespaces:
          from: All
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: UDPRoute
metadata:
  name: iperf-server
  namespace: team1
spec:
  parentRefs:
    - name: udp-gateway
      namespace: default
  rules:
    - backendRefs:
        - name: iperf-server
          namespace: team1
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: UDPRoute
metadata:
  name: iperf-server
  namespace: team2
spec:
  parentRefs:
    - name: udp-gateway
      namespace: default
  rules:
    - backendRefs:
        - name: iperf-server
          namespace: team2
Note the allowedRoutes config in the Gateway: this makes sure that your Gateway accepts routes from all namespaces.
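If you'd rather not accept routes from every namespace, the Gateway API also lets you restrict this with a namespace label selector; a minimal sketch (the label name below is just an example, not something STUNner defines):

listeners:
  - name: udp-listener
    port: 3478
    protocol: UDP
    allowedRoutes:
      namespaces:
        from: Selector
        selector:
          matchLabels:
            allow-stunner-routes: "true"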
Can you please report back on your findings? Thanks!
I can see I was a bit unclear. The problem is not the namespace issue. Thanks for the fix though, looking forward to 0.16. The problem is: how does it know WHICH traffic to send to which backend? team1 and team2 are running different apps and thus need to work totally independently of each other. I could allocate a different port to each, but since there is no way to auto-allocate a port, I would need to manually keep track of which is which. Also, the authentication is in the GatewayConfig, but why would team1 and team2 not take care of their own credentials?
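(For context, the shared credentials mentioned here live in the GatewayConfig resource; a minimal sketch with placeholder credentials, field names following the STUNner docs of the time:)

apiVersion: stunner.l7mp.io/v1alpha1
kind: GatewayConfig
metadata:
  name: stunner-gatewayconfig
  namespace: stunner-system
spec:
  realm: stunner.l7mp.io    # TURN realm
  authType: plaintext       # static credentials shared by all clients of this gateway class
  userName: user-1          # placeholder
  password: pass-1          # placeholder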
This is the tricky part: the name "UDPRoute" is somewhat misleading, because STUNner never actually routes traffic per se. This is due to the way TURN (and WebRTC media) works: the client must know the IP address of the peer (i.e., the Kubernetes pod, or the iperf-server in your case) beforehand and send it along to STUNner in order to be able to communicate with the peer, and the UDPRoute serves merely as a mechanism for STUNner to check whether a client is trying to reach a pod it is permitted to talk to (like an ACL). Exchanging the client and peer IP addresses is usually handled in an out-of-band ICE conversation between the client and the peer.
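To make the ACL semantics concrete, here is the team1 UDPRoute from above again, this time annotated (a sketch; the comments just restate the behavior described in the previous paragraph):

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: UDPRoute
metadata:
  name: iperf-server
  namespace: team1
spec:
  parentRefs:
    - name: udp-gateway        # clients connecting through this Gateway...
      namespace: default
  rules:
    - backendRefs:
        - name: iperf-server   # ...are only allowed to relay traffic to pods backing this Service;
          namespace: team1     # peer addresses not covered by an attached route are refused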
To answer your question:
I'm not saying this is optimal, but for WebRTC media to actually work, this is how it has to be done. If you're still in trouble, feel free to drop by our Discord, we're happy to help.
Thanks, I totally missed that trick in the TURN/STUN spec.
Just to be absolutely sure I got it:
inside audio traffic from pod X -> Gateway -> stunner service -> stunner pod -> outside (or does pod X send to outside directly?)
and the only reason you need a UDPRoute to begin with is to ensure that whatever IP the outside tells you to send traffic to is acceptable?
That makes me wonder why we need the gateway to begin with then? Wouldn't it be simpler to offload that to traefik, haproxy or nginx or something else that can already do the UDP routing?
Yes, that's a fairly good description of what's going on.
inside audio traffic from pod X -> Gateway -> stunner service -> stunner pod -> outside (or does pod X send to outside directly?)
No, everything goes through STUNner (see below).
and the only reason you need a UDPRoute to begin with is to ensure that whatever IP the outside tells you to send traffic to is acceptable?
Exactly.
That makes me wonder why we need the gateway to begin with then? Wouldn't it be simpler to offload that to traefik, haproxy or nginx or something else that can already do the UDP routing?
The problem is NAT traversal: when the media server lives in a Kubernetes pod, it is hosted on a private IP, and there are several layers of network address translation between it and the client, which stops them from communicating over WebRTC (WebRTC media servers identify users/media by the source IP/port: if the source IP/port changes, the audio/video traffic is dropped). You need an ingress gateway that runs TURN on top of UDP (TURN is a glorified tunneling protocol that lets the connection survive any number of NAT layers) and can inject your media into the private Kubernetes container network without breaking the WebRTC media connections, and currently only STUNner provides this functionality. (To be absolutely fair any TURN server will do, but STUNner is designed specifically for this purpose.)
Is this nice? No. Is this how it must be done today with WebRTC? Absolutely. I too dream of a world where we don't need all this complexity any more, but we have to wait until media over QUIC/WebTransport/HTTP3 becomes a thing. Until then, however, you'll need a TURN gateway to run your media servers in Kubernetes, and STUNner provides you just that.
(To be absolutely fair any TURN server will do, but STUNner is designed specifically for this purpose.)
As I am reading this and out of curiosity: Is it possible to replace the pion TURN server in the data plane with another TURN server implementation?
As far as I understood, the controller creates the relevant config parameters, which the data plane TURN server adopts, no? So in theory, if another implementation could translate it to its configuration, it could work? Or is there any other magic I have not considered?
Thanks :+1:
Is it possible to replace the pion TURN server in the data plane with another TURN server implementation?
At the moment our Kubernetes Gateway API operator emits STUNner-specific config files, so if someone goes out and writes a STUNner config file parser for that "another TURN server" then the answer is yes. (Or you can rewrite the operator to emit different config file formats as well.) There is one trick though: STUNner knows how to reconcile config file updates incrementally, that is, without restarting the TURN server and dropping active connections. If that "other TURN server" lacks this capability then it will hardly be usable in Kubernetes, where the config file has to be updated frequently: clearly, you don't want to drop active clients when, say, you add a new backend pod or update the load-balancer settings...
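To give a rough idea of what such a config looks like, here is a heavily simplified sketch of a dataplane config for the iperf example (field names are approximate and only meant to illustrate the structure; see the STUNner docs for the actual schema):

version: v1alpha1
admin:
  name: stunnerd
  loglevel: all:INFO
auth:
  type: plaintext
  credentials:
    username: user-1          # placeholder
    password: pass-1          # placeholder
listeners:
  - name: default/udp-gateway/udp-listener
    protocol: udp
    port: 3478
    routes:                   # the UDPRoutes attached to this listener
      - team1/iperf-server
      - team2/iperf-server
clusters:
  - name: team1/iperf-server  # rendered from the UDPRoute's backendRefs
    type: STRICT_DNS
    endpoints:
      - iperf-server.team1.svc.cluster.local
  - name: team2/iperf-server
    type: STRICT_DNS
    endpoints:
      - iperf-server.team2.svc.cluster.local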
At the moment our Kubernetes Gateway API operator emits STUNner-specific config files, so if someone goes out and writes a STUNner config file parser for that "another TURN server" then the answer is yes. (Or you can rewrite the operator to emit different config file formats as well.)
Yeah, that is what I meant with "if another implementation could translate it to its configuration".
There is one trick though: STUNner knows how to reconcile config file updates incrementally, that is, without restarting the TURN server and dropping active connections. If that "other TURN server" lacks this capability then it will hardly be usable in Kubernetes, where the config file has to be updated frequently: clearly, you don't want to drop active clients when, say, you add a new backend pod or update the load-balancer settings...
100% agree. We currently maintain a TURN server called eturnal, which supports config reloads without interrupting services. Actually, we are working on supporting hot release upgrades as well. Maybe this is also interesting for your concept: testing with one pod first before updating the entire helm release, etc.
Sorry for hijacking this thread now, but would you consider including eturnal in your operator as well?
Thanks and have a great day!
Closing this for now. Feel free to reopen if anything new comes up.
I have tried searching the documentation, but it seems I cannot find information about how to have multiple distinct backends on the same frontend.
i.e.
and then n backend services:
and
I don't really see anything in the Gateway spec that would allow for this, and no attributes in STUNner that would identify whether traffic is sent to team1's or team2's iperf-server.