typokign / matrix-chart

Helm chart for deploying a Matrix homeserver stack
MIT License
89 stars 48 forks

Out of the Box CoTURN setup seems unlikely to work well #33

Open Routhinator opened 4 years ago

Routhinator commented 4 years ago

Now that I've gotten Riot and Synapse working and stable, I'm turning my attention to bridges and audio/video.

Looking at the CoTURN docs and this setup, I think the out-of-the-box config the chart is using right now needs tweaking. For starters, I believe each CoTURN server (or pod, in this case) needs its own external IP, and they aren't supposed to be behind a load balancer. Another challenge is that only about 50% of cloud providers seem to support UDP ports on a load balancer.

I think we either need to put coturn into TCP mode or make the array of URIs configurable so a DNS record can be pointed at each kube node.

{{- if .Values.coturn.enabled }}
## TURN ##

# The public URIs of the TURN server to give to clients

turn_uris:
  - "turn:{{ include "matrix.hostname" . }}?transport=udp"

# The shared secret used to compute passwords for the TURN server

turn_shared_secret: {{ include "matrix.coturn.sharedSecret" . }}

# How long generated TURN credentials last

turn_user_lifetime: 1h

# Whether guests should be allowed to use the TURN server.
# This defaults to True, otherwise VoIP will be unreliable for guests.
# However, it does introduce a slight security risk as it allows users to
# connect to arbitrary endpoints without having first signed up for a
# valid account (e.g. by passing a CAPTCHA).

turn_allow_guests: {{ .Values.coturn.allowGuests }}
{{- end }}
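
For context on how this pairs with coturn itself: Synapse mints short-lived credentials by HMAC-ing a username and expiry with turn_shared_secret, so the coturn side must run the matching shared-secret ("TURN REST API") auth scheme. A minimal turnserver.conf sketch, with illustrative values rather than the chart's actual generated config:

# Enable the shared-secret ("TURN REST API") authentication scheme
use-auth-secret

# Must be the same value as Synapse's turn_shared_secret
static-auth-secret=<shared secret>

# coturn also wants a realm set for the credential mechanism
realm=example.com
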
Routhinator commented 4 years ago

Researching CoTURN more, I'm thinking that given the current DaemonSet setup, a round-robin DNS record would be best, and that requires a configurable value. The second option would be the ALTERNATE-SERVER scheme, which requires one pod to answer initial requests and redirect them to the others. This second method may be better for Deployment replica sets in large clusters where one does not want coturn pods on every node, though working out the exact method may take some experimenting.

These thoughts are based on the docs linked and quoted below, which seem to imply that putting coturn behind an upstream LB has limitations: https://github.com/coturn/coturn/wiki/TURN-Performance-and-Load-Balance

You have three options here:

Set up networking load-balancing equipment that redirects the requests to a member of the TURN servers group. In the general case, it must take care of redirecting the requests to the same server from the same client IP, because some TURN sessions from the same client must share information. There are two cases when different TURN sessions must interact: RTP/RTCP connection pairs (from RFC 5766) and TCP relay (from RFC 6062). If you are not using those features then simple network load balancing is enough. If you do use those features, then you have to map the whole client IP (with all its network ports) to the same TURN server. Also, if you are using mobile TURN (from the new MICE draft) then you cannot use the network load balancer option, because client sessions from different IP addresses must interact, so you have to use the next option (see below).

Set a less complex scheme with round-robin DNS. The client must send all its requests to the same DNS-discovered TURN server. That scheme supports all use cases.

Use the built-in balancing capability with the ALTERNATE-SERVER option (--alternate-server). In this case, the client must also send all requests to the same alternate-server address. You set a single system as the "front-end" of the cluster of TURN servers, and that "load balancer" system does nothing: it just returns a 300 ALTERNATE-SERVER error to all clients, with an alternate server IP address, so the client will re-connect to another server. If the alternate servers are chosen in a round-robin manner, then you have a load-balancing cluster of TURN servers.
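
To illustrate that third option: coturn's --alternate-server (or alternate-server= in turnserver.conf) can be given multiple times, and the front-end hands the addresses out round-robin. A hypothetical front-end config with placeholder addresses:

# This instance only answers allocations with 300 ALTERNATE-SERVER,
# rotating across the real relays listed below
alternate-server=203.0.113.10:3478
alternate-server=203.0.113.11:3478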

typokign commented 4 years ago

Have you read the notes I left in values.yaml by chance? You're right, installing Coturn is a big pain point at the moment, but I've included two possible strategies, one of which should work.

  # How to deploy Coturn
  # Options:
  #   DaemonSet:  A DaemonSet will be used to schedule one Coturn pod per node. Each Coturn pod will open the ports it needs directly on the host it is scheduled on.
  #               This maximizes compatibility and will allow you to set up Coturn without any additional cluster configuration.
  #   Deployment: A Deployment will be used to schedule Coturn pods. The number of Coturn pods will be configurable (via the replicaCount setting below).
  #               You will need to use a NodePort service or an external load balancer to route traffic to the Coturn pods.
  #               This is more flexible and can use fewer pods in a multi-node setup, but will require additional networking configuration.

---

    # The type of service to deploy for routing Coturn traffic
    # Options:
    #   ClusterIP: Recommended for DaemonSet configurations. This will create a standard Kubernetes service for Coturn within the cluster. No external networking
    #              will be configured as the DaemonSet will handle binding to each Node's host networking
    #   NodePort:  Recommended for Deployment configurations. This will open TURN ports on every node and route traffic on these ports to the Coturn pods.
    #              You will need to make sure your cloud provider supports the cluster config setting "apiserver.service-node-port-range", as this range must contain
    #              the ports defined above for the service to be created.
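
Putting those two comment blocks together, the intended pairings are DaemonSet + ClusterIP or Deployment + NodePort. A hypothetical values.yaml excerpt (the key names here are illustrative; check the chart's actual values.yaml):

coturn:
  enabled: true
  # Strategy 1: one pod per node on host networking; no external service needed
  kind: DaemonSet
  service:
    type: ClusterIP
  # Strategy 2: a fixed number of pods reached via node ports
  # kind: Deployment
  # replicaCount: 2
  # service:
  #   type: NodePort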

The gist of it is this: since we can't rely on an external load balancer, the simplest setup I envision is binding to ports on the nodes themselves.

The two strategies for doing this are the DaemonSet and Deployment approaches described in the values.yaml comments above.

Then, once you have Coturn serving traffic on one or more nodes, set up a round-robin DNS record pointing to the public IPs of each of your nodes, as sketched below.
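
The round-robin record itself is just the same name resolving to several A records; a zone-file sketch with placeholder node IPs:

; turn.example.com rotates across the nodes running Coturn
turn.example.com.  300  IN  A  203.0.113.10
turn.example.com.  300  IN  A  203.0.113.11
turn.example.com.  300  IN  A  203.0.113.12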

I think we either need to put coturn into TCP mode

I'd be interested to hear more about this. I thought TURN (or at least Matrix's usage of TURN for VoIP calls) had to run over UDP?

Routhinator commented 4 years ago

Their docs imply that there is a TCP mode for coturn, but you might be right about Matrix; I'm still combing through the docs for both.

Currently, though, the chart's configuration is hardcoded to use the Matrix server's DNS name, and I need to point it at a round-robin DNS record for the NodePort, since I can't use the DNS record that points at my load balancer:

turn_uris:
  - "turn:{{ include "matrix.hostname" . }}?transport=udp"
typokign commented 4 years ago

Ah, fair enough, fixed in 95d0547139b4289accab98a3299d105aa26d54d8 and version 2.2.0 (which I just pushed and includes Synapse 1.15)

typokign commented 4 years ago

This is interesting: in section 13.3.3.1 of https://matrix.org/docs/spec/client_server/r0.6.1#voice-over-ip, the sample response includes TURN servers with ?transport=tcp. Will ask around and see if we can get this to run over TCP only.
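
For reference, advertising both transports from Synapse is just a matter of listing two URIs; the hostname here is a placeholder:

turn_uris:
  - "turn:turn.example.com?transport=udp"
  - "turn:turn.example.com?transport=tcp"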

danjenkins commented 4 years ago

So the media will still go via UDP, but the ICE candidate gathering will happen over TCP.

Arkaniad commented 4 years ago

I was struggling with this and have made some changes in https://github.com/dacruz21/matrix-chart/pull/41. I also uncovered an issue with the CoTURN shared secret that I've logged in https://github.com/dacruz21/matrix-chart/issues/42. After these changes I was able to get CoTURN working properly on DigitalOcean Kubernetes Service using a DaemonSet with ClusterIP and creating round-robin DNS records for the TURN services.