l7mp / stunner

A Kubernetes media gateway for WebRTC. Contact: info@l7mp.io
https://l7mp.io
MIT License

Milestone v1.14: Performance: Per-allocation CPU load-balancing #60

Closed rg0now closed 11 months ago

rg0now commented 1 year ago

This issue is to plan & discuss the performance optimizations that should go into v1.14.

Problem: Currently, STUNner's UDP performance is limited to about 100-200 kpps per UDP listener (i.e., per UDP Gateway/listener in Kubernetes Gateway API terminology). This is because we allocate a single net.PacketConn per UDP listener, which is then drained by a single CPU thread/goroutine. As a result, all client allocations made via that listener share the same CPU thread and there is no way to load-balance client allocations across CPUs; i.e., each listener is restricted to a single CPU. If STUNner is exposed via a single UDP listener (the most common setting), then it will be restricted to about 1200-1500 mcore.

Notes:

Solution: The plan is to create a separate net.Conn for each UDP allocation by (1) sharing the same listener server address using SO_REUSEADDR/SO_REUSEPORT, (2) connecting each per-allocation socket back to the client (this turns the net.PacketConn into a connected net.Conn), and (3) firing up a separate read-loop/goroutine for each allocation/socket. Extreme care must be taken in implementing this, though: if we blindly create a new socket for every received UDP packet, then a simple UDP portscan will DoS the TURN listener.
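To make steps (1)-(3) concrete, here is a minimal Linux-only sketch using just the Go standard library. It is not STUNner or pion/turn code: the helper names (`reuseControl`, `listen`, `dialAllocation`, `readLoop`, `demo`) are made up for illustration, and the hard-coded `soReusePort` value is an assumption that holds on most Linux architectures (the `syscall` package does not export the constant everywhere). It also assumes a reasonably recent kernel, where datagrams from the connected peer are delivered to the connected socket rather than to the shared listener.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"syscall"
)

// soReusePort is the value of SO_REUSEPORT on most Linux architectures.
// Assumption: hard-coded because syscall does not export it on every platform.
const soReusePort = 0xf

// reuseControl sets SO_REUSEADDR and SO_REUSEPORT on a socket before it is bound.
func reuseControl(network, address string, c syscall.RawConn) error {
	var sockErr error
	err := c.Control(func(fd uintptr) {
		sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1)
		if sockErr == nil {
			sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, soReusePort, 1)
		}
	})
	if err != nil {
		return err
	}
	return sockErr
}

// listen opens the shared, unconnected UDP listener socket (step 1).
func listen(ctx context.Context, addr string) (net.PacketConn, error) {
	lc := net.ListenConfig{Control: reuseControl}
	return lc.ListenPacket(ctx, "udp4", addr)
}

// dialAllocation binds a second socket to the listener address and connects
// it to the client, yielding a connected net.Conn (step 2).
func dialAllocation(ctx context.Context, listener, client net.Addr) (net.Conn, error) {
	d := net.Dialer{Control: reuseControl, LocalAddr: listener}
	return d.DialContext(ctx, "udp4", client.String())
}

// readLoop drains a single allocation's socket; run it in its own goroutine (step 3).
func readLoop(conn net.Conn, handle func([]byte)) {
	buf := make([]byte, 1500)
	for {
		n, err := conn.Read(buf)
		if err != nil {
			return // socket closed or failed
		}
		handle(buf[:n])
	}
}

// demo wires the pieces together on loopback and returns the listener address
// and the local address of the per-allocation socket.
func demo() (string, string, error) {
	ctx := context.Background()
	pc, err := listen(ctx, "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer pc.Close()

	// Stand-in for a TURN client that has already authenticated.
	client, err := net.ListenPacket("udp4", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer client.Close()

	conn, err := dialAllocation(ctx, pc.LocalAddr(), client.LocalAddr())
	if err != nil {
		return "", "", err
	}
	defer conn.Close()
	go readLoop(conn, func([]byte) {})

	return pc.LocalAddr().String(), conn.LocalAddr().String(), nil
}

func main() {
	listenerAddr, allocAddr, err := demo()
	fmt.Println(listenerAddr, allocAddr, err)
}
```

Both sockets end up bound to the same address and port, so the client keeps talking to the address it already knows. Note that datagrams already queued on the shared listener before the connect completes still have to be handled by the listener's own read loop.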

Plan:

  1. Move the creation of the per-allocation connection to after the client has authenticated with the server, e.g., to when the TURN allocation request has been successfully processed. Note that this still allows a client with a valid credential to DoS the server, so we need to enforce a quota on per-client connections.

  2. Implement per-client quotas as per RFC 8656, Section 7.2, "Receiving an Allocate Request", point 10:

At any point, the server MAY choose to reject the request with a 486 (Allocation Quota Reached) error if it feels the client is trying to exceed some locally defined allocation quota. The server is free to define this allocation quota any way it wishes, but it SHOULD define it based on the username used to authenticate the request and not on the client's transport address.

  3. Expose the client quota via turn.ServerConfig. Possibly also expose a setting to let users opt in to per-allocation CPU load-balancing.

  4. Test and upstream.
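For the quota part of the plan, a rough sketch of what a per-username allocation quota could look like is below. This is only an illustration under assumptions: the `Quota` type and its `Acquire`/`Release` methods are hypothetical and are not part of pion/turn or STUNner. It keys the counter on the username, as the RFC suggests, rather than on the client's transport address.

```go
package main

import (
	"fmt"
	"sync"
)

// codeQuotaReached is the error code for "Allocation Quota Reached" (RFC 8656).
const codeQuotaReached = 486

// Quota counts live allocations per username.
// Hypothetical type for illustration, not an existing pion/turn or STUNner API.
type Quota struct {
	mu    sync.Mutex
	limit int
	inUse map[string]int
}

// NewQuota returns a quota that allows at most limit allocations per username.
func NewQuota(limit int) *Quota {
	return &Quota{limit: limit, inUse: make(map[string]int)}
}

// Acquire reserves one allocation slot for username. A false result means the
// Allocate request should be rejected with a 486 error.
func (q *Quota) Acquire(username string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.inUse[username] >= q.limit {
		return false
	}
	q.inUse[username]++
	return true
}

// Release frees one slot when an allocation expires or is deleted.
func (q *Quota) Release(username string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.inUse[username] <= 1 {
		delete(q.inUse, username)
		return
	}
	q.inUse[username]--
}

func main() {
	q := NewQuota(2)
	for i := 0; i < 3; i++ {
		if !q.Acquire("alice") {
			fmt.Println("alice: reject with", codeQuotaReached)
			continue
		}
		fmt.Println("alice: allocation accepted")
	}
}
```

The check would run only after authentication has succeeded, so the per-allocation socket is created only once `Acquire` returns true, and `Release` is called when the allocation goes away.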

Feedback appreciated.