mozilla / janus-plugin-sfu

Janus plugin to act as a kind of SFU for game networking data.
Mozilla Public License 2.0

Prometheus instrumentation to get metrics of the switchboard #96

Open vincentfretin opened 2 years ago

vincentfretin commented 2 years ago

Add prometheus instrumentation, using the prometheus and prometheus-static-metric crates, to expose details of the switchboard such as the number of rooms, users, and sessions, so they can be visualized with Grafana. The /metrics endpoint can be added with tide / async_std (this is what janus-conference uses) or axum / tokio (see also the comments on #93).
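As a rough illustration of what such a /metrics endpoint would return, here is a minimal sketch that renders three gauges in the Prometheus text exposition format using only the standard library. The `SwitchboardStats` struct, the `render_metrics` function and the `janus_sfu_*` metric names are hypothetical and not taken from the plugin. In practice the prometheus crate would handle registration and encoding; this only shows the shape of the output.

```rust
use std::fmt::Write;

/// Hypothetical snapshot of the switchboard counters (names are assumptions).
pub struct SwitchboardStats {
    pub rooms: u64,
    pub users: u64,
    pub sessions: u64,
}

/// Render the snapshot as Prometheus text exposition format:
/// a HELP line, a TYPE line and a sample line per gauge.
pub fn render_metrics(stats: &SwitchboardStats) -> String {
    // Metric names below are illustrative, not an existing convention.
    let gauges = [
        ("janus_sfu_rooms", "Number of rooms.", stats.rooms),
        ("janus_sfu_users", "Number of users.", stats.users),
        ("janus_sfu_sessions", "Number of sessions.", stats.sessions),
    ];
    let mut out = String::new();
    for (name, help, value) in gauges.iter() {
        // Writing to a String cannot fail, so the results are ignored.
        let _ = writeln!(out, "# HELP {} {}", name, help);
        let _ = writeln!(out, "# TYPE {} gauge", name);
        let _ = writeln!(out, "{} {}", name, value);
    }
    out
}
```

The HTTP handler (tide or axum) would then only need to take a snapshot of the switchboard and return this string as the response body.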

An agones Rust SDK integration may be interesting (or not, I haven't dug into this much yet) to scale janus instances horizontally on kubernetes. The Rust SDK already uses tokio as a dependency, and you get prometheus metrics too.

vincentfretin commented 2 years ago

I found out about agones via the XREngine project. To learn more about agones, see the first video on https://agones.dev/site/docs/third-party-content/videos-and-presentations/
Documentation:
https://agones.dev/site/docs/guides/metrics/
https://agones.dev/site/docs/guides/client-sdks/rust/

vincentfretin commented 2 years ago

To be clear, agones would mainly help scale the number of rooms you can support. You still need a separate backend that knows which janus instance hosts a given room, so that when a user wants to connect to a room, you can point them to the right instance. Agones would be useful here to know on which janus instance you can create a room, or whether you need to spawn a new janus instance, provisioning a new node on the k8s cluster if needed. From what I understand, Hubs does something similar with a fixed number of AWS instances: it gets the instances through the Habitat API, fetches the mediasoup stats of each one to know its number of sessions, then picks an instance for the room and saves the selected host to postgres.
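The placement decision described above could be sketched as follows. This is only an illustration under assumptions: `JanusInstance`, `pick_instance` and the `max_sessions` threshold are hypothetical names, and neither Hubs nor agones is known to implement it this way.

```rust
/// Hypothetical view of one janus instance as tracked by the backend.
#[derive(Debug, Clone, PartialEq)]
pub struct JanusInstance {
    pub host: String,
    pub sessions: u32,
}

/// Return the least-loaded instance that is still under `max_sessions`.
/// `None` means every instance is full (or there are none), so the caller
/// would have to spawn a new janus instance before creating the room.
pub fn pick_instance(instances: &[JanusInstance], max_sessions: u32) -> Option<&JanusInstance> {
    instances
        .iter()
        .filter(|i| i.sessions < max_sessions)
        .min_by_key(|i| i.sessions)
}
```

The backend would then save the selected host alongside the room, so that later users of that room are pointed to the same instance.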

vincentfretin commented 2 years ago

I know you can already get the number of sessions of a janus instance via the admin REST API, and that is indeed the metric you want to use to know the load of an instance. You probably also push some other janus metrics via an event handler plugin, but I haven't looked much into that side yet. So the prometheus metrics endpoint I describe here would really just be useful to get the number of rooms and users, and to have a beautiful graph of it. :)