meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0
8.25k stars 2.48k forks source link

Configurable cap to number of queued events when reconnecting WebSocket event handler #3148

Closed lminiero closed 1 year ago

lminiero commented 1 year ago

This PR was sponsored with love by our friends at @qxip! :heart:

By default, all our event handlers have an internal queue for dispatching events to external recipients: this is to ensure the core is not slowed down by I/O in the handlers when enqueueing generated events. Besides, out of the box all handlers do their best to ensure no event is ever lost: this means that, in case a recipient is temporarily unavailable, events that are generated in the meanwhile are queued and never dropped, in order to make sure that when the recipient comes back they have a complete view of what happened while they were offline.

While this is the intended behaviour we wanted them to have, this can cause issues if a recipient going away never comes back, or only comes back after a long time: in fact, this results in queued events to be stored forever and never freed, which on machines with limited or insufficient RAM will lead to OOM crashes.

As such, this PR adds a new configurable behaviour, that allows you to put a cap on the number of events that should be kept in queue in case a recipient goes away: in case you set a cap of N and the recipient disconnects, the handler will get rid of all older events until there's only N in queue, and will keep doing the same as new events are notified by the core. Notice that this does mean that the handler will, under such circumstances, drop some events, meaning the recipient will never see them: this will likely result in some limited visibility on some events (e.g., the recipient may see RTCP events for a handle they never saw being attached). The feature is disabled by default (events_cap_on_reconnect = 0), meaning that out of the box the event handler will work as it did before: it's up to you to specify a positive integer (e.g., events_cap_on_reconnect = 10) to enable the feature. Whether it's enabled or not, the cap is not effective when we do have a connection available (no drop will ever happen in that case).

Notice that this PR currently only adds this configurable behaviour to the WebSocket event handler, since to our knowledge it's the most commonly used one: besides, it's the one where a loss of connection is more apparent, since there's a persistent connection we control and manage ourselves. Should this effort be merged, we'll start thinking about how to do the same in other event handlers too (in case, do let us know which event handler you're running in production for feedback on that).

Thanks again to @qxip for sponsoring the development of this feature!

lminiero commented 1 year ago

Merging :muscle:

lminiero commented 1 year ago

Backported the same enhancement to 0.x too.