meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0

[1.x] Streaming Plugin Mountpoint Fast Teardown/Setup Triggers "address already in use" Error #3111

Closed ctessarek closed 1 year ago

ctessarek commented 1 year ago

Hello everyone,

we experienced this on Janus 1.0.4 on Debian 11.5 aarch64 (Linux 5.10.0), but I don't see why it shouldn't also affect 1.1.0 and current HEAD, since the relevant code parts haven't changed as far as I can tell. Older versions are probably affected too, but we haven't been using the streaming plugin long enough to say.

We noticed a problem when tearing down and re-creating streaming plugin mountpoints in fast succession (specifying the same audioport). The mountpoint wouldn't be created and the following errors were logged:

[ERR] [plugins/janus_streaming.c:janus_streaming_create_fd:6665] [(null)] Bind failed for audio (port 10358)... 98 (Address already in use)
[ERR] [plugins/janus_streaming.c:janus_streaming_create_rtp_source_stream:6855] [(null)] Can't bind to port 10358...
[ERR] [plugins/janus_streaming.c:janus_streaming_process_synchronous_request:3451] Can't add 'rtp' stream '(null)', error creating data source stream...

The attached patch fixes this problem for us, and I don't think there are situations in which using it would have negative consequences (I tested it on 1.0.4 and 1.1.0). If you see fit, maybe you can incorporate the patch.

All the best, Chris

janus-streaming-sock_reuse.patch.txt

lminiero commented 1 year ago

I'm not sure SO_REUSEADDR is needed here, since this is UDP and not TCP. It may actually be harmful if it ends up allowing two mountpoints to share the same port, which can happen when you configure an RTP port range.

ctessarek commented 1 year ago

You're right of course, that could be a problem. Functionally, it now works for us with SO_REUSEADDR, while without it these errors occurred; that's all I can say. I've needed SO_REUSEADDR with UDP outside of Janus in my own C programs before. However, that was also on Debian 10/11 x86_64 and Debian 11 aarch64, and I must admit I didn't check whether the behavior is the same on other Linux versions/distributions; maybe it has something to do with that.

lminiero commented 1 year ago

With SO_REUSEADDR, the same port would definitely be accessible to multiple mountpoints asking for it, which means they'd steal each other's packets: as such, it's not an acceptable solution, I'm afraid. Your current problem comes from the fact that mountpoint sockets are not closed when you send the "destroy" request (or get the answer to it), but when the related free function is called on the mountpoint instance itself: since that depends on reference counting, it can happen a bit later. The only possible solution I see is adding an optional event sent when that "physically" happens, but that requires either an existing Janus API session (so it's not possible when using the Admin API to manage mountpoints) or event handlers being configured.

ctessarek commented 1 year ago

Very very interesting, thank you very much for the explanation! :-) I'll look into it further, keeping that in mind.

lminiero commented 1 year ago

Thinking of a different option, we could move the socket close to the end of the mountpoint relay thread (instead of the free function), and make the "destroy" wait for the thread to end. But turning that request into a truly synchronous mechanism could add problems of its own, since the thread could take time to end, and in case of issues it could take longer than the 10s timeout for requests.

lminiero commented 1 year ago

@ctessarek I've verified that a call to "destroy" already joins the thread, since it's done by janus_streaming_mountpoint_destroy which is automatically invoked once a mountpoint is removed from the hashtable (and that's what "destroy" does). As such, I added code to close the sockets in the "leaving thread" part. This should indeed solve your problem, as it will guarantee ports are free when you receive an answer to the "destroy". Please let me know if that's indeed the case, and if so I'll merge and close this issue.

ctessarek commented 1 year ago

Sorry for the late response, and thank you @lminiero for adding this. Sadly, we can't reproduce the exact bug we had with the old unpatched version any more to really verify the difference the patch makes, but it works great, thank you very much again! All the best, Chris