mozilla / janus-plugin-sfu

Janus plugin to act as a kind of SFU for game networking data.
Mozilla Public License 2.0
135 stars 40 forks source link

End old publisher session sooner when publisher reconnect #73

Closed vincentfretin closed 3 years ago

vincentfretin commented 3 years ago

I create this PR in draft so this has some visibility, if anyone is interested by this change. This is what I have currently in production. See discussion starting at https://github.com/mozilla/janus-plugin-sfu/issues/62#issuecomment-770877418

If a user has a network failure and they reconnects, you could have with naf-janus-adapter the message "Janus still has previous session for this client. Reconnecting in 10s." three times. With this changes ending the old publisher session sooner (not waiting that janus detects the RTCPeerConnection is dead), the message "Janus still has previous session for this client. Reconnecting in 10s." only appears once, so the user really reconnect sooner so that the other participants can hear them again.

mqp commented 3 years ago

This makes sense to me in your case. In the general case, I feel it's a little suspicious because it gives anyone a way to "kick" you if they connect with your user ID. It would be less abusable if it refused to join instead.

I wonder why it takes Janus so long to notice that the connection is dead? Maybe the client can terminate the connection in some way that it can notice sooner?

vincentfretin commented 3 years ago

The client can't notify the server in this case, the network is dead client side for the case where they switched from Wi-Fi to 4g for example.

vincentfretin commented 3 years ago

You're absolutely right about abusing it and someone could kick me out this way without having a jwt with kick_users permission. I don't have jwt in place for now in my app but it's planned.

In my case, refuse to join is not an option in case of switching network, I can't let the user wait 30s to be able to join again the meeting.

Maybe the client can send via websocket that they is currently reconnecting and send the list of session or handle ids they had so we can end the sessions server side.

mqp commented 3 years ago

The client can't notify the server in this case, the network is dead client side for the case where they switched from Wi-Fi to 4g for example.

I know Janus has built-in keepalive functionality for this purpose, you can see the signalling messages going back and forth -- have you tried making it more aggressive?

On the server it's configured in janus.cfg: https://github.com/meetecho/janus-gateway/blob/master/conf/janus.jcfg.sample.in#L59

In the minijanus library it's configured here -- obviously the client one will need to stay more aggressive than the server one to avoid spurious timeouts: https://github.com/mozilla/minijanus.js/blob/master/minijanus.js#L77

mqp commented 3 years ago

(Of course, that doesn't mean that we shouldn't also do something like in this PR, but it might make your reconnection problems much less painful.)

vincentfretin commented 3 years ago

Ah! I didn't see this option in janus. Ok session are timed out after 60s by default. This is what I see effectively, I can have a socket timed out after 30s then 3 times 10s trying to reconnect, this equals 60s. :D Yeah I should try like session_timeout=30 and keepaliveMs=20000 for example or just keep keepaliveMs=30000 and set session_timeout=38, so it retries only once after 10s and I wouldn't need this PR at all.

vincentfretin commented 3 years ago

FYI, naf-janus-adapter is currently using timeoutMs: 30000 and keepaliveMs: 30000 because of:

this.session = new mj.JanusSession(this.ws.send.bind(this.ws), { timeoutMs: 30000 });

If I set session_timeout=38 in janus.jcfg, I have indeed only once the message "Janus still has previous session for this client. Reconnecting in 10s.". I even tested session_timeout=20, I don't seem to have any issue having it lower that keepaliveMs, in this case I don't have "Janus still has previous session for this client. Reconnecting in 10s." at all and it fully reconnects right after the websocket times out 30s. So the keepaliveMs is used here to keep alive the websocket, not the RTCPeerConnection as far I understand.

vincentfretin commented 3 years ago

I may have talked to soon :) I have some errors now

error received from keepalive:  
{janus: "error", session_id: 366457004017157, transaction: "28", error: {…}}
error: {code: 458, reason: "No such session 366457004017157"}
janus: "error"
session_id: 366457004017157
transaction: "28"

so I think you're right keepaliveMs should be lower than session_timeout.

vincentfretin commented 3 years ago

I think it should be timeoutMs > session_timeout > keepaliveMs The following timeoutMs: 40000 session_timeout = 38 keepaliveMs: 30000 seems to give a perfect reconnect with audio after 40s when a message in the websocket timed out when network changed. I think I'll use that now.

vincentfretin commented 3 years ago

I changed timeoutMs to 40000 in my naf-janus-adapter 3.0.x branch https://github.com/Synantoo/naf-janus-adapter/commit/261e1ca6040b74c4b90ba62762e0847ab8b76761