mattermost / mattermost-plugin-calls

https://www.mattermost.com
Other
96 stars 50 forks source link

Reconnect/Rejoin to the call after changing IP address (of client) #387

Open botinca opened 1 year ago

botinca commented 1 year ago

Original bugreport: https://github.com/mattermost/mattermost-server/issues/22732

Summary

Client is unable to re-join to exist call (or start the new one) after changing IP address. For example, after lost of WiFi connection (and switching to mobile Internet) or connect to another WiFi network and similiar cases.

Steps to reproduce

Start or join into a call, then turn off wifi connection on your cellural phone (a) or connect/disconnect VPN. Your call will be interrupted and you will be unable to reconnect into the call. Sometimes for several minutes, but sometimes for a longer time. Server log posted below

Expected behavior

I understand that implementation of some kind of "roaming" might be quite difficult, so I just expect quite simple possibility to quick rejoin into the call.

Observed behavior (that appears unintentional)

{"timestamp":"2023-03-29 22:22:10.106 +03:00","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:156","method":"GET","url":"/api/v4/system/ping","request_id":"bwgink1c3iy73kzzkmb9wq55sr","status_code":"200"}
{"timestamp":"2023-03-29 22:22:11.700 +03:00","level":"info","msg":"pc ERROR: 2023/03/29 22:22:11 Incoming unhandled RTP ssrc(1801543096), OnTrack will not be fired. mid RTP Extensions required for Simulcast\n","caller":"io/io.go:428","plugin_id":"com.mattermost.calls","source":"plugin_stdout"}
{"timestamp":"2023-03-29 22:22:11.723 +03:00","level":"debug","msg":"Got remote track!!!","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*logger).Debug log.go:80"}
{"timestamp":"2023-03-29 22:22:11.723 +03:00","level":"debug","msg":"{MimeType:audio/opus ClockRate:48000 Channels:2 SDPFmtpLine:minptime=10;useinbandfec=1 RTCPFeedback:[]}","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*logger).Debug log.go:80"}
{"timestamp":"2023-03-29 22:22:11.724 +03:00","level":"debug","msg":"Track has started, of type 111: audio/opus","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*logger).Debug log.go:80"}
{"timestamp":"2023-03-29 22:22:23.845 +03:00","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:156","method":"GET","url":"/api/v4/users/me/sessions","request_id":"e88n8mrs5jrqjnhwn9mqcqhagr","status_code":"200"}
{"timestamp":"2023-03-29 22:22:27.200 +03:00","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:156","method":"POST","url":"/api/v4/users/status/ids","request_id":"x6ymu9crhbbpxyngrydbpobjde","status_code":"200"}
{"timestamp":"2023-03-29 22:22:39.707 +03:00","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:156","method":"GET","url":"/api/v4/users/me/sessions","request_id":"a6shsxr3a7ddubnkggf31ogtya","status_code":"200"}
{"timestamp":"2023-03-29 22:22:40.204 +03:00","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:156","method":"GET","url":"/api/v4/system/ping","request_id":"os1jg55idf8jbeenjs4bztxpno","status_code":"200"}
{"timestamp":"2023-03-29 22:22:46.571 +03:00","level":"debug","msg":"ice disconnected","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*logger).Debug log.go:80","sessionID":"4ukhq6btm78h7c6x61axptgpuy"}
{"timestamp":"2023-03-29 22:22:46.572 +03:00","level":"debug","msg":"peer connection disconnected","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*logger).Debug log.go:80","sessionID":"4ukhq6btm78h7c6x61axptgpuy"}
{"timestamp":"2023-03-29 22:22:52.610 +03:00","level":"debug","msg":"websocket.NextReader: closing websocket","caller":"app/web_conn.go:828","user_id":"ra5co8c4spfn5kofnnxbzyqhhr","error":"websocket: close 1006 (abnormal closure): unexpected EOF"}
{"timestamp":"2023-03-29 22:22:52.610 +03:00","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:156","method":"GET","url":"/api/v4/websocket","request_id":"wz9igw94y7yijea19bqcw3j15o"}
{"timestamp":"2023-03-29 22:22:57.399 +03:00","level":"debug","msg":"websocket.NextReader: client side closed socket","caller":"app/web_conn.go:822","user_id":"ra5co8c4spfn5kofnnxbzyqhhr"}
{"timestamp":"2023-03-29 22:22:57.399 +03:00","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:156","method":"GET","url":"/api/v4/websocket","request_id":"nfrncb1ifbb9d8rkdtsfiw3o4e"}
{"timestamp":"2023-03-29 22:22:59.123 +03:00","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:156","method":"GET","url":"/api/v4/plugins/webapp","request_id":"4dfydiocrifuipfrneudywptch","status_code":"200"}
{"timestamp":"2023-03-29 22:22:59.231 +03:00","level":"debug","msg":"handleJoin","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*Plugin).handleJoin websocket.go:334","userID":"ra5co8c4spfn5kofnnxbzyqhhr","connID":"8zb6nj3dztgjmn4jdnbnt57zbc","channelID":"fxdwmmqnp3fnjrpxqrz69pg9kh"}
{"timestamp":"2023-03-29 22:22:59.235 +03:00","level":"error","msg":"failed to add user session: callback failed: user is already connected","caller":"app/plugin_api.go:940","plugin_id":"com.mattermost.calls","origin":"main.(*Plugin).WebSocketMessageHasBeenPosted.func1 websocket.go:533","userID":"ra5co8c4spfn5kofnnxbzyqhhr","connID":"8zb6nj3dztgjmn4jdnbnt57zbc","channelID":"fxdwmmqnp3fnjrpxqrz69pg9kh"}
{"timestamp":"2023-03-29 22:23:00.381 +03:00","level":"debug","msg":"websocket.NextReader: closing websocket","caller":"app/web_conn.go:828","user_id":"ra5co8c4spfn5kofnnxbzyqhhr","error":"websocket: close 1006 (abnormal closure): unexpected EOF"}
{"timestamp":"2023-03-29 22:23:00.381 +03:00","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:156","method":"GET","url":"/api/v4/websocket","request_id":"hb5gct6kefrujkhuc54ptepebc"}
{"timestamp":"2023-03-29 22:23:00.382 +03:00","level":"debug","msg":"closing channel for session","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*Plugin).OnWebSocketDisconnect.func1 websocket.go:271","userID":"ra5co8c4spfn5kofnnxbzyqhhr","connID":"4ukhq6btm78h7c6x61axptgpuy","channelID":"fxdwmmqnp3fnjrpxqrz69pg9kh"}
{"timestamp":"2023-03-29 22:23:00.383 +03:00","level":"debug","msg":"done","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*Plugin).handleJoin websocket.go:477","userID":"ra5co8c4spfn5kofnnxbzyqhhr","connID":"4ukhq6btm78h7c6x61axptgpuy","channelID":"fxdwmmqnp3fnjrpxqrz69pg9kh"}
{"timestamp":"2023-03-29 22:23:00.384 +03:00","level":"error","msg":"failed to read RTP packet","caller":"app/plugin_api.go:940","plugin_id":"com.mattermost.calls","origin":"main.(*logger).Error log.go:92","error":"EOF"}
{"timestamp":"2023-03-29 22:23:00.385 +03:00","level":"debug","msg":"ice closed","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*logger).Debug log.go:80","sessionID":"4ukhq6btm78h7c6x61axptgpuy"}
{"timestamp":"2023-03-29 22:23:00.385 +03:00","level":"debug","msg":"removing session from state","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*Plugin).handleJoin.func2 websocket.go:386","userID":"ra5co8c4spfn5kofnnxbzyqhhr"}
{"timestamp":"2023-03-29 22:23:00.387 +03:00","level":"debug","msg":"peer connection closed","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*logger).Debug log.go:80","sessionID":"4ukhq6btm78h7c6x61axptgpuy"}
{"timestamp":"2023-03-29 22:23:00.422 +03:00","level":"debug","msg":"done, removing session","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*Plugin).OnWebSocketDisconnect.func1 websocket.go:274","userID":"ra5co8c4spfn5kofnnxbzyqhhr","connID":"4ukhq6btm78h7c6x61axptgpuy","channelID":"fxdwmmqnp3fnjrpxqrz69pg9kh"}
{"timestamp":"2023-03-29 22:23:04.790 +03:00","level":"debug","msg":"websocket.NextReader: client side closed socket","caller":"app/web_conn.go:822","user_id":"ra5co8c4spfn5kofnnxbzyqhhr"}
{"timestamp":"2023-03-29 22:23:04.790 +03:00","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:156","method":"GET","url":"/api/v4/websocket","request_id":"x1p79frcabgbzdzhop65io6esr"}
{"timestamp":"2023-03-29 22:23:04.791 +03:00","level":"debug","msg":"closing channel for session","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*Plugin).OnWebSocketDisconnect.func1 websocket.go:271","userID":"ra5co8c4spfn5kofnnxbzyqhhr","connID":"8zb6nj3dztgjmn4jdnbnt57zbc","channelID":"fxdwmmqnp3fnjrpxqrz69pg9kh"}
{"timestamp":"2023-03-29 22:23:04.792 +03:00","level":"debug","msg":"done, removing session","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*Plugin).OnWebSocketDisconnect.func1 websocket.go:274","userID":"ra5co8c4spfn5kofnnxbzyqhhr","connID":"8zb6nj3dztgjmn4jdnbnt57zbc","channelID":"fxdwmmqnp3fnjrpxqrz69pg9kh"}

Possible fixes

I don't know all the details how it works, but there is maybe some timeouts to early detection of disconnection and we can decrease its valut. Or, maybe, more often heartbeat packets and so on...

streamer45 commented 1 year ago

I understand that implementation of some kind of "roaming" might be quite difficult

I think this is actually supported at least partially and it's actually the reason we are not clearing the state as soon as a disconnect fires. Looking at the logs it's taking roughly 15 seconds from the first detection of disconnection:

{"timestamp":"2023-03-29 22:22:46.572 +03:00","level":"debug","msg":"peer connection disconnected","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*logger).Debug log.go:80","sessionID":"4ukhq6btm78h7c6x61axptgpuy"}

to clearing the call state:

{"timestamp":"2023-03-29 22:23:00.385 +03:00","level":"debug","msg":"removing session from state","caller":"app/plugin_api.go:934","plugin_id":"com.mattermost.calls","origin":"main.(*Plugin).handleJoin.func2 websocket.go:386","userID":"ra5co8c4spfn5kofnnxbzyqhhr"}

This is because we are not clearing the session as soon as we detect disconnection since it's not a fatal condition, the client could reconnect (e.g. roaming case). But if it fails to do so after some time, then the connection fails at which point we clear everything.

We could lower the timeouts to somewhat mitigate this but I don't think that's the best solution. What we should do is support multiple user sessions. We have had some plans around this but never implemented the functionality. I created an internal ticket to track this work (https://mattermost.atlassian.net/browse/MM-51852).