bkempe commented 5 years ago

Description

We're testing the behavior of a JVB connection interruption (no XMPP interruption) and its effect on the Jitsi Meet frontend.

Current behavior

A JVB connection interruption of a few seconds leads to the following JS Console and the BridgeChannel stays closed / does not become available anymore:

Logger.js:125 [modules/xmpp/JingleSessionPC.js] <A.peerconnection.oniceconnectionstatechange>:  (TIME) ICE disconnected P2P? false:  62671.080000000075
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <A.peerconnection.oniceconnectionstatechange>:  (TIME) ICE failed P2P? false:    72666.63500000004
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <A.peerconnection.oniceconnectionstatechange>:  (TIME) ICE disconnected P2P? false:  73003.99999999991
Logger.js:125 [modules/connectivity/IceFailedNotification.js] <>:  Sending ICE failed - the connection has not recovered
Logger.js:125 [JitsiMeetJS.js] <Object.getGlobalOnErrorHandler>:  UnhandledError: null Script: null Line: null Column: null StackTrace:  Error: Jingle error: {"reason":"timeout","session":"JingleSessionPC[p2p=false,initiator=false,sid=8jrjnt7283eae]"}
    at JingleSessionPC.js:2184
    at r.TimedHandler.handler (strophe.js:3405)
    at r.TimedHandler.run (strophe.js:2618)
    at r.Connection._onIdle (strophe.js:4435)
    at r.Connection.<anonymous> (strophe.js:5793)
i @ Logger.js:125
getGlobalOnErrorHandler @ JitsiMeetJS.js:547
window.onerror @ middleware.js:100
callErrorHandler @ GlobalOnErrorHandler.js:61
(anonymous) @ JingleSessionPC.js:2183
(anonymous) @ strophe.js:3405
run @ strophe.js:2618
_onIdle @ strophe.js:4435
(anonymous) @ strophe.js:5793
setTimeout (async)
_send @ strophe.js:5792
send @ strophe.js:3270
sendIQ @ strophe.js:3411
value @ JingleSessionPC.js:653
(anonymous) @ IceFailedNotification.js:44
setTimeout (async)
(anonymous) @ IceFailedNotification.js:39
Promise.then (async)
value @ IceFailedNotification.js:32
X._onIceConnectionFailed @ JitsiConference.js:2437
(anonymous) @ JitsiConferenceEventManager.js:192
r.emit @ events.js:96
peerconnection.oniceconnectionstatechange @ JingleSessionPC.js:470
A.peerconnection.oniceconnectionstatechange @ TraceablePeerConnection.js:253
Logger.js:125 [modules/xmpp/strophe.jingle.js] <s.value>:  on jingle transport-replace from ab283334-8e6b-46cf-9db6-b7c6800a8bef@conference.***/focus <iq xmlns="jabber:client" to="395d0043-9ebc-4022-9c54-55963a322378@***/FuBfJ0PE" type="set" id="Mzk1ZDAwNDMtOWViYy00MDIyLTljNTQtNTU5NjNhMzIyMzc4QHN0YWcudm93ZWwuYWkvRnVCZkowUEUAdmpsbXctMTIzNjEA+jD2aZ5ebHZ7xUfavN04TQ==" from="ab283334-8e6b-46cf-9db6-b7c6800a8bef@conference.***/focus">…</iq>
Logger.js:125 [modules/xmpp/strophe.jingle.js] <s.value>:  (TIME) Start transport replace 75215.5499999999
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <t.value>:  Renegotiate: setting remote description
index.js:146 Halt: There are no SSRC groups in the remote description.
Logger.js:125 [modules/RTC/BridgeChannel.js] <RTCDataChannel.e.onclose>:  Channel closed by client
Logger.js:125 [modules/RTC/TraceablePeerConnection.js] <A._remoteStreamRemoved>:  Ignored remote 'stream removed' event for non-user stream mixedmslabel
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <>:  Renegotiate: creating answer
Logger.js:125 [modules/xmpp/SdpConsistency.js] <e.value>:  TPC[1,p2p:false] no SSRC found for the recvonly videostream!
Logger.js:125 [modules/xmpp/RtxModifier.js] <e.value>:  RtxModifier doing nothing, video m line is recvonly
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <>:  Renegotiate: setting local description
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <t.value>:  removal not necessary
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <t.value>:  addition not necessary
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <t.value>:  Renegotiate: setting remote description
index.js:146 Halt: There are no SSRC groups in the remote description.
Logger.js:125 [modules/RTC/TraceablePeerConnection.js] <A._remoteStreamAdded>:  TPC[1,p2p:false] ignored remote 'stream added' event for non-user streamid: mixedmslabel
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <t.value>:  sendIceCandidate: last candidate.
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <>:  Renegotiate: creating answer
Logger.js:125 [modules/xmpp/SdpConsistency.js] <e.value>:  TPC[1,p2p:false] no SSRC found for the recvonly videostream!
Logger.js:125 [modules/xmpp/RtxModifier.js] <e.value>:  RtxModifier doing nothing, video m line is recvonly
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <>:  Renegotiate: setting local description
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <t.value>:  removal not necessary
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <t.value>:  addition not necessary
Logger.js:125 [modules/xmpp/SDPUtil.js] <Object.candidateToJingle>:  not translating "network-id" = "1"
Logger.js:125 [modules/xmpp/SDPUtil.js] <Object.candidateToJingle>:  not translating "network-cost" = "10"
Logger.js:125 [modules/xmpp/JingleSessionPC.js] <t.value>:  Sending transport-accept:  <iq to="ab283334-8e6b-46cf-9db6-b7c6800a8bef@conference.***/focus" type="set" xmlns="jabber:client" id="97ae7c99-a7f6-4a97-a00f-54910f49adea:sendIQ">…</iq>
Logger.js:125 [modules/RTC/BridgeChannel.js] <new e>:  constructor() with peerconnection
2Logger.js:125 [modules/xmpp/JingleSessionPC.js] <t.value>:  sendIceCandidate: last candidate.
Logger.js:125 [modules/xmpp/strophe.jingle.js] <>:  (TIME) Transport replace success! 75404.76499999977
7Logger.js:125 [modules/RTC/BridgeChannel.js] <e.value>:  Bridge Channel send: no opened channel.

Expected Behavior

BridgeChannel reconnects

Possible Solution

There is retry logic in the BridgeChannel.js but only for websocket. Why is that?

Steps to reproduce

Disable the JVB connection via

iptables -I INPUT -p udp  --dport ${JVB_PORT} -j DROP

on the JVB instance, then opening the port again after some time via

iptables -I INPUT -p udp  --dport ${JVB_PORT} -j ACCEPT

Environment details

Latest Jitsi

bgrozev commented 5 years ago

There is retry logic in the BridgeChannel.js but only for websocket. Why is that?

This is because we establish the WebRTC data channel as part of ICE. When you interrupt the connection to the bridge, this will trigger an ICE failure and the bridge channel will be-reestablished with the new ICE session.

In your case the client sent a notification to jicofo that its ICE session failed, and jicofo then sent a re-invite (via a transport-replace). The new ICE session failed, because it was on the same bridge where packets are filtered. Audio and video continued to work because they are P2P.

As Emil mentioned, the preferred way forward is to use jitsi-videobridge with WebSockets. In this case you would see independent retries for the ICE session (if it fails) and the WebSocket to jvb (if it fails).

I was looking for a document which describes how to setup jitsi-videobridge with WebSockets but I dind't find one. I'll write something up and send a link later today.

paweldomas commented 5 years ago

In general even the SCTP data channels are supposed to come back after ICE restart, but probably something is broken.

bgrozev commented 5 years ago

I think after the ICE restart the bridge ICE session never succeeds because the filter is still in place, but audio/video continues to work because it's using the p2p peer connection. I was able to reproduce it.

bgrozev commented 5 years ago

@bkempe Here is the doc for setting up colibri websockets. If you follow it let me know whether it works or not, I might have forgotten some parts.

bkempe commented 5 years ago

Thanks @bgrozev, this doc is very helpful and easy to follow. One thing I noticed is that using the port 8080 for org.jitsi.videobridge.rest.jetty.port might clash with the default value for org.jitsi.videobridge.rest.private.jetty.port. This may also have been the problem in this ticket: https://github.com/jitsi/jitsi-videobridge/issues/657

bgrozev commented 5 years ago

Good point, I'll change that.

iencotech commented 4 years ago

Hello, we are able to reproduce this issue of BridgeChannel staying closed and not being able to reopen it. In our case we have our own deployment with kubernetes, with multiple JVB (using StatefulSet like here https://github.com/hpi-schul-cloud/jitsi-deployment/blob/master/base/jitsi-shard/jvb/jvb-statefulset.yaml).

When one of the JVB pods dies, we can see that clients reconnect automatically to one of the other JVBs available. By debugging the JS we can see that this line is executed https://github.com/jitsi/lib-jitsi-meet/blob/master/modules/xmpp/strophe.jingle.js#L205, and BridgeChannel is initialized again, but immediately fails with error undefined:

Logger.js:154 2020-08-25T23:05:18.284Z [modules/RTC/BridgeChannel.js] <RTCDataChannel.e.onerror>:  Channel error: undefined
Logger.js:154 2020-08-25T23:05:18.285Z [modules/RTC/BridgeChannel.js] <RTCDataChannel.e.onclose>:  Channel closed by server
Logger.js:154 2020-08-25T23:05:18.573Z [modules/RTC/BridgeChannel.js] <l._send>:  Bridge Channel send: no opened channel.
Logger.js:154 2020-08-25T23:05:29.001Z [modules/RTC/BridgeChannel.js] <l._send>:  Bridge Channel send: no opened channel.

It is worth noting that the candidates on the initial logging of SDP is like this:

2020-08-25T23:00:40.015Z [modules/RTC/TraceablePeerConnection.js] <A.trace>:  getRemoteDescription::preTransform type: offer
...
a=candidate:1 1 udp 2130706431 <local_IP> 32101 typ host generation 0
a=candidate:2 1 udp 1694498815 <public_JVB_IP> 32101 typ srflx raddr <local_IP> rport 32101 generation 0

But after connecting to the failover JVB, we get these listed:

a=candidate:1 1 udp 2130706431 <local_IP> 32100 typ host generation 0
a=candidate:2 1 udp 1694498815 <public_JVB_IP> 32100 typ srflx raddr <local_IP> rport 32100 generation 0
a=candidate:1 1 udp 2130706431 <local_IP> 32101 typ host generation 0
a=candidate:2 1 udp 1694498815 <public_JVB_IP> 32101 typ srflx raddr <local_IP> rport 32101 generation 0
a=candidate:1 1 udp 2130706431 <local_IP> 32101 typ host generation 0
a=candidate:2 1 udp 1694498815 <public_JVB_IP> 32101 typ srflx raddr <local_IP> rport 32101 generation 0

Which makes us think that it is still having the old one in port 32101 as candidate and that might be why opening the data channel fails. At this point clients can see each other's video, and by inspecting in Wireshark we can see that they are connected via UDP to the bridge in port 32100.

We tried to do this manual call to initialize the channel again like this:

conference.rtc.initializeBridgeChannel(conference.getActivePeerConnection(), null);

And it fails with the same error. We don't have any event that we can listen to know when the data channel was closed, which makes us harder to detect when this happens in our application, resulting in clients that cannot exchange data messages with the bridge.

everflux commented 3 years ago

Could this also be related to this observation of crashing Firefox (but not chrome) https://community.jitsi.org/t/firefox-and-safari-endless-loop-with-tounifiedplan/99940

vinh-smartdev commented 3 years ago

I have the same issue, look like we do not have any event to catch this type of error on Javascript level.

The error was already thrown on native side before it can come up to Javascript side

jitsi / lib-jitsi-meet

BridgeChannel stays closed after connection interruption #950

Description

Current behavior

Expected Behavior

Possible Solution

Steps to reproduce

Environment details