twilio / twilio-video.js

Twilio’s Programmable Video JavaScript SDK
https://www.twilio.com/docs/video/javascript

How to handle reconnection with version 2.5.1? #1070

Closed ing-fabian closed 4 years ago

ing-fabian commented 4 years ago

Code to reproduce the issue:

Twilio.Video.connect(token, {
  region: 'xxx',
  tracks: localMediaTracks,
  eventListener: _twilioEventListener,
  networkQuality: {
    local: 2,
    remote: 2
  },
  logLevel: 'debug',
}).then(room => {
  room.on('participantConnected', (participant) => console.log(participant));
  room.on('participantDisconnected', (participant) => console.log(participant));
  room.on('participantReconnecting', (participant) => console.log(participant));
});

Logs:

twilio-video.js:5271 2020-06-25 15:17:58.476Z | WARN in [PeerConnectionV2 #1: 050c4341-d5fe-4de5-b5ca-d52c5de06451]: ICE Connection Monitor detected inactivity
log @ twilio-video.js:5271
warn @ twilio-video.js:5292
(anonymous) @ twilio-video.js:3455
(anonymous) @ twilio-video.js:3090
Promise.then (async)
(anonymous) @ twilio-video.js:3088
twilio-video.js:5271 2020-06-25 15:17:58.477Z | WARN in [PeerConnectionV2 #1: 050c4341-d5fe-4de5-b5ca-d52c5de06451]: An ICE restart has been scheduled
log @ twilio-video.js:5271
warn @ twilio-video.js:5292
_initiateIceRestartBackoff @ twilio-video.js:3493
(anonymous) @ twilio-video.js:3455
(anonymous) @ twilio-video.js:3090
Promise.then (async)
(anonymous) @ twilio-video.js:3088
twilio-video.js:5271 2020-06-25 15:17:58.483Z | WARN in [PeerConnectionV2 #1: 050c4341-d5fe-4de5-b5ca-d52c5de06451]: Attempting to restart ICE
log @ twilio-video.js:5271
warn @ twilio-video.js:5292
_initiateIceRestart @ twilio-video.js:3489
(anonymous) @ twilio-video.js:3370
EventEmitter.emit @ twilio-video.js:6941
Backoff.onBackoff_ @ twilio-video.js:6847
setTimeout (async)
Backoff.backoff @ twilio-video.js:6846
_initiateIceRestartBackoff @ twilio-video.js:3493
(anonymous) @ twilio-video.js:3455
(anonymous) @ twilio-video.js:3090
Promise.then (async)
(anonymous) @ twilio-video.js:3088
twilio-video.js:5271 2020-06-25 15:18:01.272Z | WARN in [TwilioConnection #2: wss://de1.vss.twilio.com/signaling]: Unexpected state "closed" for handling a "heartbeat" message from the TCMP server.

Expected behavior:

We need a way to reset the state and reconnect the user, even if the signalling has failed. With the old version 2.0.0, we caught the disconnected event, then cleaned up the whole state of our app and retried the connection; for some of our users this is working. How can we do this with the new version?

Actual behavior:

With version 2.0.0, these events were triggered when the user was disconnected or their signalling had a problem. With the new version, we can catch the signalling error using the eventListener, but after that error is reported we cannot connect again. If this problem happens to our users, we cannot reconnect them without refreshing the page.

Software versions:

manjeshbhargav commented 4 years ago

Hi @ing-fabian ,

Thanks for writing in with your issue. I apologize for the delayed response. There shouldn't be any difference in how you handle connection disruptions between 2.0.0 and the latest 2.x version. Can you share with me exactly how you were managing reconnections in 2.0.0 and how you are managing them in 2.5.1? Code snippets would be greatly helpful.

Thanks,

Manjesh Malavalli JSDK Team

ing-fabian commented 4 years ago

Hi Manjesh,

We are handling reconnection the same way as before. The only difference is that now, when we try to reconnect, we get a SignalingServerBusyError and the user is not able to reconnect.

Things we see have changed:

  1. Before, when the user was disconnected, Twilio triggered the participantDisconnected event. Now, instead of this event, we receive a signalling server error. This happens when we emulate a disconnection by pausing the execution of our application in the browser's DevTools.

  2. Before, after receiving the participantDisconnected event, we just cleaned up all the resources and connected again. Now, if we do this, the connect function from Twilio returns a SignalingServerBusyError and it is not possible to reconnect from our site; the only way we found was refreshing the page.

  3. Before, we had no problem retrying the connection. Now, if we catch the signalling error and retry the connection, we end up in an infinite loop because the connect method returns a SignalingServerBusyError and the cycle never stops.

Below is a snippet. It might be wrong, as I just wrote it to show a basic example of what we are doing; I might have forgotten a comma or something :).

const sdkEvents = new EventEmitter();

sdkEvents.on('event', event => {
  const { level, name } = event;
  if (name === 'closed') {
    reconnect();
  }
});

function reconnect() {
  let retries = 0; // `let`, since it is incremented on each failed attempt
  const connectWithRetry = () => {
    try {
      connect(token, { /* ...other options... */ eventListener: sdkEvents })
        .then(room => {
          room.on('participantDisconnected', (participant) => reconnect());
        })
        .catch(error => {
          retries++;
          if (retries < 3) {
            connectWithRetry();
          }
        });
    } catch (e) {
      // Only reached if connect() throws synchronously.
      retries++;
      if (retries < 3) {
        connectWithRetry();
      } else {
        console.log('Stop reconnecting');
      }
    }
  };

  connectWithRetry();
}

I hope this is clearer now.

manjeshbhargav commented 4 years ago

Hi @ing-fabian ,

You should not be reconnecting on the participantDisconnected event, because this event means that somebody else left the Room, not you. Instead, you should reconnect only on the disconnected event, like so:

room.on('disconnected', (room, error) => {
  if (error) {
    reconnect();
  }
});

Getting a "closed" event on the EventListener does not necessarily mean that you got inadvertently disconnected. It may also be due to the user voluntarily leaving the Room. So I wouldn't recommend reconnecting there.

If you are getting the SignalingServerBusyError, then you are creating too many connection requests too fast. So, I recommend that you add an exponential backoff mechanism to your retries. Also, with all 2.x SDKs, we will internally try to reconnect when there is a network disruption or handoff.

So, to sum up, reconnecting with an exponential backoff and reconnecting on disconnected instead of participantDisconnected should fix your issues.

Please let me know if this helps.

Thanks,

Manjesh Malavalli JSDK Team
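
The exponential backoff recommended in the reply above could be sketched roughly as follows. This is a minimal illustration, not code from the SDK or from either poster: connectWithBackoff, its option names, and the retry cap and delay ceiling are all hypothetical, and connectFn stands in for a call such as () => Twilio.Video.connect(token, options).

```javascript
// Hypothetical helper (not part of twilio-video): retries `connectFn`
// with exponentially growing delays between failed attempts.
function connectWithBackoff(connectFn, {
  maxRetries = 5,      // assumed cap on retries
  baseDelayMs = 500,   // assumed starting delay
  maxDelayMs = 8000,   // assumed ceiling for a single delay
  sleep = ms => new Promise(resolve => setTimeout(resolve, ms)),
} = {}) {
  const attempt = retry =>
    // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
    Promise.resolve().then(connectFn).catch(error => {
      if (retry >= maxRetries) {
        throw error; // give up and surface the last error
      }
      // Delay doubles on each retry (500, 1000, 2000, ...), capped at maxDelayMs.
      const delayMs = Math.min(baseDelayMs * 2 ** retry, maxDelayMs);
      return sleep(delayMs).then(() => attempt(retry + 1));
    });
  return attempt(0);
}
```

It would be called from the disconnected handler shown above, for example connectWithBackoff(() => Twilio.Video.connect(token, options)), so that repeated failures space out the connection requests instead of looping immediately.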

ing-fabian commented 4 years ago

Hi Manjesh,

Indeed, you are right: the event we listen to for reconnection is room.on('disconnected'). I wrote the code from memory and made a mistake. Anyway, if I understood correctly, I just need to wait some time before I can create a new connection. I'll give it a try and get back to you. Thanks!

ing-fabian commented 4 years ago

Hi Manjesh,

Thank you for your help; I'm closing this issue now. After adding a 500 ms delay after getting a signalling error, the reconnection process works as expected.
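
For readers landing on this thread: since the 2.x SDKs try to reconnect internally, one way to avoid competing with that logic is to watch the Room's reconnecting and reconnected events and only rebuild the connection once disconnected fires with an error. The sketch below assumes those Room events behave as described in the thread; watchRoom and onGiveUp are hypothetical names, and any object with an EventEmitter-style on() can stand in for the Room.

```javascript
// Hypothetical helper: lets the SDK's internal reconnection run, and only
// hands control back to the app when the Room is disconnected with an error.
function watchRoom(room, onGiveUp, log = console.log) {
  room.on('reconnecting', error => {
    // The SDK is retrying on its own here; do not call connect() yet.
    log('reconnecting', error && error.message);
  });
  room.on('reconnected', () => {
    log('reconnected');
  });
  room.on('disconnected', (disconnectedRoom, error) => {
    if (error) {
      // Involuntary disconnect: start the app-level reconnect (ideally with a delay).
      onGiveUp(error);
    }
    // Without an error, the disconnect is treated as voluntary and nothing is done.
  });
}
```

Here onGiveUp would typically start a delayed or backoff-based reconnect rather than calling connect() immediately.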