twilio / twilio-video-ios

Programmable Video SDK by Twilio
http://twilio.com/video

Tracks fail to connect or are slow #88

Open patricktemple opened 4 years ago

patricktemple commented 4 years ago

Hi,

We've been seeing sporadic problems establishing track connections. Some we've witnessed ourselves, others are from logs or bug reports from the field. I'm having some trouble debugging them and I'm hoping that your server logs can help.

Background: Our app has a connection from the iOS SDK (3.0.0) to JS (2.0.0) running in Chrome, connected peer-to-peer. iOS publishes two video feeds, an audio feed, and a data track. Web publishes its webcam, audio, and a data track.

Problem: Sometimes, when two sides try to join the room, no tracks come through to the web, or they come through very slowly. Our web client logs whenever the remote participant and tracks connect, and I'm seeing a range of things happening in the logs:

Refreshing the page on web tends not to help, so I suspect the problem lies on the iOS side or in the signaling servers. Ending the call and starting again (which creates a new room) does usually fix it.

Here are my ideas on what might be happening:

While I wait for our improved iOS logging to get deployed, I'm trying to figure out what I can. Is there anything in your server logs that can tell me what's happening with the track signaling? Here are three rooms which all showed the same user-facing symptom on our web client—the remote user never seemed to join the call—but with different stories emerging from the logs.

I've filed a support ticket listing the room IDs and linking them to this issue.

Room A

This one happened with me on web, and the logs show this sequence after I joined:

  1. [1s after web connects] The remote audio and video tracks connected after about 1s.
  2. [4s after web connects] Both video tracks fired their muted event, which means that there's a problem emitting video.
  3. [18s after web connects] I refreshed the web page trying to fix this.
  4. [1s later] Remote participant disconnected

Room B

In this one, even the logs suggest that the iOS user never joined. No track events came through to web, and the Twilio console logs don't show them as a participant. I know that iOS did call TwilioVideo.connect but unfortunately didn't log the result of this.

Room C

Timeline after web connected:

  1. [1.7s after connection] Remote audio and video tracks connected.
  2. [4-5s after connection] Remote video tracks muted.
  3. [17s after connection] Remote data track connected.
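
The timelines for Rooms A and C can be collected on the web client with a small elapsed-time logger. The following is a minimal sketch, not the logging code used in the app: `logRoomTimeline` and its `log` and `now` parameters are hypothetical names, and it assumes the object passed in emits the twilio-video Room events `participantConnected`, `participantDisconnected`, `trackSubscribed`, `trackUnsubscribed`, and `disconnected`.

```javascript
const { EventEmitter } = require('events');

// Attach timestamped logging to a room-like event emitter.
// `log` and `now` are injectable (hypothetical parameters) so the helper
// can be exercised without the SDK.
function logRoomTimeline(room, log = console.log, now = Date.now) {
  const start = now();
  const stamp = (message) => {
    const seconds = ((now() - start) / 1000).toFixed(1);
    log(`[${seconds}s after web connects] ${message}`);
  };

  room.on('participantConnected', (participant) =>
    stamp(`Remote participant connected: ${participant.identity}`));
  room.on('participantDisconnected', (participant) =>
    stamp(`Remote participant disconnected: ${participant.identity}`));
  room.on('trackSubscribed', (track) =>
    stamp(`Remote ${track.kind} track subscribed`));
  room.on('trackUnsubscribed', (track) =>
    stamp(`Remote ${track.kind} track unsubscribed`));
  room.on('disconnected', (_room, error) =>
    stamp(`Room disconnected${error ? `: ${error.message}` : ''}`));
}

// Usage with a stand-in emitter; a real client would pass the Room
// that Video.connect resolves with.
const fakeRoom = new EventEmitter();
logRoomTimeline(fakeRoom);
fakeRoom.emit('trackSubscribed', { kind: 'audio' });
```

Participants who are already in the room when the web client joins would need to be read from the room's participant collection rather than from `participantConnected`, so a real logger would likely record those at connect time as well.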

My next step

I should bump up the TwilioVideo.logLevel and add logging and retry logic around roomDidDisconnect and roomDidFailToConnect. But I'm not sure that would actually solve the problem. If your room logs could tell us more about what was happening with the track signaling during these rooms, it'd be really helpful.

Note: I also filed https://github.com/twilio/twilio-video.js/issues/931, which feels similar but is probably a different bug because refreshing the browser does fix that one.

Thanks for your help, Patrick

piyushtank commented 4 years ago

@patricktemple Thanks for reaching out and providing the details about the issues you are observing. I apologize for the late response.

We looked into the three Room SIDs that you have sent.

Room A and Room C

We noticed that the iOS user connects to the Room first with one audio, two video, and one data track. Then the JavaScript client connects to the Room and, immediately after connecting, publishes a video and data track.

What you are observing could be a bug on our infrastructure side, where we are not handling a race condition when a call is made between a JavaScript client and an iOS client. We think the problem should not exist for a video call between two iOS users or two JavaScript users. While we are looking into the issue, is it possible for you to use this as a workaround: on the JavaScript client, instead of connecting and then publishing, can you try connecting with tracks and see if the problem goes away?

Room B

We could not see iOS user in the Room on our logs.

You are right, the root cause of this problem could be the same as https://github.com/twilio/twilio-video.js/issues/931. See if the workaround suggested above helps while we are investigating the problem.

Best, Piyush

patricktemple commented 4 years ago

Hey Piyush, thanks for looking into this. I can try publishing the media tracks up front in JS. Do you think that might fix problems with the remote tracks as well? You mentioned those were already published by the time the JS client joined.

Are you sure the web client published the data track after joining? We join like this:

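const Video = require('twilio-video');
const { LocalDataTrack } = Video;

// `token` (access token) and `name` (room name) are defined elsewhere in our app.
Video.connect(token, {
  name,
  logLevel: 'info',
  maxVideoBitrate: 250000,
  audio: false, // no media is acquired at connect time; A/V is published later
  video: false,
  tracks: [new LocalDataTrack()] // only the data track is passed at connect
});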

And then we publish the audio and video shortly afterwards. So it makes sense that those were published afterwards, but the data track surprises me.

piyushtank commented 4 years ago

@patricktemple Yes, connecting upfront with tracks may solve the problem on JS side for the remote tracks as this is more of an interoperability issue between JS and iOS in P2P Rooms.
Let me know if the workaround doesn't help.
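
A minimal sketch of the connect-with-tracks workaround discussed above, assuming the twilio-video module is passed in as `Video` so the code can be exercised with a stub. `buildConnectOptions` and `joinWithTracks` are hypothetical helper names; `createLocalTracks`, `LocalDataTrack`, and `connect` are the SDK calls this relies on. The `audio` and `video` options are left out because, as far as the SDK documentation describes them, they only apply when `tracks` is not provided.

```javascript
// Build connect options that carry every local track up front, so nothing
// has to be published after joining the room.
function buildConnectOptions(name, mediaTracks, dataTrack) {
  return {
    name,
    logLevel: 'info',
    maxVideoBitrate: 250000,
    tracks: [...mediaTracks, dataTrack],
  };
}

// `Video` is assumed to be the twilio-video module (injected for testing).
async function joinWithTracks(Video, token, name) {
  // Acquire microphone and camera before connecting.
  const mediaTracks = await Video.createLocalTracks({ audio: true, video: true });
  const dataTrack = new Video.LocalDataTrack();
  return Video.connect(token, buildConnectOptions(name, mediaTracks, dataTrack));
}
```

In a real client this would be called as `joinWithTracks(require('twilio-video'), token, name)`. Acquiring media first delays the join slightly, which is the trade-off for avoiding a publish right after connecting.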

piyushtank commented 4 years ago

@patricktemple Did the workaround we recommended help? Also, you should upgrade the Video SDKs to the latest versions, as they have bug fixes and new features.

patricktemple commented 4 years ago

I've written the workaround but it has not deployed yet. I've had a hard time seeing this happen myself so I won't know for sure until we see a drop in cases of this in the logs.

patricktemple commented 4 years ago

Hi,

The change that makes the web client include the A/V tracks up front, rather than publishing them immediately after joining, has deployed. I still saw the problem happen, though, in room RM88a40e9b42d1d0017d249f126e779b11.

Here's the timeline I see in the logs from web's perspective:

  1. [0s] Web loads the page.
  2. [2s] Remote participant joins (this might be right when web joined the room... I see in Twilio console logs that the remote iOS user actually joined first in this case).
  3. [5s] One remote video track and remote audio track are subscribed.
  4. [19s] The remote data track and second video track are subscribed.

The iOS app publishes all its tracks right when it joins the session. So the long delay must be happening at the SDK or WebRTC signaling level.
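
To track how often this delay shows up in the logs, the subscription events for a room can be reduced to a few numbers. This is a sketch under assumptions: `summarizeSubscriptions` is a hypothetical helper, the event shape (`kind`, `seconds`) is made up for illustration, and the expected count of 4 comes from the iOS side publishing two video tracks, one audio track, and one data track.

```javascript
// Summarize track-subscription timing for one room.
// `events` holds one entry per subscribed remote track, with `seconds`
// measured from the moment the web page loaded.
function summarizeSubscriptions(events, expectedCount) {
  const times = events.map((e) => e.seconds).sort((a, b) => a - b);
  const has = times.length > 0;
  return {
    subscribed: times.length,
    missing: Math.max(0, expectedCount - times.length),
    firstAt: has ? times[0] : null,
    lastAt: has ? times[times.length - 1] : null,
    // Gap between the first and last subscription; large values flag the bug.
    spread: has ? times[times.length - 1] - times[0] : null,
  };
}

// The timeline reported above for room RM88a40e9b42d1d0017d249f126e779b11.
const summary = summarizeSubscriptions(
  [
    { kind: 'video', seconds: 5 },
    { kind: 'audio', seconds: 5 },
    { kind: 'data', seconds: 19 },
    { kind: 'video', seconds: 19 },
  ],
  4
);
console.log(summary);
// prints { subscribed: 4, missing: 0, firstAt: 5, lastAt: 19, spread: 14 }
```

A room like Room B, where no tracks arrive at all, would show up as `missing: 4` with the timing fields set to `null`.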

The JS SDK has now been upgraded to 2.3.0 btw, though iOS is still on 3.0.0.

piyushtank commented 4 years ago

Unfortunately, the bug is caused by a race condition in P2P Rooms, and I do not have a timeline for when the fix will be implemented and deployed. Is it possible for you to use Group Rooms instead? You should not see the problem there.

Also, can you upgrade the iOS SDK to the latest version and let us know if you are still able to reproduce the bug?

patricktemple commented 4 years ago

I see. We'll try to upgrade the iOS SDK soon and see if that helps.

Unfortunately, group rooms aren't an option for us because it's important that the connection remain E2E encrypted.

Would things be helped if we change the timing of when the clients connect? Right now, they both try to join right at the same time:

  1. iOS starts the call (makes an API call to our server).
  2. Web receives the call notification from our server.
  3. Web answers the call, and iOS receives a notification that this has happened. Our server creates the Twilio room and then both clients join the room at this time.

We could instead have iOS join the room when it places the call, which would make it unlikely that it's joining at the same time as web. Do you think that might help? It's got some downsides for us so I wouldn't do it otherwise, but would be good to know if that's an option.

patricktemple commented 4 years ago

Hi Piyush, do you have any updates on this? We're still seeing this issue pretty frequently.

piyushtank commented 3 years ago

We are working on Unified Plan support in the mobile SDKs. Once the feature is released, this problem should get resolved.

patricktemple commented 3 years ago

Is there an update on this, or an expected timeframe for release of the new Unified Plan version? Thanks!

piyushtank commented 3 years ago

@patricktemple Our team is currently working on Unified Plan support. I don't have a specific timeframe, but it should be out soon, maybe in Q2. We will keep you posted.