microsoft / MixedReality-WebRTC

MixedReality-WebRTC is a collection of components to help mixed reality app developers integrate audio and video real-time communication into their application and improve their collaborative experience
https://microsoft.github.io/MixedReality-WebRTC/
MIT License
905 stars 280 forks source link

ICE Gathering mysteriously fail on MRWRTC 2.0 #605

Open SergioMasson opened 3 years ago

SergioMasson commented 3 years ago

After upgrading our project to MRWRTC 2.0 we have stared to experience some strange behaviur during the ICE gathering stage. We are currently using a custome solution for signaling based on websockets. However, this solution was working greate when using MRWRTC 1.0.3. Sometimes the connection succeeds as expected, but very often it gets stuck in the Checking stage of the ICE process. If we wait long enough it received a fail ICE stage event. Is this something that we could be doing wrong? Or is it a problem with the API ?

fibann commented 3 years ago

There have been a few reports of frequent ICE failures, especially across different networks, but the cause is unclear. Thanks for the report, it's useful to narrow it down to 2.0. Will investigate.

SergioMasson commented 3 years ago

Hi @fiban, thank you very much for investigating this issue. I have logs from the debug version of Mixed reality WebRTC 2.0.2 that we have compiled with our project. I have one log for a suceeded call performed on the local network and on log for a call that have failed performed with peers on different networks.

mrwebrtc2_0_2_win32_successCall_log.txt

mrwebrtc2_0_2_win32_failCall_log.txt

fibann commented 3 years ago

Thanks, this helps! I'll have a look.

fibann commented 3 years ago

I looked at the logs, in the success case the connection seems to be established on local network:

(p2ptransportchannel.cc:1789): Channel[0|1|__]: New selected connection: Conn[3a0b1900:0:Net[TP-Link:192.168.1.147/32:Wifi:id=3]:bRvlQFm7:1:0:local:udp:192.168.1.147:54226->uK2JeQtx:1:2122260223:local:udp:192.168.1.120:42006|CRWS|S|0|0|9114756780654476798|69]

while in the failure case the peer attempts to establish a UDP connection to what seems a private address, fails 151 times, then gives up:

(port.cc:1726): Conn[962b5d10:0:Net[Hyper-V:192.168.240.1/32:Ethernet:id=4]:WiGfgoGJ:1:0:local:udp:192.168.240.1:61185->wv6APIhA:1:2122194687:local:udp:100.96.41.21:42083|C--I|-|0|0|9114756780654476799|-]: Sent STUN ping, id=796643316b4f43647273492b, use_candidate=1, nomination=0 (port.cc:1726): Conn[483eca30:0:Net[TP-Link:192.168.1.147/32:Wifi:id=3]:yKQSDaM7:1:0:local:udp:192.168.1.147:61186->wv6APIhA:1:2122194687:local:udp:100.96.41.21:42083|C--I|-|0|0|9114756780654345726|-]: Sent STUN ping, id=5666535a744b6a6f4851334b, use_candidate=1, nomination=0 (port.cc:1220): Conn[962b5d10:0:Net[Hyper-V:192.168.240.1/32:Ethernet:id=4]:WiGfgoGJ:1:0:local:udp:192.168.240.1:61185->wv6APIhA:1:2122194687:local:udp:100.96.41.21:42083|C--I|-|0|0|9114756780654476799|-]: Failed to send STUN ping err=-1 id=524e385075494e6231656665 ... (port.cc:1773): Connection deleted with number of pings sent: 151

The problem here seems to be that the peer does not receive any STUN ICE candidates (srflx), so it has effectively no viable candidate. The peer does sends some STUN candidates itself, which seems to suggest that the other peer cannot use/reach any STUN server, or that those messages are dropped for some reason:

Candidates sent > candidate:1387532568 1 udp 2122260223 192.168.240.1 61185 typ host generation 0 ufrag H3qP network-id 4 candidate:3690124833 1 udp 2122194687 192.168.1.147 61186 typ host generation 0 ufrag H3qP network-id 3 network-cost 10 candidate:1387532568 1 udp 2122260223 192.168.240.1 61187 typ host generation 0 ufrag H3qP network-id 4 candidate:3690124833 1 udp 2122194687 192.168.1.147 61188 typ host generation 0 ufrag H3qP network-id 3 network-cost 10 candidate:1522192533 1 udp 1685987071 201.82.37.202 7172 typ **srflx** raddr 192.168.1.147 rport 61186 generation 0 ufrag H3qP network-id 3 network-cost 10 candidate:1522192533 1 udp 1685987071 201.82.37.202 7175 typ **srflx** raddr 192.168.1.147 rport 61188 generation 0 ufrag H3qP network-id 3 network-cost 10 candidate:473427432 1 tcp 1518280447 192.168.240.1 60737 typ host tcptype passive generation 0 ufrag H3qP network-id 4 candidate:2507221713 1 tcp 1518214911 192.168.1.147 60738 typ host tcptype passive generation 0 ufrag H3qP network-id 3 network-cost 10 candidate:473427432 1 tcp 1518280447 192.168.240.1 60739 typ host tcptype passive generation 0 ufrag H3qP network-id 4 candidate:2507221713 1 tcp 1518214911 192.168.1.147 60740 typ host tcptype passive generation 0 ufrag H3qP network-id 3 network-cost 10
Candidates received > candidate:3772129433 1 udp 2122262783 2804:214:8590:c0d7:757:b50:bce7:d0f8 48446 typ **host** generation 0 ufrag c8gK network-id 2 network-cost 900 candidate:1249779211 1 udp 2122194687 100.96.41.21 42083 typ **host** generation 0 ufrag c8gK network-id 1 network-cost 900 candidate:2924862569 1 tcp 1518283007 2804:214:8590:c0d7:757:b50:bce7:d0f8 9 typ **host** tcptype active generation 0 ufrag c8gK network-id 2 network-cost 900 candidate:83747579 1 tcp 1518214911 100.96.41.21 9 typ **host** tcptype active generation 0 ufrag c8gK network-id 1 network-cost 900

What's the other peer? Can you check why it is not generating STUN ICE candidates?

You mentioned that the same app used to work fine with MR-WebRTC 1.0.3, can you provide logs for a successful connection across networks for 1.0.3?

SergioMasson commented 3 years ago

Hi @fibann, Thank you very much for taking a look at our issue. The other peer is a web application running on google chrome. Our signaling solution is based on websockets. I will create the same logs running version 1.0.3 and post it here, I will also post the logs from our web application.

SergioMasson commented 3 years ago

@fibann, I was collecting the logs to send you and we decided to test our project using multiple setups. We were able to communicate using devices in different networks when both devices are connected to wifi using Ipv4. However, our main scenario will be for the Hololens to be used on a 4G network on the field. When we try to make a call with one of the peers using a 4G network we seam to get ICE messages mixing Ipv6 and Ipv4, our STUN server always return the IP adress using IPv4 since we are using google`s STUN server.

I found an open issue relating Ipv6 networks and Ipv4 STUN serves: https://github.com/pion/webrtc/issues/810

Do you think this can be the issue? The change from MRWRTC version 1.0.3 to 2.0.0 could just be matching the period when we have started taking the device to the field and receiving this errors.

I`m attaching the log collected from the webbrowser from calls performed with the Hololens when both devices are on diferent networks. The call that fails happend when the Hololens is connected to local hotspot of a mobile phone connected to 4G network. The succeed ones happend over Wifi.

HololensToChrome_Succeed_Call2.txt HololensToChrome_Succeed_Call3.txt HololensToChrome_Fail_Call4.txt HololensToChrome_Succeed_Call1.txt

fibann commented 3 years ago

The last fail log you attached does contain a STUN candidate sent by Chrome. But without any more detailed logs for that instance it's difficult to know why that's not being used. You can try doing that again, ensuring that both sides send and receive srflx candidate, and check both the MR-WebRTC debug logs and chrome://webrtc-internals on the browser to try to understand what's going wrong (see https://rollout.io/blog/webrtc-issues-and-how-to-debug-them/ for example to get started).

Note that since connections work over wifi, this might not be due to something in (MR-)WebRTC but to some limitation from your 4G provider rather.

SergioMasson commented 3 years ago

Thank you very much for your feedback @fibann , I will investigate that and post here anything that I may find.

grzegorzsabiniok commented 3 years ago

Hey @SergioMasson. I have similar problem. Do you find any solution?