lynckia / licode

Open Source Communication Provider based on WebRTC and Cloud technologies
http://lynckia.com/licode
MIT License

ICE connection fails: NICE_COMPONENT_STATE_FAILED #482

Closed mkhahani closed 8 years ago

mkhahani commented 8 years ago

I am getting ICE connection failures in MCU mode, mostly on 3G Internet. Both client and server have public IPs. Testing with the Janus gateway worked well. I tracked the error down and found that it is the libnice error NICE_COMPONENT_STATE_FAILED. According to the docs it means "Connectivity checks have been completed, but connectivity was not established". What does that mean? How can I dig deeper? Further explanation here.

mkhahani commented 8 years ago

Check this commit out. The comment says:

if (state == NICE_COMPONENT_STATE_FAILED) {
/* Failed doesn't mean necessarily we need to give up: we may be trickling */
...

mkhahani commented 8 years ago

Team, this is serious. I don't know much about Trickle ICE. Any tips would be appreciated.

lodoyun commented 8 years ago

Trickle ICE means that you don't have to exchange all the ICE candidates at once; you can do it incrementally as they are discovered, while the checks are taking place. More details here. While what you point out looks like it could potentially cause failed connections, I'm having a hard time reproducing this locally. I'll keep trying, though. A way to see what's going on is to inspect the erizoJs logs (licode/erizo_controller/erizoAgent/*.log) and look at which candidates are being added (look for Adding remote candidate) and in what order. You can compare that to the candidates that are selected when you are using Janus, as shown in chrome://webrtc-internals, and see if the candidate that actually works for the pair is being added in Licode.
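
The comparison suggested above can be partly scripted. The following is only a rough sketch, not part of Licode: it assumes the log lines of interest contain the phrase "Adding remote candidate" somewhere in the line (the exact erizoJs log format is not shown in this thread), and that candidate strings follow the standard SDP candidate attribute layout (foundation, component, transport, priority, address, port, typ, type), as they appear in chrome://webrtc-internals. The function names and the sample candidate are hypothetical.

```python
import re

# Phrase mentioned in the thread; the rest of the log line format is an assumption.
MARKER = "Adding remote candidate"

# Leading fields of a standard SDP candidate attribute:
# candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type>
CANDIDATE_RE = re.compile(
    r"candidate:(?P<foundation>\S+) (?P<component>\d+) (?P<transport>\S+) "
    r"(?P<priority>\d+) (?P<address>\S+) (?P<port>\d+) typ (?P<type>\S+)"
)


def remote_candidate_lines(log_text):
    """Return the log lines that mention the marker, in the order they appear."""
    return [line for line in log_text.splitlines() if MARKER in line]


def parse_candidate(text):
    """Extract the basic fields of a candidate attribute, or None if none is found."""
    match = CANDIDATE_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("component", "priority", "port"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    # Made-up sample candidate using a documentation-only IP address.
    sample = "a=candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx"
    info = parse_candidate(sample)
    print(info["type"], info["transport"], info["address"], info["port"])
```

Running remote_candidate_lines over the text of an erizoAgent log gives the order in which candidates were added; parse_candidate can then be applied to candidate strings copied from chrome://webrtc-internals to check whether the address, port and type of the working pair ever show up on the Licode side.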

lodoyun commented 8 years ago

Ok, I've managed to reproduce it, and it should be fixed as of 2836e7c9537dfe4fd8deebcc34f647b0a8b082e8 if the original problem was NICE failing before receiving all candidates. In my tests, candidates have to be > 10 seconds late... Try it and reopen if you find it keeps being a problem.

mkhahani commented 8 years ago

Thank you Pedro for the commit!

I updated my local repository and compiled Licode, but the problem persists. Here are full log files of a successful/unsuccessful connection:

erizo-gprs-failure.log erizo-wifi-success.log

The server IP has been replaced with 79.175.138.xxx in the logs.

lodoyun commented 8 years ago

Then that's probably not the problem. Can you check if you get this WARN log in the client: "Stream", an id, "has failed after succesfuly ICE checks"? Also, if you can connect with Janus or any other server in the same conditions, can you check Chrome's WebRTC Internals and generate a dump for both cases (Licode vs. the other solution)? If you mask the IP, just make sure it's the same in both logs so I don't get confused.

mkhahani commented 8 years ago

Yes, I get the warning. By chance I was able to get two dumps (failure/success), both with Licode and on the same Internet connection but with different client IPs (obtained after a 3G disconnect/connect). Hope it helps.

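webrtc_internals_dump_failure-txt https://gist.github.com/mkhahani/343f0473741b53d813e4533dff77a565#file-webrtc_internals_dump_failure-txt
webrtc_internals_dump_success-txt https://gist.github.com/mkhahani/343f0473741b53d813e4533dff77a565#file-webrtc_internals_dump_success-txt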

lodoyun commented 8 years ago

So you are saying that sometimes it works in Licode in the same conditions?

Does it consistently work in other platforms?

Pedro Rodriguez


mkhahani commented 8 years ago

Not exactly the same conditions, but yes, with the same server and Internet connection. I think it's the network conditions. Other users connecting through different 3G networks have also confirmed the issue. Most of the time it fails on 3G, no matter how many times I retry. But occasionally, after getting a new IP from the network, it just works and keeps working without ANY failure!

The weird thing is that when it fails on my Licode server, it works well on Janus and on the Licode demos. But as I mentioned in the forum, the Licode demo uses a TURN server while Janus does not. Pedro, I can give you access to my server if I have your email address.

I've tested with Chrome and Firefox on Windows and Android with the same results.


mkhahani commented 8 years ago

@lodoyun I'm still struggling with the issue and I'm going to investigate and fix it. But I'm not experienced with the concepts and protocols underlying WebRTC, so I need your help to point me in the right direction.

Thanks a lot.

cracker0dks commented 8 years ago

basic example running (on last commit) without TURN: (server removed)

mkhahani commented 8 years ago

@cracker0dks Thank you! The server you mentioned worked well.

I also installed Licode on another server with Ubuntu 14 and it worked too! So I'm almost sure that something is wrong with my server. It runs CentOS 7 with the latest version of Licode, without any modification. I could not get any further.

Is this about CentOS? Or Licode dependencies (i.e. version of libraries)? Server network conditions? Any idea?

mkhahani commented 8 years ago

Ah, I can't believe it was the firewall! When I stop it, everything works just fine. It's weird, since it was working for some clients. That threw me off, so I never thought of the firewall.

However, I wonder what rules I need to add to the firewall. The currently allowed services are: dhcpv6-client, ssh, dns, http, https and smtp, plus the ports Licode requires.

cracker0dks commented 8 years ago

You need (at least) these ports for running the webapp with Licode:

These are my firewall settings from the server I posted 2 days ago.

mkhahani commented 8 years ago

Fixed! Thank you for the quick and precise help.

Thank you @lodoyun and @cracker0dks for helping me, and sorry if I took up your time. I should have done more tests before creating the issue. I hope this post can be useful for other users facing a similar problem.

nikhilrayaprolu commented 8 years ago

I have made the same settings but with different port numbers for UDP (WebRTC). This made my application show blank screens for some video streams in Firefox.

cracker0dks commented 8 years ago

yep! the reason behind this firefox udp problem is described here: https://github.com/ging/licode/issues/426#issuecomment-221552394