paullouisageneau / libjuice

JUICE is a UDP Interactive Connectivity Establishment library
Mozilla Public License 2.0

STUN message send failed / STUN local ufrag check failed. #223

mhoeben closed 10 months ago

mhoeben commented 10 months ago

I use libjuice as part of libdatachannel and regularly see sequences of messages like this:

rtc::impl::IceTransport::LogCallback@362: juice: Send failed, errno=101
rtc::impl::IceTransport::LogCallback@362: juice: STUN message send failed
rtc::impl::IceTransport::LogCallback@362: juice: STUN local ufrag check failed, expected="Sb+z", actual="lwFV"
rtc::impl::IceTransport::LogCallback@362: juice: STUN message verification failed
rtc::impl::IceTransport::LogCallback@362: juice: STUN local ufrag check failed, expected="Sb+z", actual="lwFV"
etc...

Most of the time when I set up a session I don't see these messages; however, in approximately 10% of sessions the pattern repeats several times until it seems to settle. Any idea whether this is an error on my part, something to be expected occasionally, or a bug in the library?

paullouisageneau commented 10 months ago

The Send failed, errno=101 message is no big deal; it could just mean that IPv6 connectivity is not working.
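For reference, on most Linux systems errno 101 is ENETUNREACH ("Network is unreachable"), which fits a send toward an address family that has no route, such as IPv6 on an IPv4-only host. Below is a minimal sketch of telling such routing errors apart from other send failures, using only Python's standard errno module. The helper names is_unreachable_error and describe are hypothetical and are not part of libjuice or libdatachannel.

```python
import errno
import os

# Routing-related errno values (an assumption about what counts as benign):
# a send failing with one of these usually means there is no route for that
# candidate address, not that the socket itself is broken.
UNREACHABLE_ERRORS = {errno.ENETUNREACH, errno.EHOSTUNREACH}


def is_unreachable_error(code: int) -> bool:
    """Return True if an errno value indicates an unreachable network or host."""
    return code in UNREACHABLE_ERRORS


def describe(code: int) -> str:
    """Format an errno value with its symbolic name and the OS message."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"errno={code} ({name}): {os.strerror(code)}"


if __name__ == "__main__":
    # The numeric value is platform-specific; 101 is the usual Linux value.
    print(describe(errno.ENETUNREACH))
```

Comparing against the symbolic constant rather than the literal 101 keeps the check portable, since the number differs between operating systems.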

The ufrag check failure may indicate a signaling or compatibility issue. Is the remote agent aiortc, by any chance? This could be a symptom of https://github.com/paullouisageneau/libdatachannel/issues/982, which is fixed but not yet part of a release.
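To illustrate what the ufrag check in the log compares: per RFC 8445, the USERNAME attribute of an ICE connectivity check consists of the receiver's username fragment, a colon, and the sender's fragment, so the receiver matches the first part against its own local ufrag. The following is an illustrative sketch only, not libjuice's code. The function name check_local_ufrag is hypothetical, and the remote fragment "abcd" is made up for the example; "Sb+z" and "lwFV" are the values from the log.

```python
def check_local_ufrag(username: str, local_ufrag: str) -> bool:
    """Return True if a STUN USERNAME of the form "<local>:<remote>" targets us."""
    received_local, sep, _remote = username.partition(":")
    if not sep:
        return False  # malformed USERNAME: no colon separator
    return received_local == local_ufrag


if __name__ == "__main__":
    local = "Sb+z"  # ufrag of the current session, as in the log
    # A check carrying a different local ufrag is rejected.
    print(check_local_ufrag("lwFV:abcd", local))
    # A check addressed to the current session passes.
    print(check_local_ufrag("Sb+z:abcd", local))
```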

mhoeben commented 10 months ago

Understood about the errno=101.

The STUN server is stun.l.google.com:19302. The WebRTC peer is a Chrome browser and the server that uses libdatachannel runs in AWS. Not sure which data point is the one you are asking for.

Note that this happens before the Chrome browser connects. It is during the ICE establishment phase of setting up a peer connection. I use ICE/STUN to determine the server's public IP address and reachable port to be able to provide an offer as soon as a client connects over the signalling web socket.

paullouisageneau commented 10 months ago

It could be caused by an earlier peer connection on the same port as the new one, which was closed on the server side but has not yet timed out on another client. Do you add only tracks and no data channels? Have you changed libdatachannel's port range to something custom?

> I use ICE/STUN to determine the server's public IP address and reachable port to be able to provide an offer as soon as a client connects over the signalling web socket.

Note that you should not keep the peer connection waiting with an unsent generated offer; instead, you should create it only when the websocket is connected.
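The advice above can be sketched as a signaling handler that creates the peer connection inside the connect callback instead of ahead of time. This is a rough sketch under stated assumptions: FakePeerConnection and SignalingHandler are hypothetical stand-ins written for this example, not libdatachannel classes, and the ufrag values are simple counters rather than real ICE credentials.

```python
import itertools
from typing import Callable, Dict

_counter = itertools.count(1)


class FakePeerConnection:
    """Hypothetical stand-in for a real peer connection (not a libdatachannel type)."""

    def __init__(self) -> None:
        # Each instance gets its own placeholder ufrag.
        self.ufrag = f"uf{next(_counter):02d}"

    def create_offer(self) -> str:
        return f"offer ufrag={self.ufrag}"


class SignalingHandler:
    """Creates one peer connection per client, at the moment the client connects."""

    def __init__(self, factory: Callable[[], FakePeerConnection]) -> None:
        self._factory = factory
        self.sessions: Dict[str, FakePeerConnection] = {}

    def on_client_connected(self, client_id: str) -> str:
        # No peer connection exists before this point, so no offer sits
        # unsent while the server waits for a client.
        pc = self._factory()
        self.sessions[client_id] = pc
        return pc.create_offer()

    def on_client_disconnected(self, client_id: str) -> None:
        # Drop the session together with the signaling connection.
        self.sessions.pop(client_id, None)


if __name__ == "__main__":
    handler = SignalingHandler(FakePeerConnection)
    print(handler.on_client_connected("client-1"))
```

The trade-off is a slightly slower session start, since candidate gathering begins only once the client is present.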

mhoeben commented 10 months ago

I add an audio and a video track, as well as a data channel. And yes, I have limited the port range so I don't have to open too many ports on my EC2 instance. You could very well be right that these are responses from an older session. I will check that.

> Note that you should not keep the peer connection waiting with an unsent generated offer, instead you should create it only when the websocket is connected.

Ok. I did it in the interest of quickly starting the session, but I see that it may cause problems.

Thanks for your help. I consider the issues satisfactorily addressed until I can provide more information.