libp2p / specs

Technical specifications for the libp2p networking stack
https://libp2p.io
1.58k stars 275 forks source link

webrtc: race condition during connection negotiation #585

Open achingbrain opened 1 year ago

achingbrain commented 1 year ago

When exchanging ICE candidates, the connectionStatus of one side becomes connected before the other.

At that point it closes the signalling stream (at the moment - see #580)

If this happens before it's sent its final ICE candidate, the remote's end of the stream can close mid-read and the remote's RTCPeerConnection may still have the connectionStatus of connecting.

This is handled in js by waiting for the connectionStatus of the RTCPeerConnection to change within a time out when the stream closes, even mid-read but is not reset. It works but it feels a bit hacky.

It's necessary in JS because the receiving of ICE candidates and the changes of connectionStatus on the local RTCPeerConnection are async and decoupled so may occur in any order.

Do we care about this or would it be better to add an additional, final message that can be sent over the signalling stream when one side's RTCPeerConnection becomes connected like SDP_COMPLETE or similar?

This message would mean 'this end is "connected", no more ICE candidates will be sent, the stream can/will be closed, and you should wait for your end to become "connected" or to time out'.

cc @sukunrt @mxinden @marten-seemann

sukunrt commented 1 year ago

Since one peer believes the connection is state is Connected, the lagging peer would have received the Candidate, is it fine to just end the read loop and wait for peerConnectionState update when an EOF is received on the relayed stream?

achingbrain commented 1 year ago

is it fine to just end the read loop and wait for peerConnectionState update when an EOF is received on the relayed stream?

This is how it's currently implemented but the EOF leaves an error message in the logs. Adding a flag that says "i'm connected, you should be too shortly" lets us handle the scenario gracefully?

sukunrt commented 1 year ago

I'm a bit indifferent to the asked for change since in go-libp2p it's easy to check EOF and not log anything in such cases and this behaviour is not improved by adding a final message. However if there's no way of achieving this in js, we should definitely add this.

Is there a backwards compatible way of handling this? Given only js-libp2p has webrtc(private-to-private) what would its behaviour be on receiving a message with a new message type?

Should we also mention in the specs how implementations should handle unknown message types given that we might add some other message type in the future?

achingbrain commented 1 year ago

it's easy to check EOF and not log anything

The thing EOF is that it can mean a few different things and the code is left guessing what the outcome was and if it was a failure. Adding an explicit flag makes node behaviour easier to reason about in the future.

Is there a backwards compatible way of handling this? Given only js-libp2p has webrtc(private-to-private) what would its behaviour be on receiving a message with a new message type?

It would ignore the field as it wouldn't be in it's protobuf definition.

Should we also mention in the specs how implementations should handle unknown message types given that we might add some other message type in the future?

This is handled by the protobuf spec, essentially you should always add new field indexes for new types and not reuse old ones.

mxinden commented 1 year ago

it's easy to check EOF and not log anything

The thing EOF is that it can mean a few different things and the code is left guessing what the outcome was and if it was a failure. Adding an explicit flag makes node behaviour easier to reason about in the future.

What does EOF mean other than a graceful close of a stream? The faster peer should set the FIN flag on the Yamux stream on the relayed connection on the happy path. All other cases should result in a RST flag on the Yamux stream on the relayed connection. Am I missing something @achingbrain?