webrtc/: Add libp2p WebRTC browser-to-server spec

mxinden commented 2 years ago

This pull requests summarizes the status quo of the ongoing WebRTC in libp2p work. In other words, it represents the summary of all past discussions like https://github.com/libp2p/specs/issues/220.

While describing both the browser-to-server and browser-to-browser use-case, priority is on the former. Once the browser-to-server use-case is sufficiently specified, I will remove the browser-to-browser use-case, merge this pull request, and start iteration two, namely to sufficiently specify the browser-to-browser use-case.

Remaining work items for browser-to-server:

[x] Specification
- [x] Server to start Noise handshake https://github.com/libp2p/specs/pull/412#discussion_r977698490
- [x] Head of line blocking https://github.com/libp2p/specs/pull/412#discussion_r984720911
- [x] RTCDataChannel ID negotiation https://github.com/libp2p/specs/pull/412#discussion_r984746570
- [x] RTCDataChannel ID reuse https://github.com/libp2p/specs/pull/412#discussion_r984747802
- [x] Fixing fingerprint hash algorithm to sha-256 https://github.com/libp2p/specs/pull/412#discussion_r968327480
- [x] Document potential deprecation of WebRTC Star and WebRTC Direct once browser-to-browser support is specified
- [x] Fingerprint sorting https://github.com/libp2p/specs/pull/412#discussion_r978398623
- [x] Document label on RTCConnection.createDataChannel. https://github.com/libp2p/specs/pull/412/files#r995877508
- [x] Communicate message size limit in SDP https://github.com/libp2p/specs/pull/412#issuecomment-1287780905
[ ] Implementations of this specification
- [ ] JS https://github.com/libp2p/js-libp2p/issues/1478
- [ ] Go https://github.com/libp2p/go-libp2p/pull/1655
- [ ] Rust https://github.com/libp2p/rust-libp2p/pull/2622
  - [ ] Message framing https://github.com/melekes/rust-libp2p/pull/10
  - [x] Use Noise prologue https://github.com/libp2p/rust-libp2p/pull/2622/commits/7e630a634c3cd82a2286e19f208e2203e86173a6
[ ] Testing via Testground
- [x] https://github.com/testground/testground/issues/1386
- [ ] https://github.com/testground/testground/pull/1474

tomaka commented 2 years ago

I see several discussion about browser-to-browser. For the sake of shipping things, I would suggest to keep the scope of this RFC to browser-to-server, which is a way less complicated problem.

mxinden commented 2 years ago

Note that both within Protocol Labs as well as Little Bear Labs (@ckousik and @lightsofapollo) our priority is on browser-to-server. Based on the above and previous conversations this is the same for Parity (@melekes and @tomaka).

Thus I think we are aligned here.

I would suggest to keep the scope of this RFC to browser-to-server

I am fine discussing both here at this point in time. That said, once the browser-to-server specification part is solid, I am happy to proceed and merge, marking the browser-to-browser part as an early draft only. In other words, I don't think the browser-to-browser work should block the browser-to-server work.

ckousik commented 2 years ago

@mxinden @marten-seemann Summarizing our conversation, please note the following:

In pion, there is a way to fetch the remote description (therefore the fingerprint) and confirm after NOISE: https://pkg.go.dev/github.com/pion/webrtc/v3#PeerConnection.RemoteDescription
The current spec uses the DTLS fingerprint as both the ufrag and password, so it allows the STUN connectivity checks to be replayed or fabricated since the DTLS fingerprint for a server will be added to the multiaddress.

ckousik commented 2 years ago

The STUN spec has security considerations here: https://datatracker.ietf.org/doc/html/rfc5389#section-16.1.2

A rogue client may try to launch a DoS attack against a server by sending it a large number of STUN requests. Fortunately, STUN requests can be processed statelessly by a server, making such attacks hard to launch.

The problem here is that STUN messages are not processed statelessly, and messages from a previously unknown address are treated as a new peerconnection, which will take up resources on the server.

ckousik commented 2 years ago

@mxinden @marten-seemann Given the above mentioned issue, @lightsofapollo and I currently have a couple of recommendations:

We could rate limit incoming connections from a given IP address on the webrtc port. The rate limiting criteria are yet to be discussed. This would mitigate malicious actors wasting resources by spawning new PeerConnections, but would not solve the problem completely.
As mentioned https://github.com/libp2p/specs/pull/412#discussion_r904598628 , we could extend DCUtR to support exchange of SDP and ICE candidates. This would require a trusted relay to be present, but would be a far more secure solution as each new connection can use a new certificate, ice credentials and by extension connectivity checks would be secure, and implementations would not be limited to using ICE-lite. I am unsure of how the transport multiaddr will be specified here. This would trivially solve the browser to server and browser to browser use cases.

mxinden commented 2 years ago

A rogue client may try to launch a DoS attack against a server by sending it a large number of STUN requests. Fortunately, STUN requests can be processed statelessly by a server, making such attacks hard to launch.

The problem here is that STUN messages are not processed statelessly, and messages from a previously unknown address are treated as a new peerconnection, which will take up resources on the server.

Can you expand on the data that needs to be cached by the server between the initial STUN message and the actual connection attempt? For the browser-to-server use-case what information does the server need from the initial STUN message send by the client @ckousik?

melekes commented 2 years ago

The problem here is that STUN messages are not processed statelessly,

Is that even possible unless you ignore them completely?

NOTE: https://github.com/webrtc-rs/ice/blob/0365dbc3ac739495570de8a69f7e8020a70de4a2/src/udp_mux/mod.rs#L175-L211 webrtc-rs/ice library also process STUN messages statefully.

A rogue client may try to launch a DoS attack against a server by sending it a large number of STUN requests. Fortunately, STUN requests can be processed statelessly by a server, making such attacks hard to launch.

The solution might be

1) rate limit the number of messages per IP (as noted here) 2) limit the number of new addresses "in flight" (currently 1)

tomaka commented 2 years ago

This would require a trusted relay to be present

I'd like to note that at Parity, our entire use case is to avoid requiring a trusted relay. If a trusted relay is mandatory, then we have no interest in WebRTC anymore.

ckousik commented 2 years ago

@mxinden The spec itself refers to simple STUN servers used mostly for finding the public IP, in which case processing is stateless. In the case of this spec, the stateful operations are described below.

@melekes True, this attack cannot be avoided unless the packets are dropped. The STUN spec refers to STUN-only servers and not ICE implementations. The difference between webrtc-rs and the current spec is as follows:

webrtc-rs case: When a new STUN message is received on the muxed UDP, the muxer will check to see if there is an existing PeerConnection object based on the source address, failing which it will check based on the ufrag contained within the STUN message. If no existing connection is found based on either of these criteria, the packet is dropped. Therefore, if a malicious client generates and sends a STUN message to the port without a prior SDP exchange, the packets will simply be dropped.

current spec: With reference to this implementation, consider the case where a malicious client sends multiple STUN messages with different ufrags from different ports. For each message, a new PeerConnection instance is created alongside all the timers, tasks, etc. assosciated with it. The malicious client need not even generate a certificate locally, only a string that looks like a fingerprint and this would be enough to negotiate ICE. This consumes significantly more server resources per STUN message than the webrtc-rs case.

@tomaka As per @melekes suggestion, rate limiting should help mitigate this problem considerably at the cost of performance in certain edge cases (consider multiple legitimate users behind the same NAT trying to connect to the same libp2p node). It should be noted that this does not work for the browser-to-browser case but that is currently lower priority.

mxinden commented 2 years ago

The commits above restructure the spec and add the latest discussions as open questions. Input more than welcome.

Discussed out of band, @ckousik will start implementing the current state of the specification for go-libp2p.

mxinden commented 2 years ago

The most recent 5 commits track the discussions above as well as an out-of-band meeting with @ckousik, @marten-seemann and me.

7e601e7 webrtc webrtc/: Track open question on whether server should initiate Noise
c8e725a webrtc/: Document hash function match on WebRTC and Noise
a8a42b5 webrtc/: Document certhash being a multibase encoded multihash
f5eb1b7 webrtc/: Document potential amplification defense through large client STUN
5f7ea61 webrtc/: Document random string in STUN username and password

@melekes and of course everyone else, please let me know in case you have any questions or concerns.

mxinden commented 2 years ago

@ckousik following up on the conversation on who should initiate the Noise handshake, the browser or the server, and whether the current Rust implementation is in-line with the specification. Please see https://github.com/libp2p/rust-libp2p/pull/2622#discussion_r918934992

Let me know if that makes sense.

marten-seemann commented 2 years ago

This came up during the IPFS þing: What are the connection options in local networks? It would be nice if we could connect two nodes on the same network by simply scanning a QR code (which might encode the SDP). Somewaht similar to what https://github.com/AquiGorka/webrtc-qr is doing.

mxinden commented 2 years ago

I am assuming that the local network does not impose any firewalls or NATs between the two nodes @marten-seemann? If so, a QR code encoding the Multiaddr of node B would suffice for A to connect to B.

achingbrain commented 2 years ago

I am assuming that the local network does not impose any firewalls or NATs between the two nodes

It's usually all or nothing - some (usually public access) WiFi hotspots enforce client isolation so it's impossible to send traffic between two nodes directly on the same network, others will allow everything.

Sean-Der commented 2 years ago

@marten-seemann I would love to see more exploration of what we could do with mDNS. offline-browser-communication is an example of connecting two Peers without any signaling. Some bootstrapping/pre-configuration needs to be done though.

Currently this isn't possible Browser <-> Browser but maybe that doesn't matter to us? If libp2p finds it useful it could be the impetus to make browser vendors take this idea seriously.

ckousik commented 2 years ago

@marten-seemann @mxinden Regarding the implementation of Reset in webrtc datachannels, please note the following:

While the datachannel implementation does have a CloseAbruptlyWithError method on it, this is not exposed in the DataChannel interface and seems to be used only internally. Also, this method does not signal any error to the peer, and proceed with normal datachannel closure. Reference links: 1 2 3
After Close is called on a datachannel, any data that is queued is sent and any further writes will be discarded [ 4 ]. If the datachannel state is not open ('connecting', 'closing', or 'closed'), incoming data is queued but never sent to the user through the observer. Therefore, after closing, we are unable to read any inflight data when it is received. It will only be queued until the queue is full, after which the datachannel will close [ 5 ].

mxinden commented 2 years ago

For the record, a solution for half-close and reset of streams (https://github.com/libp2p/specs/pull/412#discussion_r932487837) is proposed via https://github.com/mxinden/specs/pull/1. Input welcome.

mxinden commented 2 years ago

https://github.com/libp2p/specs/pull/412/commits/7009f943d5b9ee67cb7a19bbf8ca60549c1ad683 specifies minimum required support for sha256 and base64url. Comments are welcome, though I suggest for these comments to be posted on https://github.com/multiformats/multiaddr/pull/130 instead of this pull request to keep the discussion in one place.

melekes commented 2 years ago

One note: there's an implicit assumption that WebRTC data channels are ordered (which is the default in webrtc-rs and Go Pion I believe), but they could also be unordered since SCTP supports both options. Should we maybe make this explicit (i.e. add smth like All opened data channels must be ordered.)?

marten-seemann commented 2 years ago

One note: there's an implicit assumption that WebRTC data channels are ordered (which is the default in webrtc-rs and Go Pion I believe), but they could also be unordered since SCTP supports both options. Should we maybe make this explicit (i.e. add smth like All opened data channels must be ordered.)?

Ordering is not a property of the underlying channel, ordering is a property that an API can expose. You can also have an API that gives you random access. See https://datatracker.ietf.org/doc/html/rfc9000#section-2.2 for how QUIC describes this.

mxinden commented 2 years ago

Should we maybe make this explicit (i.e. add smth like All opened data channels must be ordered.)?

Good idea @melekes. Let me know what you think of https://github.com/libp2p/specs/pull/412/commits/48f5fe767c73cebd0d751de87d8fe7a697db1884.

Ordering is not a property of the underlying channel, ordering is a property that an API can expose. You can also have an API that gives you random access. See https://datatracker.ietf.org/doc/html/rfc9000#section-2.2 for how QUIC describes this.

:+1: once we design unordered streams for the libp2p project in general, we can revisit how to expose such functionality on WebRTC in specific.

mxinden commented 2 years ago

That is not accurate. libp2p implementations use ordered streams, but the streams themselves can be unordered (QUIC streams, for example), and there's nothing in any of our protocols preventing a random access into received stream data.

I'm not sure I fully understand the ordered property of the RTCDataChannel, but what I gather from https://www.w3.org/TR/webrtc/#attributes-17, this is a local property and has no effect on the wire behavior of the protocol. If that interpretation is correct, we shouldn't have a MUST here: A libp2p implementation in the browser MAY expose an ordered byte stream abstraction in its API, or it MAY expose an unordered API to the application.

https://github.com/libp2p/specs/commit/48f5fe767c73cebd0d751de87d8fe7a697db1884#r83389191

This is a fair point. I too read it as a local property only. https://github.com/libp2p/specs/pull/412/commits/9ce0d45e54f7e67a6ed4489901c8437a83687590 adjusts the specification accordingly, allowing implementations to expose an unordered API to users.

marten-seemann commented 2 years ago

Other things that should probably be discussed here:

There's a maximum number of 65k streams per WebRTC connection: https://developer.mozilla.org/en-US/docs/Web/API/RTCDataChannel/id. This is significantly less than our other stream muxers allow (yamux: uint32, QUIC: uint62), and might lead to exhaustion of the stream ID space for long-lived connections, especially with our recent push towards more short-lived streams. We need to specify what happens when a peer hits this limit.
Should this document specify how to connect a browser to a private server (I assume it would be very similar to browser <-> browser)? In practice, this is probably the more interesting use case, as browser -> public node is covered by WebTransport, which is expected to be the superior choice both in terms of latency and bandwidth. As long as browsers don't support using QUIC Connection ID demultiplexing to run multiple WebTransport sessions on the same UDP port, this use case won't be covered by WebTransport.
Reusing the same browser-generated certificate has some subtle privacy risks. Since the WebRTC handshake uses DTLS 1.2 (the problem was fixed in 1.3), the Certificate messages are sent in the clear, meaning that an on-path observer can correlate connections from the same browser to different endpoints. This is a privacy risk for people taking steps to mask their IP information, e.g. by using a VPN. In any case, this spec should discuss this, and maybe even recommend using different certificates for connections to different IP addresses.

mxinden commented 1 year ago

Following up on an out-of-band discussion:

There were race conditions in the connection establishment process which have now been documented in comments. This was to do with setting peerconnection callbacks after the noise handshake. If one peer was faster than the other, it could initiate a stream before the remote had even set the datachannel callback.

@ckousik would you mind proposing a short paragraph to be included in the specification to prevent other implementers to introduce the same race condition?

mxinden commented 1 year ago

@marten-seemann as discussed out-of-band https://github.com/libp2p/specs/pull/412/commits/88036826703881e4c6cf2ae4088a207e3f9fef48 drafts the browser-to-browser section. As mentioned previously I suggest we do not block the browser-to-server specification on the browser-to-browser specification. Instead I suggest we tackle the latter in a second iteration. I added the draft in https://github.com/libp2p/specs/pull/412/commits/88036826703881e4c6cf2ae4088a207e3f9fef48 to still make sure we are on the right track for the the browser-to-browser specification.

BigLep commented 1 year ago

@mxinden : given this spec is already has a lot of comments for browser-to-server, would it maybe make sense to do browser-to-browser in another PR (on top of this one)? I'd like us to be able to close this one when we meet the requirements of having browser and server implementations that meet the browser-to-server portion of the spec.

mxinden commented 1 year ago

@BigLep yes, complete browser-to-browser specification will happen in a second iteration of this specification, i.e. another pull request.

While describing both the browser-to-server and browser-to-browser use-case, priority is on the former. Once the browser-to-server use-case is sufficiently specified, I will remove the browser-to-browser use-case, merge this pull request, and start iteration two, namely to sufficiently specify the browser-to-browser use-case.

Any reference to browser-to-browser in this specification is for:

Being tightly related to the browser-to-server use-case (e.g. security handshake)
Have been raised and should not be forgotten in the meantime (e.g. UDP port allocation in browsers)
Ensure that we design the browser-to-server use-case in a way that is compatible with the browser-to-browser use-case (e.g. reuse of /webrtc multiaddr component in both use-cases)

tomaka commented 1 year ago

I know I'm not the most diplomatic person when it comes to libp2p, but I'd like to once again complain a little bit.

We at Parity have a deadline for shipping WebRTC in production at the end of the year. In two and a half months. This is not an arbitrary self-enforced deadline that we can push back, we are victims of Google's Manifest v3 enforcement for browser extensions.

Having this in mind, I had initially created (in April!) a proof of concept and proposed a more simple design, that was apparently rejected (without really any good justification IMO, but that's besides the point here). For the benefit of the collaboration I was willing to give the benefit of the doubt to this specification instead.

This pull request has now been opened for 5 months, with no real end in sight.

The smoldot implementation for example has been ready for a long time now, only waiting for decisions to be made here so that it can align with the rust-libp2p implementation.

I personally don't think it matters at the slightest for example in which way the certificates are ordered in the Noise payload, however I am slightly frustrated that it takes months to resolve questions like these that, sorry to say, almost don't matter.

I want to remind everyone that whatever happens here, a spec written by committee will contain mistakes. You cannot write a spec first and expect it to be flawless, humans simply are not capable of that. This will need a version 2 of the WebRTC libp2p protocol no matter what.

With all that being said, can I suggest that we close all discussions, merge this as a known-to-be-flawed v1 soon (today?), then implement it, then gather back for a v2?

tomaka commented 1 year ago

merge this as a known-to-be-flawed v1 soon

If you don't agree with that idea, please let us know as soon as possible so that we can start finding a solution to our WebRTC problem instead of waiting more.

mxinden commented 1 year ago

I want to remind everyone that whatever happens here, a spec written by committee will contain mistakes. You cannot write a spec first and expect it to be flawless, humans simply are not capable of that. This will need a version 2 of the WebRTC libp2p protocol no matter what.

For the unknowns I have the same mindset. We have multiple mechanisms in place (multiaddr, Protobuf) that will allow us to evolve this protocol on live networks in the future.

For the knowns I do tend towards in-depth design instead of shipping things fast.

Talking about the knowns: The only outstanding item from my end is the datachannel ID discussion above. More specifically whether to use negotiated: true and thus control channel ID reuse and safe 0.5 RTT on each new channel or whether to use negotiated: false. See https://github.com/libp2p/specs/pull/412#discussion_r984746570. I would like to get @ckousik and @John-LittleBearLabs input here before making a decision.

Unless I am missing something, all remaining discussions (i.e. knowns) above are resolved / decided. For the record, these are also tracked in the pull request description.

With all that being said, can I suggest that we close all discussions, merge this as a known-to-be-flawed v1 soon (today?), then implement it, then gather back for a v2?

It is not going to happen today. That said, I do hope for this specification to go into final review by the end of the week.

ckousik commented 1 year ago

I want to remind everyone that whatever happens here, a spec written by committee will contain mistakes. You cannot write a spec first and expect it to be flawless, humans simply are not capable of that. This will need a version 2 of the WebRTC libp2p protocol no matter what.

For the unknowns I have the same mindset. We have multiple mechanisms in place (multiaddr, Protobuf) that will allow us to evolve this protocol on live networks in the future.

For the knowns I do tend towards in-depth design instead of shipping things fast.

Talking about the knowns: The only outstanding item from my end is the datachannel ID discussion above. More specifically whether to use negotiated: true and thus control channel ID reuse and safe 0.5 RTT on each new channel or whether to use negotiated: false. See #412 (comment). I would like to get @ckousik and @John-LittleBearLabs input here before making a decision.

Unless I am missing something, all remaining discussions (i.e. knowns) above are resolved / decided. For the record, these are also tracked in the pull request description.

With all that being said, can I suggest that we close all discussions, merge this as a known-to-be-flawed v1 soon (today?), then implement it, then gather back for a v2?

It is not going to happen today. That said, I do hope for this specification to go into final review by the end of the week.

@mxinden I've commented on the datachannel reuse. If there are no more discussions on this we can move this PR to final review.

mxinden commented 1 year ago

I am marking this as ready for review. I feel comfortable merging this specification as is. Thus I would like to ask for a (potentially final) round of reviews from folks involved.

Note that this specification covers the browser-to-server use-case only. The browser-to-browser use-case will be specified in a second iteration based on the revert commit of https://github.com/libp2p/specs/pull/412/commits/a46919c557aa4501afb285ffbec8261a7a7e595b.

@marten-seemann, @MarcoPolo and @julian88110 does this look good to you from the Golang side?

@achingbrain what do you think from the JS side?

@ckousik and @John-LittleBearLabs any objections from Little Bear Labs?

@Menduist See any blockers for the future nim-libp2p implementation of this specification?

@melekes given that you approved, I am assuming this looks good from Parity's side.

@thomaseizinger anything you would like to see changed given your expertise with the Rust side?

melekes commented 1 year ago

Perhaps we should also say that a=max-message-size:16384 SDP attribute should be set to the maximum frame size (16kB). By default, it's 64kB (see https://datatracker.ietf.org/doc/rfc8841/), but it might be a good idea to align these two variables. Wdyt? The alternative is to not include attribute in SDP and allow messages up to 64kB.

mxinden commented 1 year ago

Perhaps we should also say that a=max-message-size:16384 SDP attribute should be set to the maximum frame size (16kB). By default, it's 64kB (see https://datatracker.ietf.org/doc/rfc8841/), but it might be a good idea to align these two variables. Wdyt? The alternative is to not include attribute in SDP and allow messages up to 64kB.

That sounds good to me @melekes. @ckousik @John-LittleBearLabs any objections?

mxinden commented 1 year ago

@marten-seemann thank you for your review. I addressed all your comments.

I followed both of your major suggestions.

Inferring hash algorithm from multiaddr for easier role out of new hash algorithm. https://github.com/libp2p/specs/pull/412#discussion_r999745879
Prefixing ufrag to ease general future upgrade mechanism via ufrag. https://github.com/libp2p/specs/pull/412#discussion_r999768836

Would you mind taking another look?

ckousik commented 1 year ago

Perhaps we should also say that a=max-message-size:16384 SDP attribute should be set to the maximum frame size (16kB). By default, it's 64kB (see https://datatracker.ietf.org/doc/rfc8841/), but it might be a good idea to align these two variables. Wdyt? The alternative is to not include attribute in SDP and allow messages up to 64kB.

That sounds good to me @melekes. @ckousik @John-LittleBearLabs any objections?

No objections from my end.

mxinden commented 1 year ago

@marten-seemann thank you for your review. I addressed all your comments.

I followed both of your major suggestions.

1. Inferring hash algorithm from multiaddr for easier role out of new hash algorithm. [webrtc/: Add libp2p WebRTC browser-to-server spec #412 (comment)](https://github.com/libp2p/specs/pull/412#discussion_r999745879)

2. Prefixing ufrag to ease general future upgrade mechanism via ufrag. [webrtc/: Add libp2p WebRTC browser-to-server spec #412 (comment)](https://github.com/libp2p/specs/pull/412#discussion_r999768836)

Would you mind taking another look?

Friendly ping @marten-seemann. Does this look good to you?

tomaka commented 1 year ago

I understand that you're being over cautious with clicking the merge button, but can we now consider that this version of the spec is "final", in the sense that the logic isn't going to change before merging, and the only things that could change are possible clarifications of the language? I'd like to start releasing versions that "officially" support this spec.

mxinden commented 1 year ago

Quite a long journey. 475 (+1) comments and 135 commits, 14 reviewers, ...

I am merging here as a Working Draft. We can promote it once we have a released implementation.

A couple of shoutouts:

@ckousik and the Little Bear Labs folks providing all your WebRTC knowledge.
@melekes for all your input from the Rust side. Thanks for bearing with us throughout the process.
@Menduist and @s1fr0 for your input on the Noise handshake, especially proposing the Noise prologue mechanism!
@thomaseizinger filling the many gaps, thinking deep into the stream semantics introduced here.
@marten-seemann for the high-quality comments across the specification.
...

Up next: browser-to-browser use-case.

tomaka commented 1 year ago

Shoutout to me for having started the whole thing before it was aggressively taken over and not receiving a shoutout

marcus-pousette commented 1 year ago

I am working a project that heavily relies on the libp2p-js impl. I am currently working towards enabling a huge amount of dapps that benefit greatly from WebRTC. What I am missing (?) currently is a way for having js WebRTC listeners. Now it seems to only be supported/to-be-supported for the the Go and Rust impl. Correct me if I have understood the spec. incorrectly, but is this something that is planned for? Or not?

MarcoPolo commented 1 year ago

Follow #497

libp2p / specs

webrtc/: Add libp2p WebRTC browser-to-server spec #412

Remaining work items for browser-to-server: