private-octopus / picoquic

Minimal implementation of the QUIC protocol
MIT License
540 stars 159 forks

Support for peer-to-peer use? #519

Open xdaviddx opened 5 years ago

xdaviddx commented 5 years ago

The Readme.md states:

Then there are plenty of other features we may dream of, such as support for multipath, or support for peer-to-peer applications.

What would prevent this from being used in a peer-to-peer situation now?

huitema commented 5 years ago

For multipath, there is some work needed: validate multiple paths without invalidating the old path, tweak the packet scheduler to send packets on different paths, and tweak the congestion control algorithms to manage the load balancing. If someone is interested, pull requests will be helpful!

For peer-to-peer, we need a form of name/credential management, support for NAT/firewall traversal functions, and support for a discovery protocol. But mostly we need a target application. If you have one in mind, I would be happy to help.

bkchr commented 5 years ago

Hey, I've been using picoquic for more than a year for a peer-to-peer application. Nothing special is required. I just exchange the addresses between both peers and let them start connecting to each other.

huitema commented 5 years ago

Yes, I am a bit of a maximalist. If you are willing to exchange addresses and if you are not behind a firewall, it will work just fine.

bkchr commented 5 years ago

The peers are behind a firewall. ^^ The connection just does simple UDP hole punching, with both peers connecting to each other. Something like proxying a peer, with end-to-end encryption, would be nice.

huitema commented 5 years ago

I have been thinking about that for some time. Suppose that the peer-to-peer application relies on a set of "super nodes", much like the original version of Skype. The super nodes run the proxy software. Each peer advertises the super node(s) through which it can be reached. The remote peer connects first through the super node. After that, they use QUIC migration support to migrate the traffic to a direct path.

The proxying service could run over QUIC. The proxy would receive QUIC connection attempts, inspect the packet to find the SNI (or ESNI). If the ESNI matches the name of the proxy, then it is a connection to the proxy. If it matches the name of a proxied peer, then it is a connection to that peer. The packet can be proxied using "QUIC in QUIC", maybe in a DATAGRAM frame. We will need to specify how this is encapsulated, maybe carrying address information along with the original packet in the DATAGRAM frame, and we will also need to take care of the decapsulation at the final destination. In short, some plumbing required, but that seems doable.
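The encapsulation could be sketched roughly as follows. The 6-byte address prefix is invented purely for illustration (not any spec); a real design would also need to handle IPv6 and respect the DATAGRAM frame size limit:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "QUIC in QUIC" encapsulation: the proxy prepends the
 * original source address to the raw QUIC packet before placing it
 * in a DATAGRAM frame. Layout (invented for illustration):
 *   [4 bytes IPv4 address][2 bytes port, network order][original packet] */
#define ENCAP_HDR_LEN 6

/* Returns the total encapsulated length, or 0 if the buffer is too small. */
size_t encap_packet(uint8_t *out, size_t out_len,
                    const uint8_t addr[4], uint16_t port,
                    const uint8_t *pkt, size_t pkt_len)
{
    if (out_len < ENCAP_HDR_LEN + pkt_len) return 0;
    memcpy(out, addr, 4);
    out[4] = (uint8_t)(port >> 8);
    out[5] = (uint8_t)(port & 0xFF);
    memcpy(out + ENCAP_HDR_LEN, pkt, pkt_len);
    return ENCAP_HDR_LEN + pkt_len;
}

/* Decapsulation at the final destination: recover address, port, and
 * a pointer to the original packet. Returns the packet length, or 0. */
size_t decap_packet(const uint8_t *in, size_t in_len,
                    uint8_t addr[4], uint16_t *port,
                    const uint8_t **pkt)
{
    if (in_len < ENCAP_HDR_LEN) return 0;
    memcpy(addr, in, 4);
    *port = (uint16_t)((in[4] << 8) | in[5]);
    *pkt = in + ENCAP_HDR_LEN;
    return in_len - ENCAP_HDR_LEN;
}
```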

huitema commented 5 years ago

To manage the migration, we need something like ICE (RFC 5245). The peer needs to discover what address it can use on the other side of the NAT. That could be a function of the proxying service, "I see you as address: X, port: P". Or we could make that a QUIC extension frame, so any p2p application could inform its peers of their "reflexive address". Then the peers could exchange "suggestions", such as "you can try to connect directly to me at X:P". QUIC already defines a very simple form of that with the "server preferred address", so maybe we can start there.
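Such an extension frame could look like this on the wire. The frame type 0x4F and the layout are invented for illustration; a real extension would register a frame type and negotiate its use via a transport parameter:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical "observed address" extension frame ("I see you as
 * address X, port P"), loosely in the spirit of QUIC's server
 * preferred address. Layout (invented):
 *   [1 byte type = 0x4F][4 bytes IPv4][2 bytes port, network order] */
#define OBSERVED_ADDR_FRAME_TYPE 0x4F
#define OBSERVED_ADDR_FRAME_LEN 7

size_t encode_observed_addr(uint8_t out[OBSERVED_ADDR_FRAME_LEN],
                            const uint8_t addr[4], uint16_t port)
{
    out[0] = OBSERVED_ADDR_FRAME_TYPE;
    for (int i = 0; i < 4; i++) out[1 + i] = addr[i];
    out[5] = (uint8_t)(port >> 8);
    out[6] = (uint8_t)(port & 0xFF);
    return OBSERVED_ADDR_FRAME_LEN;
}

/* Returns 1 and fills addr/port if the buffer holds an observed
 * address frame, 0 otherwise. */
int decode_observed_addr(const uint8_t *in, size_t len,
                         uint8_t addr[4], uint16_t *port)
{
    if (len < OBSERVED_ADDR_FRAME_LEN || in[0] != OBSERVED_ADDR_FRAME_TYPE)
        return 0;
    for (int i = 0; i < 4; i++) addr[i] = in[1 + i];
    *port = (uint16_t)((in[5] << 8) | in[6]);
    return 1;
}
```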

Of course, this gets much simpler if we can use IPv6.

huitema commented 5 years ago

The other P2P requirement is being able to connect behind the firewall, like for example 2 devices in the same house. IPv6 might get us there. ICE-style discovery is more problematic, since it will just propose an address like 192.168.0.5 that could be located anywhere. It might also create interesting privacy leaks. But we may be able to do some kind of multicast discovery with good privacy attributes, e.g. multicast a QUIC probe that contains an encrypted PATH CHALLENGE.

xdaviddx commented 5 years ago

Thanks for all the information.

For my use case, multipath isn't needed. For name/credential management, support for NAT/firewall traversal functions, and support for a discovery protocol, I just assumed I'd have to do those at the application level and/or use other libraries. I just wanted to know whether the server part of the code works fairly reliably, or whether it exists mainly to test the client, with the client intended to be used against other server implementations.

The target application isn't open source. We were looking at QUIC implementations and liked that picoquic was fairly self contained. We would have ideally wanted a Java implementation, since that's what the application is built on, but couldn't find one. The intent would be to use SWIG or JavaCPP to make use of a C/C++ library. The application will be cross platform - Windows, MacOS, Linux, Android and iOS.

I see that OpenSSL is being used in the dependent TLS library. Any chance of supporting BoringSSL?

NAT is a very real issue that needs to be solved for in this application. As Bastian has done, our plan is UDP hole punching. We plan on running a public-facing STUN server for each peer to acquire its IP:Port on the public side of its respective NAT. There will be a public server available for signaling as well (exchanging those IP:Port values, peer discovery, authentication, possibly credential assistance, as well as some regular application functionality).

Intra-peer services (peers are comprised of multiple processes) and peer-to-cloud services will be implemented with gRPC over HTTP/2. And if Google would add in support for gRPC on QUIC, that would be wonderful. In the meantime, we'll have to implement two types of APIs for the two transport mechanisms.

It is very interesting that you brought up the super nodes concept. That's something we were planning in order to reduce resource usage on the cloud server. It isn't clear to me how the QUIC migration feature would be used to transition from a relayed connection to a direct connection. Are you thinking that picoquic would be doing the hole punching in the background and then move the QUIC relay connection over to the direct path, once a hole has been opened?

"If the ESNI matches the name of the proxy, then it is a connection to the proxy." What would it be connecting directly to the proxy for? Do you mean using it for signaling as I mentioned above, for NAT hole punching?

When you say QUIC in QUIC, do you mean a sort of QUIC tunnel, so the TLS of the peer-to-peer "connection" isn't terminated until it gets to the other peer, but each peer would have an outer QUIC connection terminating at the proxy? That's an interesting idea. We just planned on using gRPC services (or messaging over QUIC if the proxy/super node was also behind NAT), but then needing data level encryption. If you were able to do it at the protocol level, that would be slick. You could call it "SLIC" (Slick Leap Internet Connections? Hmm...can't think of a better L word). Or QUICochet. :-)

I just saw your 2 more recent posts. Regarding sharing each other's public IP:Port, it wouldn't need to be encrypted when telling a peer its own IP:Port, but I guess if we're using QUIC, then it would be. The only value I can see in this encryption is not really the encryption, per se, but the trust around certificates (more on that below).

In any case, telling each side the other side's IP:Port should be encrypted so that someone listening on the wire can't build a database of who is talking to whom. If there are only 2 peers connected to one super node, then I guess it would be kind of obvious. And in the case of a busy super node, someone could still possibly figure it out by looking at the size of the packets going in and the size going out and the start and stop times. That's probably going to be true of any type of relay without including some sort of super node random delays and/or adding some dummy data to make them asymmetric, at the expense of performance.

My other concern with super nodes is someone setting up a rogue node. In our case, our nodes and peers can talk to the mother ship to confirm the other end is one of the good guys. In a more decentralized system, that mother ship doesn't exist. You could have peers that aren't using your application software eating up your node resources with their own peer-to-peer data transfers. Or you could have rogue super nodes trying to glean information. As long as the true payload is encrypted at the application layer or there is double encryption at the protocol layer, they shouldn't be able to get anything except metadata. They could purposely slow down transfers or drop connections or scramble data in an effort to cause problems for the application owner and users.

Now, you may be thinking of these super nodes as machines controlled by the application owner, with regular domain names and regular CA certificates based on those domain names. I am thinking of them as willing user participants, hence no CA certificate.

You mentioned SNI. How do you intend to name the proxies and peers?

I didn't think LAN discovery would be a problem. I assumed one or more of the many discovery protocols would be used and everything would be self-contained inside the LAN. In the case of bad home routers that aren't passing ARP or other things (ARP is a constant battle on my home network with my ASUS router/WAP), I figured a fallback could be to go back to the public signaling server and exchange LAN IP:Port information there. If picoquic will be a Swiss Army knife of peer-to-peer accessories, that would certainly simplify things.

Encrypted PATH CHALLENGE?

One other thing about the super node idea for relaying data or even the proxy idea. In either of those cases, the picoquic server code will need to be able to scale well. If QUIC is only used for the p2p connection and other methods are used for signaling and relaying, then those other things can be chosen based on their ability to scale, and the pressure is off the QUIC implementation, since it only has to deal with a small number of concurrent peers.

One more question. My understanding is that there were some problems along the way with the QUIC standard regarding multiplexing other UDP protocol headers. I believe STUN was accounted for, but they decided to stomp on TURN, because WebRTC doesn't typically use TURN servers, and I guess they only cared about what WebRTC needed??? :-/ Anyway, does your implementation co-exist with STUN, if used on the same port?

Thanks again!

huitema commented 5 years ago

Picoquic does not have support for STUN at this point. That could be added. QUIC is specifically designed so that STUN and QUIC packets can be easily demultiplexed. I know these services quite well; I was one of the authors of the original STUN RFC.
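The demultiplexing itself is simple enough to sketch, assuming QUIC v1's fixed bit and the STUN magic cookie from RFC 5389 (this is just an illustration of the principle, not picoquic code):

```c
#include <stdint.h>
#include <stddef.h>

/* Demultiplexing STUN and QUIC on a shared UDP port. STUN (RFC 5389)
 * messages start with two zero bits and carry the magic cookie
 * 0x2112A442 at offset 4; QUIC v1 packets always have the "fixed bit"
 * 0x40 set in the first byte, so the two cannot collide. */
int is_stun_packet(const uint8_t *p, size_t len)
{
    return len >= 8 &&
           (p[0] & 0xC0) == 0 &&
           p[4] == 0x21 && p[5] == 0x12 && p[6] == 0xA4 && p[7] == 0x42;
}

int is_quic_packet(const uint8_t *p, size_t len)
{
    return len >= 1 && (p[0] & 0x40) != 0; /* QUIC v1 fixed bit */
}
```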

QUIC can easily use a STUN server, but TURN is specifically designed to prevent "running a server behind a firewall". A standard TURN server will only accept traffic from a remote peer if the local peer explicitly authorized that traffic in advance. The planned revision of TURN is even more restrictive. Authorizations are only valid for a short time, and have to be renewed frequently. The TURN server is supposed to verify that the relayed traffic looks like RTP. All that is designed so that enterprises can deploy a TURN server as a media relay and trust that it will only be used by the video conference service. That's why I don't believe that we can use TURN for letting a peer listen to incoming communication from other peers; I am tempted to look at an alternative based on QUIC itself. Of course, with QUIC, all the exchanges between proxy and client would be encrypted.

The TLS stack needs an SNI, that's how it finds out which certificate to use in TLS. By default, the SNI should match (one of) the subjectAltName extensions of type dNSName in the server certificate. If peers are identified by a certificate, we should probably use that, but we should also use SNI encryption for privacy reasons. If we have that, then a proxy waiting for connections on behalf of multiple QUIC nodes could use the SNI (or ESNI) to find out where to proxy the connection.
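A sketch of that routing decision, with the peer table standing in for whatever registration the proxying service maintains (the names and types here are invented for illustration):

```c
#include <stddef.h>
#include <string.h>

/* Routing a connection attempt from the (E)SNI: if the name is the
 * proxy's own, terminate the connection locally; if it names a
 * registered peer, forward it; otherwise reject. */
typedef enum { ROUTE_LOCAL, ROUTE_PEER, ROUTE_REJECT } route_t;

route_t route_by_sni(const char *sni, const char *proxy_name,
                     const char *const peer_names[], size_t n_peers,
                     size_t *peer_index)
{
    if (strcmp(sni, proxy_name) == 0) return ROUTE_LOCAL;
    for (size_t i = 0; i < n_peers; i++) {
        if (strcmp(sni, peer_names[i]) == 0) {
            *peer_index = i;
            return ROUTE_PEER;
        }
    }
    return ROUTE_REJECT;
}
```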

Of course, each application or service will decide what kind of proxy it wants to deploy -- paid for as part of the service, paid by the client, or volunteered by a super node.

And yes, QUIC is designed to support connection migration. The first application is clients moving from Wi-Fi to 3G, but the same mechanisms could be used to move from a proxy based connection to a direct connection. It would require hole punching, which QUIC does by sending connectivity verification packets; the required extensions for P2P are to learn candidate addresses for the peer, and to coordinate the verification from both sides. That's why peers need to tell each other what address they want to use, and that would be an extension to the basic QUIC protocol.

Apart from the NAT/Firewall traversal, I think Picoquic will support an RPC style application just fine. In fact, the mapping of RPC to QUIC streams could allow parallel transmission and processing of multiple transactions, which is more efficient than TCP.

huitema commented 5 years ago

About OpenSSL and BoringSSL. Picoquic depends on Picotls, which has an optional dependency on OpenSSL -- mostly needed for some crypto algorithms that are not available in the base Picotls libraries. It would take a bit of effort to make the OpenSSL dependency optional in Picoquic as well, but that would be easy if your application is happy using ECDSA certificates.

xdaviddx commented 5 years ago

Sounds like picoquic is in good hands to be STUN compliant when the time comes. :-)

It has been a while since I looked at the TURN details. I'm not understanding how one of the 2 peers verifying that the other end is a good guy prevents 2 peers from using a node that neither should be using (because someone else is paying for it).

I understand the SNI concept to determine if something is trying to connect to the proxy itself or to use the proxy to relay to another peer. What I'm struggling with is how this works when IP addresses are being used instead of domain names. If domain names are used, the peer is trying to connect to a domain name that matches the CN or one of the subjectAltName domain names. If IP addresses are used, then the IP would be in the CN field and I'm not seeing how subjectAltName can be used. Can a client somehow send in a different IP during TLS negotiation than the one used to connect to the server? I'm not seeing how it is feasible to have domain names for all peers, or even volunteer super nodes, hence the need to use IP addresses.

Is the QUIC migration from proxy to p2p just to reduce latency for the initial communication between the peers? Or is it so they all follow the same process and all connect to the proxy first and those that can connect directly end up doing so, whereas the others stay with the proxy relay?

Yes, it would be nice if Google would support QUIC for gRPC. They apparently use it in some of their own applications, but they haven't made it available for public use yet.

I was curious about BoringSSL, just because some groups seem to be moving away from OpenSSL or not wanting to put all their eggs in one basket. I believe elliptic curve should be fine. The hope is to use something similar to the double ratchet algorithm for forward secrecy. Elliptic curve is one of the mechanisms used for that. What makes it easier to use BoringSSL with Picotls if EC is used instead of RSA?

Thanks again.

huitema commented 5 years ago

I was not saying that using BoringSSL was easier if we use EC. What I was saying is that if we were to just use EC, then we would not need to use OpenSSL at all. We would just use the compact TLS stack provided by Picotls.

You cannot authenticate addresses with TLS. If you want the TLS stack to perform any kind of authentication, you need to provide the name of the server that you expect to reach. And the server has to know which name you expect to connect to. Hence the SNI.

xdaviddx commented 5 years ago

Ah, I understand now about EC and the smaller library for crypto with Picotls. Makes sense.

You should be able to authenticate an IP-only certificate with a SAN containing an IP address (meaning TLS will proceed and encrypt). But it doesn't seem like there is a way to make SNI work with IP-only systems. I would suggest another method be used to signify whether a client wants to connect to the proxy server or connect to another peer through the proxy server. This will allow for peers and proxies to use IP-only certificates. Requiring all peers to have registered domain names would seem to be a very limited use case.

In my particular use case, a master server, that the proxy and the peers talk to, will be the one verifying if the other end is who it says it is. The peers and the proxy/super node will provide credentials to the master server (have a regular username/password account on the server).

Perhaps in picoquic, an option could be left open to not require the end user of the library to use whatever authentication the library may end up bundling, such that applications can choose their own out of band mechanisms. If picoquic (or picotls probably) is only checking that the URI matches the certificate values and that is the level of authentication, then that works too. Those applications that use domain names will have a reasonable assurance of reaching a proper proxy. Those applications that use IP addresses can use other methods at the application level.

Having routing separated from authentication will also allow for systems to be set up that, for whatever reason, don't care about the authentication aspects, but still need to signify the route they desire and still need TLS level encryption.

It seems like there are a number of situations we've discussed where the library will need out-of-band signalling messages for the special P2P add-ons, happening under the covers without the application receiving those messages (the library will strip them off and not pass them up to the application).

Either:

A) The same port and connection ID would be used (with something else in each packet signifying that they are special).

OR

B) The same port and another connection ID would be used (not sure if QUIC allows for this)

OR

C) Another port and connection ID would be used. This option couldn't be used for STUN, however, as the port needs to be the same as the port the application data flows through.

(B) seems the most logical to me, but I'm not sure if QUIC allows for multiple connection IDs for the same IP:Port combination. (A) seems like the next likely option, albeit requiring more work to implement. I don't think (C) is a viable option.

Thoughts?

I think it is great that you want to include additional P2P features in the library. There seems to be an assumed view of QUIC as being a client/server protocol (where the server is in the cloud). Being based on UDP, P2P seems like an obvious extra use for it, since UDP is better able to get through residential NAT devices.

huitema commented 5 years ago

@xdaviddx I am looking at the "masque" extensions to HTTP/3 that David Schinazi specifies in https://tools.ietf.org/html/draft-schinazi-masque-00. They specify a QUIC-in-QUIC proxy, and also provide STUN functionality. Would that help your scenario?

fluffy commented 4 years ago

Just one note on ICE: it turned out to be basically impossible to debug on production networks. There are much better ways to skin the P2P cat than ICE, so if we go down this path, I would avoid slipping on ICE again.

huitema commented 2 years ago

Looking at this old issue. @fluffy, the scenario that I see is the following:

1) Peer-to-peer application uses a set of "super-nodes" to facilitate connections. Maybe VMs in the cloud, maybe nodes that
2) Any node can set up a QUIC connection to a server running on a super-node.
3) Super-nodes provide a service like MASQUE so simple nodes can get a "tunnel" through their NAT.
4) Any node can run a server that will accept packets from the outside, and also through a tunnel.
5) Nodes that are not super-nodes publish the "tunnel" address(es) at which their server can be accessed.
6) Initial connection goes through the tunnel.
7) Some negotiation is used to see whether NAT traversal would work, whether the nodes are behind the same NAT, etc.
8) If the path is available, QUIC migration or QUIC multipath is used to move traffic to that path and unload traffic from the super-node.
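Steps 6 through 8 amount to a small per-connection lifecycle, which could be sketched as a state machine (the names and events here are invented for illustration; they do not correspond to picoquic APIs):

```c
/* Per-connection lifecycle in the super-node scenario: start on the
 * tunnel through the super-node, probe a direct candidate path once
 * one is learned, and migrate to it once path validation succeeds. */
typedef enum { P2P_TUNNELED, P2P_PROBING, P2P_DIRECT } p2p_state_t;

p2p_state_t p2p_next(p2p_state_t s, int candidate_received,
                     int path_validated)
{
    switch (s) {
    case P2P_TUNNELED:
        /* Stay on the tunnel until a direct candidate is learned. */
        return candidate_received ? P2P_PROBING : P2P_TUNNELED;
    case P2P_PROBING:
        /* Keep relaying via the super-node while the probe runs. */
        return path_validated ? P2P_DIRECT : P2P_PROBING;
    default:
        /* Once migrated, traffic stays off the super-node. */
        return s;
    }
}
```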

Sounds like step 7 could use "something like ICE", with control messages using the tunnel through the super-node. But yes, reliability of ICE might be an issue. I am aware for example that trying to "spray the NAT from both sides" could very well trigger bugs and cause some NATs to exhaust their mapping table and crash. I wonder what the reliable alternative could be. Maybe concentrate on IPv6?