libp2p / specs

Technical specifications for the libp2p networking stack
https://libp2p.io

relay/dcutr/quic: Alternate host sending garbage UDP packet #487

Open mxinden opened 1 year ago

mxinden commented 1 year ago

Today, during a direct connection upgrade on QUIC, A sends a client hello and B sends random bytes. The client hello makes it through B's firewall and/or NAT via the hole punched by B's random bytes. B's subsequent server hello makes it through A's firewall and/or NAT via the hole punched by A's client hello.

  • For a QUIC address:
    • Upon receiving the Sync, A immediately dials the address to B.
    • Upon expiry of the timer, B starts to send UDP packets filled with random bytes to A's address. Packets should be sent repeatedly in random intervals between 10 and 200 ms.
    • This will result in a QUIC connection where A is the client and B is the server.

https://github.com/libp2p/specs/blob/master/relay/DCUtR.md#the-protocol
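B's side of the quoted step can be sketched in Python as follows. This is only an illustration: the function names, the packet count, and the payload length are assumptions made here; the spec text above only asks for random bytes sent at random intervals between 10 and 200 ms.

```python
import os
import random
import time


def punch_delays(count, rng=random):
    """Return `count` random delays in seconds, each between 10 and 200 ms."""
    return [rng.uniform(0.010, 0.200) for _ in range(count)]


def send_random_packets(sock, peer_addr, count=5, payload_len=64, sleep=time.sleep):
    """Send `count` UDP datagrams of random bytes to `peer_addr`.

    The payload carries no meaning; the datagrams only exist to create an
    outbound mapping in the sender's NAT/firewall. `count` and `payload_len`
    are arbitrary choices for this sketch, not values from the spec.
    """
    sent = 0
    for delay in punch_delays(count):
        sock.sendto(os.urandom(payload_len), peer_addr)
        sent += 1
        sleep(delay)  # wait a random 10-200 ms before the next packet
    return sent
```

`sock` is expected to be a UDP socket (for example `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`), ideally the same one the QUIC listener uses, so that the punched hole matches the port A dials.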

Now assume that B is behind a symmetric NAT but A is not. A's client hello will not make it through B's NAT, given that it (most likely) does not have the same destination port as the translated source port of B's random bytes.

If we alternated the roles on retries in DCUtR, e.g. have B be the one to send random bytes in round 1 and A be the one to send random bytes in round 2, the above scenario would succeed on the second try, given that A is not behind a symmetric NAT.

Note that we could just as well have both A and B send client hellos in the first round. The downside is that, unlike TCP with simultaneous open, we might end up with two QUIC connections: one from A to B and one from B to A.

//CC @elenaf9 and @dennis-tra as discussed today.

marten-seemann commented 1 year ago

Now assume that B is behind a symmetric NAT but B is not. A's client hello will not make it through B's NAT, given that it (most likely) does not have the same destination port as the translated source port of B's random bytes.

If we would alternate the roles on retries in DCUtR, e.g. have B be the one to send random bytes in round 1 and A be the one to send random bytes in round 2, the above scenario would succeed on the second try, given that B is not behind a symmetric NAT.

I'm not sure I understand how this is supposed to work. First of all, there are no rounds. A and B send their packets at the same time. In both scenarios, A and B send UDP packets to one specific destination address, respectively. Why does the payload of the UDP packets (random bytes or ClientHello) matter?

mxinden commented 1 year ago

First of all, there are no rounds. A and B send their packets at the same time.

A and B send their packets at the same time. On failure B retries the DCUtR flow.

On failure of all connection attempts go back to step (1). Inbound peers (here B) SHOULD retry twice (thus a total of 3 attempts) before considering the upgrade as failed.

https://github.com/libp2p/specs/blob/master/relay/DCUtR.md#the-protocol

In both scenarios, A and B send UDP packets to one specific destination address, respectively. Why does the payload of the UDP packets (random bytes or ClientHello) matter?

Say that B is behind a symmetric NAT and A isn't. A sends a client hello to B which is dropped at B's NAT given that A does not know B's NATed port. B's random bytes make it through A's NAT through the hole punched by A's client hello. A discards B's random bytes. End result is no connection.

Say that B is still behind a symmetric NAT and A isn't, BUT B sends a client hello to A and A sends random bytes to B. B's client hello will make it through A's NAT through the hole punched by A's random bytes. A's random bytes are dropped at B's NAT given that A does not know B's NATed port. The latter doesn't matter. A received B's client hello and responds with a server hello. End result is an established connection.

By alternating on retry who sends the random bytes, we would succeed on the second try.
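The two scenarios above can be written down as a toy model in Python. It encodes only the reasoning of this comment: a client hello is lost when its receiver sits behind a symmetric NAT, and is otherwise assumed to be admitted by the receiver's NAT. The function names are hypothetical and no real NAT behaviour is simulated.

```python
def hello_sender(attempt, peers=("A", "B")):
    """Alternate who sends the QUIC client hello: A on even attempts, B on odd."""
    return peers[attempt % 2]


def attempt_succeeds(attempt, symmetric):
    """Return True if the client hello of this attempt reaches its receiver.

    Assumption taken from the comment above: the hello is dropped only when
    the *receiver* is behind a symmetric NAT, because the sender then targets
    a port that differs from the receiver's translated source port.
    """
    sender = hello_sender(attempt)
    receiver = "B" if sender == "A" else "A"
    return receiver not in symmetric


# B behind a symmetric NAT, A not: first attempt fails, second succeeds.
results = [attempt_succeeds(n, symmetric={"B"}) for n in range(2)]
print(results)  # [False, True]
```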

Does the above make sense @marten-seemann?

marten-seemann commented 1 year ago

That makes sense, thank you for the clarification!

Is the proposal to reduce the number of retries to 2 (given that @dennis-tra's measurements show that there's no point in trying more than once), and alternate the roles after the first attempt?

mxinden commented 1 year ago

I am not yet sure what the best strategy would be. Alternating across (re)tries was the first thing that came to mind.

Unfortunately, having both endpoints send client hellos from the start might result in two connections. If that weren't the case, this would be my favorite strategy, given that it is the fastest.

Is the proposal to reduce the number of retries to 2

In my eyes, that is an orthogonal change.

MarcoPolo commented 1 year ago

Now assume that B is behind a symmetric NAT but B is not.

Typo? A is not?

mxinden commented 1 year ago

Now assume that B is behind a symmetric NAT but B is not.

Typo? A is not?

Thanks for the catch @MarcoPolo. Fixed.

dennis-tra commented 1 year ago

As you said Marten, the measurement results suggest that if it doesn't work with the first attempt it likely won't work with any subsequent one.

So, I think the optimizations here would be to either

  1. decrease the number of attempts or
  2. change the strategy for subsequent attempts.

What Max suggests here is option 2: changing the way we try to hole punch in the second attempt. I also find that the better option.

Switching roles of client/server makes sense to me for the reasons that Max explained. However, I have something to consider: If B is behind a symmetric NAT I'd assume that B won't be able to determine its OwnObservedAddrs because the identify protocol would report inconsistent address/port combinations. This would (at least in the current implementation) prohibit a hole punch.

vyzo commented 1 year ago

My initial measurements had suggested that some conns do go through in the second retry.

A more conservative approach is to do 2 retries, and then try switching the hello to punch through cone-symmetric scenarios, with 1 retry.

dennis-tra commented 1 year ago

@vyzo This is the data that we are referring to: https://www.notion.so/pl-strflt/NAT-Hole-punching-Success-Rate-2022-09-29-Data-Analysis-8e72705ca3cc49ab983bc5e8792e3e98#c76c6d5e25844bff8c7508b67f236827

This suggests that if we were not successful with the first attempt, there's only a ~3% chance that it'll work with a subsequent attempt.

vyzo commented 1 year ago

3% is not negligible, please be more conservative in your assessments!

dennis-tra commented 1 year ago

3% is indeed not negligible. My assumption is that the proposal we are discussing here wouldn't have a negative effect on the 3% for whom it worked with the second attempt but just increase the chances for the ones that weren't lucky with any subsequent attempt. Curious about what the others think.

marten-seemann commented 1 year ago

However, I have something to consider: If B is behind a symmetric NAT I'd assume that B won't be able to determine its OwnObservedAddrs because the identify protocol would report inconsistent address/port combinations. This would (at least in the current implementation) prohibit a hole punch.

I think that’s correct. We also have some logic there to determine the NAT type; maybe there’s some way to make use of that information?

  2. change the strategy for subsequent attempts.

We need to be careful how we do this in a backwards-compatible way. Legacy nodes will still want to punch multiple times without switching roles. Maybe that’s fine, but maybe we can find some clever way around that.

sukunrt commented 11 months ago

@mxinden:

Say that B is behind a symmetric NAT and A isn't. A sends a client hello to B which is dropped at B's NAT given that A does not know B's NATed port. B's random bytes make it through A's NAT through the hole punched by A's client hello. A discards B's random bytes. End result is no connection.

Say that B is still behind a symmetric NAT and A isn't, BUT B sends a client hello to A and A sends random bytes to B. B's client hello will make it through A's NAT through the hole punched by A's random bytes. A's random bytes are dropped at B's NAT given that A does not know B's NATed port. The latter doesn't matter. A received B's client hello and responds with a server hello. End result is an established connection.

This doesn't work. For A to holepunch through its firewall it needs to send a packet to B's symmetric NATed address. This is not what happens. A sends a packet to a port on B which is not the port that B sends packets out of.

Consider the case:

B tells A its port is Y, but the port it will actually send packets out of is X. A tells B its port is P, and it will send packets out of P.

A: P -> Y. Now A's firewall will allow incoming packets from Y but not from X, so when B sends B: X -> P, the packet is dropped by A's firewall.

This packet from B: X->P can only work if A's firewall allows all incoming packets from B's IP address irrespective of the port. This firewall behaviour is probably in the minority because AutoNAT heavily relies on this behaviour and the metrics on bootstrappers do show them replying to a lot of nodes as private.
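The P/Y/X case can be made concrete with a toy firewall model in Python. The class, its `port_restricted` flag, and the IP and port values are all hypothetical, chosen only to mirror the example above; real firewalls and NATs have more state than this.

```python
class Firewall:
    """Toy stateful firewall in front of one internal UDP port (A's port P).

    With `port_restricted=True`, inbound packets are admitted only from the
    exact remote (ip, port) that was contacted earlier. With `False`, any
    port on a previously contacted IP is admitted.
    """

    def __init__(self, port_restricted=True):
        self.port_restricted = port_restricted
        self.contacted = set()  # remote (ip, port) pairs we have sent to

    def outbound(self, remote_ip, remote_port):
        """Record an outgoing packet, which opens a hole for the reply."""
        self.contacted.add((remote_ip, remote_port))

    def admits(self, remote_ip, remote_port):
        """Decide whether an inbound packet from this remote is let through."""
        if self.port_restricted:
            return (remote_ip, remote_port) in self.contacted
        return any(ip == remote_ip for ip, _ in self.contacted)


B_IP, X, Y = "198.51.100.2", 40001, 40000  # hypothetical address and ports

outcomes = {}
for restricted in (True, False):
    fw = Firewall(port_restricted=restricted)
    fw.outbound(B_IP, Y)                       # A: P -> Y (advertised port)
    outcomes[restricted] = fw.admits(B_IP, X)  # B: X -> P (actual port)

print(outcomes)  # {True: False, False: True}
```

Only the permissive variant lets B's packet from X through, which is the minority behaviour described above.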

sukunrt commented 7 months ago

I think there is a way to make this work in case the firewall on the non-symmetric (nice) side is permissive.

Let's assume the previous case: B (symmetric NAT) tells A its port is Y, but the port it will actually send packets out of is X. A (non-symmetric NAT) tells B its port is P, and it will send packets out of P.

A: P -> Y. If A's firewall is somehow permissive, B: X -> P will be allowed by A's firewall.

If B sends a non-QUIC packet, A can read this packet and learn B's outgoing port X. Now A can dial B at port X.

Having said that, I don't know what "if A's firewall is somehow permissive" would mean in practice. I think A behind such a firewall would just be a public node.
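The step where A reads B's packet to learn the port B actually sends from maps onto the source address that a UDP socket reports for each received datagram. A minimal Python sketch, with hypothetical function names, and a loopback demo that involves no NAT at all:

```python
import socket


def learn_peer_source(sock, bufsize=2048):
    """Wait for one datagram and return the (ip, port) it was sent from.

    `recvfrom` reports the source address as seen by the receiver. Behind a
    NAT this would be the peer's translated address, i.e. the port the peer
    actually sends from rather than the one it advertised.
    """
    _data, source = sock.recvfrom(bufsize)
    return source


def loopback_demo():
    """Show on localhost that the receiver sees the sender's real port."""
    a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        a.bind(("127.0.0.1", 0))
        a.settimeout(2.0)
        b.sendto(b"not-quic", a.getsockname())  # B sends from an OS-chosen port
        _ip, seen_port = learn_peer_source(a)
        return seen_port == b.getsockname()[1]
    finally:
        a.close()
        b.close()
```

A would then dial the returned address instead of the advertised one. Whether B's packet reaches A in the first place still depends on A's firewall being permissive, as discussed above.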