Consider dropping multistream-select sim open

mxinden commented 1 year ago

By default go-libp2p supports the multistream-select simultaneous open extension:

In order to support direct connections through NATs with hole punching, we need to account for simultaneous open. In such cases, there is no single initiator and responder, but instead both peers act as initiators. This breaks protocol negotiation in multistream-select, which assumes a single initiator.

This draft proposes a simple extension to the multistream protocol negotiation in order to select a single initiator when both peers are acting as such.

https://github.com/libp2p/specs/blob/master/connections/simopen.md

See also go-multistream/client.go as the entrypoint.

The simultaneous open extension is needed for uncoordinated simultaneous connection attempts on TCP. Note that coordinated simultaneous connection attempts in libp2p's hole punching do not require the extension. In that case the role (initiator, responder) is determined through the DCUtR protocol. See the DCUtR specification.

There is a downside to supporting the multistream-select simultaneous open extension. It shows up when negotiating a single security protocol on a new "normal" TCP connection, where "normal" means not a simultaneous connect, i.e. initiated from one side only.

Theoretically, when negotiating a single protocol only, one can use multistream-select's optimistic protocol negotiation. More specifically, the dialer can propose the single protocol and then proceed directly, without waiting for confirmation from the remote, saving one round trip in the happy path.

Unfortunately, one cannot combine the simultaneous open extension with optimistic protocol negotiation, as the former needs to wait for the response from the remote to determine whether to enter the initiator selection phase.

multistream-select with simultaneous open extension on "normal" TCP connection:

sequenceDiagram
    A ->> B: /libp2p/simultaneous-connect
    Note left of A: A can not proceed optimistically and negotiate the security protocol.<br/>B might enter _initiator selection phase_.<br/>Need to wait for B's response.
    B ->> A: `/libp2p/simultaneous-connect` or `na`
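
The decision A faces after B's response can be sketched in Go. This is a minimal illustration and not go-multistream's actual code: `nextStep`, `onSimOpenResponse` and `selectRole` are hypothetical names, and the tie-break rule (the peer with the larger nonce becomes the initiator, equal nonces force a retry) follows the simopen spec linked above as far as I can tell, so verify it against the spec before relying on it.

```go
package main

import (
	"errors"
	"fmt"
)

// Protocol ID taken from the diagram above.
const simOpenProtocol = "/libp2p/simultaneous-connect"

// nextStep is a hypothetical type describing what the dialer does next.
type nextStep int

const (
	// The remote answered "na": it is a plain listener, so we stay the initiator.
	proceedAsInitiator nextStep = iota
	// The remote also sent the sim open ID: both sides are dialing.
	enterInitiatorSelection
)

// onSimOpenResponse maps the remote's reply to the dialer's next step.
// The dialer cannot know the answer before reading the reply, which is
// why it cannot proceed optimistically.
func onSimOpenResponse(resp string) (nextStep, error) {
	switch resp {
	case "na":
		return proceedAsInitiator, nil
	case simOpenProtocol:
		return enterInitiatorSelection, nil
	default:
		return 0, fmt.Errorf("unexpected response %q", resp)
	}
}

var errIdenticalNonces = errors.New("identical nonces, retry selection")

// selectRole reports whether the local peer becomes the initiator.
// Assumption: the larger nonce wins; equal nonces are an error.
func selectRole(myNonce, peerNonce uint64) (bool, error) {
	if myNonce == peerNonce {
		return false, errIdenticalNonces
	}
	return myNonce > peerNonce, nil
}
```

Calling `onSimOpenResponse("na")` yields `proceedAsInitiator`, which is the common case on a "normal" TCP connection, yet the dialer still had to wait a full round trip to learn it.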

Dropping multistream-select sim open would allow go-libp2p to compete with TCP+TLS+HTTP in connection-establishment latency when supporting a single security protocol only. More specifically, it would allow go-libp2p to establish a TCP+TLS connection in two round trips, at the expense of failed connection establishment on uncoordinated TCP simultaneous open.
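
To make the round-trip arithmetic explicit, here is a toy model in Go. It is a simplification under stated assumptions, not a measurement: one round trip for the TCP handshake, one for the TLS handshake (assuming TLS 1.3), and one for security protocol negotiation unless it can be done optimistically. The function names are made up for illustration.

```go
package main

// canNegotiateOptimistically encodes the two conditions from the text:
// exactly one security protocol on offer, and no sim open extension
// forcing the dialer to wait for the remote's reply.
func canNegotiateOptimistically(securityProtocols int, simOpenEnabled bool) bool {
	return securityProtocols == 1 && !simOpenEnabled
}

// roundTripsToSecureConn counts round trips until a secured connection
// is established, under the simplified model described above.
func roundTripsToSecureConn(securityProtocols int, simOpenEnabled bool) int {
	const tcpHandshake = 1 // SYN, SYN-ACK
	const tlsHandshake = 1 // assumption: TLS 1.3 full handshake
	negotiation := 1       // wait for the remote to confirm the protocol
	if canNegotiateOptimistically(securityProtocols, simOpenEnabled) {
		negotiation = 0 // proposal is sent together with the first TLS flight
	}
	return tcpHandshake + negotiation + tlsHandshake
}
```

Under this model, `roundTripsToSecureConn(1, false)` gives 2, while enabling sim open or offering two security protocols gives 3.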

Worth noting:

Most go-libp2p deployments support two security protocols on top of TCP, namely Noise and TLS. The proposed optimization does not apply here, as the multistream-select optimistic protocol negotiation can only be used when negotiating a single protocol.
There are measurements on the frequency of uncoordinated simultaneous connect in the wild. See https://github.com/libp2p/test-plans/pull/163/#discussion_r1206303871.
See https://github.com/libp2p/test-plans/pull/163/#discussion_r1203549587 for a recent discussion on this topic.
See https://github.com/libp2p/specs/issues/389 suggesting to not always do TCP port reuse and thus prevent uncoordinated simultaneous connects in the first place.
One could argue that go-libp2p's QUIC is the better TCP+XXX in the first place. Thus there is not much value in investing time here.

marten-seemann commented 1 year ago

See "Consider only reusing TCP port when hole punching" (specs#389) suggesting to not always do TCP port reuse and thus prevent uncoordinated simultaneous connects in the first place.

As @vyzo pointed out, this breaks NAT type detection. This also breaks general address detection. Commented on the issue.

There are measurements on the frequency of uncoordinated simultaneous connect in the wild. See "feat(perf): add (provision, build, run) tooling" (test-plans#163).

I can't find any measurements. Also surprised that they would be on this PR: a coordinated test setup can't measure what's happening in the wild. Wrong link?

One could argue that go-libp2p's QUIC is the better TCP+XXX in the first place. Thus there is not much value in investing time here.

Agreed. We should have some measurements, and if they show that it's a rare event, we should drop it. @vyzo, you had some numbers, didn't you?

vyzo commented 1 year ago

There is a scenario where it happens a lot: unit tests!

marten-seemann commented 1 year ago

That's fine, we have smart dialing logic now, which will prefer QUIC ;)

Any scenario in the wild where it's critical?

vyzo commented 1 year ago

Critical? No. But it would be nice to support this dark corner of the TCP spec. Can we at least get some measurements before we yank?

marten-seemann commented 1 year ago

That’s what I’d like to do as well. The measurement should include a reconnect event triggered by a sudden network failure (e.g. killing all TCP connections using iptables for 30 (?) seconds or so).

@mxinden Could you instrument a Kubo node and run this experiment? Reporting can be as simple as putting a print statement in the sim open code path.
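
A minimal sketch of what such instrumentation could look like in Go. Everything here is hypothetical: `recordSimOpen` is a made-up helper, and where exactly it would be called from in go-multistream's sim open path is not shown. It only uses the standard library's `log` and `sync/atomic`.

```go
package main

import (
	"log"
	"sync/atomic"
)

// simOpenCount counts how often the sim open path was taken.
// It is updated atomically because dials run concurrently.
var simOpenCount uint64

// recordSimOpen is a hypothetical hook to call whenever the initiator
// selection phase is entered. It logs the event and returns the running total.
func recordSimOpen(remoteAddr string) uint64 {
	n := atomic.AddUint64(&simOpenCount, 1)
	log.Printf("sim open: entered initiator selection with %s (total so far: %d)", remoteAddr, n)
	return n
}
```

Comparing the logged total against the node's overall number of TCP dials over the same period would give a rough frequency of uncoordinated simultaneous opens.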

marten-seemann commented 1 year ago

@mxinden Any progress?

mxinden commented 1 year ago

@mxinden Any progress?

No progress.

Currently prioritizing this below everything else on my todo list, most notably https://github.com/libp2p/test-plans/issues/63, https://github.com/libp2p/rust-libp2p/pull/4053 and https://github.com/libp2p/rust-libp2p/issues/2883. I still think QUIC is the better TCP+XXX. With the majority of IPFS traffic using QUIC already today, fixing this does not have a high impact. Am I missing something?

marten-seemann commented 1 year ago

With that kind of argument, we'll never get rid of multistream-select sim-open. However, this seems like a very easy win. It will cost us no more than a few hours (at most), and remove a fair bit of complexity in multistream.

libp2p / go-libp2p

Consider dropping multistream-select sim open #2330