libp2p / go-libp2p-circuit

Circuit Switching for libp2p
MIT License
48 stars 16 forks source link

call Stream.Reset instead of Stream.Close #76

Closed Stebalien closed 5 years ago

Stebalien commented 5 years ago

May fix https://github.com/ipfs/go-ipfs/issues/6237

Basically,

  1. We hang while closing a stream (because Close waits). Also, https://github.com/libp2p/go-mplex/issues/9
  2. This blocks the connection manager because it assumes that close doesn't wait.

This may also fix a stream leak.

vyzo commented 5 years ago

tests fail...

Stebalien commented 5 years ago

Yeah, we expect a nice EOF.

vyzo commented 5 years ago

we need the Shutdown call already.

Stebalien commented 5 years ago
  1. We need Close to mean "close both ways" (with a sane default timeout), Reset to mean "throw everything away", and then yeah, CloseWrite and CloseRead.
vyzo commented 5 years ago

The test failure seems insurmountable, we'll need to change the stream API for this to work :(

Stebalien commented 5 years ago

Not necessarily. The test failure is due to the assumption that conn.Close() flushes. However, conn.Close() actually resets in our TCP transport as well (for efficiency) so this isn't really an issue.

Stebalien commented 5 years ago

But I agree this sucks.

Stebalien commented 5 years ago

@vyzo, this is getting really critical and both the TCP and the QUIC transports reset the underlying connection on close without flushing anything (I specifically modified the TCP transport to do this to avoid unnecessary work). This bug likely just crashed our gateways and has been causing problems for a ton of people.

The only other solution I can think of is to:

  1. Close the stream. We'll have to fix any bugs we have where closing streams doesn't necessarily interrupt any in-progress writes (an issue for both yamux and mplex).
  2. Set a read deadline on the stream. https://github.com/whyrusleeping/yamux/pull/28 fixes read deadlines for yamux but they're still broken in mplex.
  3. Wait for the current reader to exit (read lock).
  4. Read on the stream waiting for an EOF or a timeout.
  5. On timeout, reset.
  6. Do all this in a goroutine. so we don't stall anything.

But that requires fixing multiple annoying bugs. I've started on this path (with the yamux fix) but it feels like a waste given that we don't actually make any guarantees about Close flushing.

vyzo commented 5 years ago

Sigh... If it has been identified as the critical bug, then we got to merge it.

Stebalien commented 5 years ago

Sigh... If it has been identified as the critical bug, then we got to merge it.

Well, Close is stalling. I'll try shipping a patch to some of our partners to make sure this is the bug.

vyzo commented 5 years ago

This needs a rebase for the go mod debacle (go.sum conflicts)

vyzo commented 5 years ago

rebased and resolved the conflict; should be ready for merge.

vyzo commented 5 years ago

we also need the companion in #77, so that the connection manager doesn't kill the underlying hop relay connections when they have stop streams.