hashicorp / yamux

Golang connection multiplexing library
Mozilla Public License 2.0
2.19k stars 232 forks source link

Implement timeouts for OpenStream #96

Closed freddygv closed 3 years ago

freddygv commented 3 years ago

Currently streams can be in the streamSYNSent state for an unbounded amount of time when the remote is unable to reply and has not closed the connection via FIN/RST.

This commit adds a deadline to receiving an ACK on calls to OpenStream.

The timeout is configurable per-session with StreamOpenTimeout, and can be disabled by setting it to zero. When the timeout is triggered, an error is logged.

Note that the establishCh is buffered and we avoid closing the channel because there are multiple call-paths that can invoke it:

banks commented 3 years ago

The timeout is opt-in only and configurable per-session with StreamOpenTimeout.

I'd maybe lean towards setting it to a reasonable default and make it opt-out. This is slightly less risk averse but still gives a path in the unlikely event this causes problems. I can't really think though of cases where OpenStream is blocking for more than say 30 seconds where just stalling indefinitely is actually good behaviour. It is possible that there are cases where it's not harmful and this change will cause more session closes which might introduce more visible errors in other in-flight streams, but I don't think it's especially dangerous and having a default configuration that avoids this failure mode seems better to me.

What do you think?

freddygv commented 3 years ago

@banks Yeah that was my initial thought by adding a default of 30s. I think that setting a reasonable default and allowing it to be disabled seems conservative enough.

I did some sleuthing to see what defaults exist for this similar settings in the wild and am seeing 75s on both FreeBSD and Mac OS.

See TCP_KEEPINIT here: https://www.freebsd.org/cgi/man.cgi?query=tcp&apropos=0&sektion=4&manpath=FreeBSD+9.2-RELEASE&arch=default&format=html

How does 75s sound for the default timeout? For Consul's use case we could configure something lower explicitly