tendermint / tmkms

Key Management service for Tendermint Validator nodes
Apache License 2.0

Experiment with adding timeouts to TCP socket #356

Closed zmanian closed 5 years ago

zmanian commented 5 years ago

@mdyring ran into the KMS silently stalling (#352).

I briefly looked at what switching to async_std would look like but it was pretty invasive.

@tarcieri thoughts on just adding a timeout on the socket like this?

Depending on what you think, I can pipe through a config value for setting this.
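For readers without the diff at hand, a socket-level timeout of the kind proposed here can be sketched with the standard library's `TcpStream::set_read_timeout` and `TcpStream::set_write_timeout`. This is a minimal sketch, not the actual change in this PR: the helper names `apply_socket_timeout` and `is_timeout` and the timeout value are hypothetical, and the local listener only simulates a peer that goes silent.

```rust
use std::io::{self, Read};
use std::net::{TcpListener, TcpStream};
use std::time::Duration;

/// Apply the same read and write timeout to a blocking socket.
/// `timeout` stands in for a hypothetical config value; it is not an
/// existing tmkms setting.
fn apply_socket_timeout(stream: &TcpStream, timeout: Duration) -> io::Result<()> {
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))?;
    Ok(())
}

/// A timed-out blocking read reports `WouldBlock` on Unix and `TimedOut`
/// on Windows, so both kinds are treated as a timeout.
fn is_timeout(err: &io::Error) -> bool {
    matches!(err.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut)
}

fn main() -> io::Result<()> {
    // Local listener that accepts the connection but never writes,
    // simulating a peer that has gone silent.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let stream = TcpStream::connect(listener.local_addr()?)?;
    let (_peer, _addr) = listener.accept()?;

    apply_socket_timeout(&stream, Duration::from_millis(200))?;

    let mut buf = [0u8; 1];
    let mut reader = &stream; // `Read` is implemented for `&TcpStream`
    match reader.read(&mut buf) {
        Ok(n) => println!("read {} bytes", n),
        Err(e) if is_timeout(&e) => println!("read timed out instead of blocking forever"),
        Err(e) => return Err(e),
    }
    Ok(())
}
```

Note that this only bounds how long a single blocking `read` or `write` call waits; it does not by itself detect a dead peer between calls.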

tarcieri commented 5 years ago

@zmanian these timeouts generally don’t work. They seem like a good solution at first glance, but they are triggered by events in the TCP state machine, so they only work when the network is reliable and fail when it isn’t. A proper timeout solution needs to be tied into the underlying I/O multiplexing abstraction; anything less is extremely brittle.

mdyring commented 5 years ago

Come to think of it, is TCP keepalive enabled on the socket? I suspect it is not - since otherwise in my experience a blocking read would fail eventually.

Re. timeouts on blocking read/write: I haven't experienced them not working (which I think is what @tarcieri is hinting at), but AFAIR it's always been with TCP keepalive enabled as well.