informalsystems / tendermint-rs

Client libraries for Tendermint/CometBFT in Rust!
Apache License 2.0
615 stars 224 forks source link

p2p: failure to read complete message #1392

Open tony-iqlusion opened 9 months ago

tony-iqlusion commented 9 months ago

What went wrong?

There are ongoing reports of failure to read a complete message in the tendermint-p2p crate which seem to be manifesting as an incomplete Protobuf message in TMKMS:

https://github.com/iqlusioninc/tmkms/issues/729

Steps to reproduce

Unfortunately this seems very chain-specific and I have not been given proper repro steps myself in the aforementioned issue.

Definition of "done"

It's unclear what actually needs to change here.

A somewhat related issue is the API that tendermint-p2p exposes: it's a message-oriented protocol, but the interface uses the std::io::{Read, Write} traits, which are stream-oriented. It's possible TMKMS is misusing the API, but it's the wrong abstraction to begin with.

tony-iqlusion commented 9 months ago

This might be related to #1356, although curiously people claim things work with Unix domain sockets and the errors only occur over TCP.

I did confirm read_exact is used: https://github.com/informalsystems/tendermint-rs/blob/64e9b33/p2p/src/secret_connection.rs#L622

tony-iqlusion commented 9 months ago

We're continuing to receive reports of this problem even with tendermint-p2p v0.34.1: https://github.com/iqlusioninc/tmkms/issues/729#issuecomment-1979571645

tony-iqlusion commented 5 months ago

There's now a PR to tmkms to have it aggregate chunks of data itself and keep trying to decode them as a proto: https://github.com/iqlusioninc/tmkms/pull/903

...and it really seems like tendermint-p2p should probably be doing that. Really I think the API should be refactored to be message-oriented, rather than stream-oriented.