iqlusioninc / tmkms

Tendermint KMS: Key Management System for Tendermint Validators
https://tendermint.com/
Apache License 2.0

Protobuf: buffer underflow #729

Open activenodes opened 1 year ago

activenodes commented 1 year ago

Chains with:

Every ~40 signatures (softsign, due to low block times) the connection goes down with this error:

Let me know if you need any more details.

Thanks @tony-iqlusion

IbrarMakaveli commented 5 months ago

Thanks for the feedback. I suspected I wasn't the only one with this problem. How do you use a Unix socket with tmkms?

qezz commented 5 months ago

how do you use unix socket on tmkms?

We use it for local connections only, so tmkms has to run on the same host as the validator (n.b. there are cons to this approach, as you may guess).

So for the Cosmos chain:

# config.toml
[priv-validator]
laddr = "unix://path/to/somewhere/kms.sock"

and in the tmkms config you set the same address:

[[validator]]
addr = "unix://path/to/somewhere/kms.sock"

(it's pretty late for me, so I hope I copypasted the correct thing)

anyway will look into it shortly (on Wed or Thu) and try to make it work

UPD: IIRC, the Cosmos chain creates the socket, and tmkms then tries to connect to it. I would also recommend cleaning up (i.e. removing) the socket on every (re)start.
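The cleanup step could look like the following minimal sketch. It is written in Python purely for illustration and is not part of tmkms or of any chain binary; the function name `remove_stale_socket` and the socket path are hypothetical. It assumes it runs before the chain process starts, so that the chain can create a fresh socket at the configured path.

```python
import os
import stat


def remove_stale_socket(path: str) -> bool:
    """Remove `path` if it exists and is a Unix socket.

    Returns True if a socket file was removed, False if nothing was there.
    Raises ValueError if the path exists but is not a socket, so that a
    misconfigured path never deletes an unrelated file.
    """
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return False  # nothing to clean up
    if not stat.S_ISSOCK(mode):
        raise ValueError(f"{path} exists but is not a Unix socket")
    os.unlink(path)
    return True


if __name__ == "__main__":
    # Hypothetical path; use the one from `laddr` / `addr` in your configs.
    removed = remove_stale_socket("/path/to/somewhere/kms.sock")
    print("removed stale socket" if removed else "no stale socket found")
```

Checking the file type first is a deliberate choice: a plain `rm -f` would do the same job, but it would also silently delete a regular file if the path were mistyped.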

tony-iqlusion commented 5 months ago

I can reopen this issue, but really such issues should be filed against tendermint-p2p, similar to this one: https://github.com/informalsystems/tendermint-rs/issues/1356

IbrarMakaveli commented 4 months ago

Do you have any news on this? Let me know if I can help with anything. Unfortunately we run tmkms on a remote machine, so we can't use a Unix socket connection. Is there a short-term workaround? Thanks

tony-iqlusion commented 4 months ago

@IbrarMakaveli I filed https://github.com/informalsystems/tendermint-rs/issues/1392 to request upstream help debugging this problem.

What would be extremely helpful here is if someone could add reproduction instructions to that issue, especially if the problem is reproducible directly via the tendermint-p2p crate without involving TMKMS (or if it can be isolated to TMKMS).

tony-iqlusion commented 4 months ago

Can someone attempt to reproduce this on a fresh install, which should use tendermint-p2p v0.34.1?

qezz commented 4 months ago

Rolled out 0.14.0-pre.1 to our Sei testnet validator, will let you know.

Using a TCP connection within the same host.

qezz commented 4 months ago

Unfortunately the same error

Several signed blocks, then an underflow error

2024-03-05T20:18:43.118481Z DEBUG tmkms::session: [atlantic-2@tcp://...:51759] received request: ShowPublicKey
2024-03-05T20:18:43.118506Z DEBUG tmkms::session: [atlantic-2@tcp://...:51759] sending response: PublicKey(PubKeyResponse { pub_key: Some(PublicKey { sum: Some(Ed25519([162>
2024-03-05T20:18:43.314290Z ERROR tmkms::client: [atlantic-2@tcp://...:51759] protocol error: malformed message packet: failed to decode Protobuf message: buffer underflow
2024-03-05T20:18:44.314384Z DEBUG tmkms::session: [atlantic-2@tcp://...:51759] connecting to validator...
2024-03-05T20:18:44.314456Z  INFO tmkms::connection::tcp: KMS node ID: ...
2024-03-05T20:18:44.314862Z  INFO tmkms::session: [atlantic-2@tcp://...:51759] connected to validator successfully
2024-03-05T20:18:44.314869Z  WARN tmkms::session: [atlantic-2@tcp://...:51759]: unverified validator peer ID! (a47c7867b3191c93eed4bf0f01a9d4bc95a193ac)
2024-03-05T20:18:44.414468Z ERROR tmkms::client: [atlantic-2@tcp://...:51759] protocol error: malformed message packet: failed to decode Protobuf message: buffer underflow

tony-iqlusion commented 4 months ago

@qezz can you confirm that tendermint-p2p v0.34.1 was used in the build? (I can pin it in the next prerelease)

qezz commented 4 months ago

let me check

qezz commented 4 months ago

The build log says:

...
   Downloaded tendermint-p2p v0.34.1
...
   Compiling tendermint-proto v0.34.1
   Compiling yubihsm v0.42.1
   Compiling tendermint v0.34.1
   Compiling cosmos-sdk-proto v0.20.0
   Compiling tendermint-p2p v0.34.1
   Compiling tendermint-config v0.34.1

tony-iqlusion commented 4 months ago

Thanks, I reopened this issue: https://github.com/informalsystems/tendermint-rs/issues/1392#issuecomment-1979592089

datanexus-vincent commented 3 weeks ago

We're running into the same issue very consistently on Initia's testnet when we turn on the oracle, which I believe adds a significant amount of data to the TMKMS requests.

2024-06-14T01:46:26.861895Z ERROR tmkms::client: [initiation-1@tcp://validator:26658] protocol error: malformed message packet: failed to decode Protobuf message: buffer underflow
2024-06-14T01:46:27.862227Z DEBUG tmkms::session: [initiation-1@tcp://validator:26658] connecting to validator...
2024-06-14T01:46:27.864663Z  INFO tmkms::session: [initiation-1@tcp://validator:26658] connected to validator successfully
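
To illustrate why larger requests could matter, here is a minimal sketch of one way a "buffer underflow" can arise when decoding a length-delimited Protobuf message: the varint length prefix announces more bytes than a single read delivered. This is an assumption about the failure mode, not something confirmed in this thread, and it is plain Python rather than tmkms or tendermint-p2p code. All function names and the 1024-byte chunk size in the usage example are hypothetical.

```python
from typing import Callable, Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Return (value, bytes_consumed); raise EOFError if the varint is cut off."""
    value = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise EOFError("truncated varint")


def decode_delimited(buf: bytes) -> bytes:
    """Decode one length-delimited message from a buffer assumed to be complete."""
    length, used = decode_varint(buf)
    body = buf[used:used + length]
    if len(body) < length:
        raise EOFError(f"buffer underflow: need {length} bytes, have {len(body)}")
    return body


def read_delimited(read_chunk: Callable[[], bytes]) -> bytes:
    """Keep reading chunks until the declared length is fully available.

    Bytes past the end of the first message are discarded in this sketch.
    """
    buf = b""
    while True:
        try:
            return decode_delimited(buf)
        except EOFError:
            chunk = read_chunk()
            if not chunk:
                raise  # peer closed before the message was complete
            buf += chunk
```

For example, a 3072-byte payload delivered in hypothetical 1024-byte chunks fails with a single-read decode but succeeds when reads are accumulated:

```python
payload = bytes(range(256)) * 12
frame = encode_varint(len(payload)) + payload
chunks = [frame[i:i + 1024] for i in range(0, len(frame), 1024)]

# decode_delimited(chunks[0]) raises EOFError("buffer underflow: ...")
it = iter(chunks)
assert read_delimited(lambda: next(it, b"")) == payload
```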

Has anyone attempted to build with @zarkone's upstream PR?

mkaczanowski commented 3 weeks ago

@datanexus-vincent did you try building on top of https://github.com/iqlusioninc/tmkms/pull/903/files ?

I think that should fix your issue (as it did for Sei), though we haven't checked Initia yet.

@tony-iqlusion we'd appreciate your PR review :)

datanexus-vincent commented 2 weeks ago

@mkaczanowski I did once I realized it wasn't an upstream PR but a PR for this repo, and it worked! Thanks for the effort to get that working.