ledgerwatch / erigon

Ethereum implementation on the efficiency frontier

Erigon/Caplin sharing some invalid messages on some CL gossipsub topics #10844

Closed cortze closed 1 week ago

cortze commented 1 week ago

Description

Hey there folks! This issue is just to raise a concern about the interaction of the Erigon/Caplin client within the Ethereum CL p2p network.

At ProbeLab, we’ve been running a Prysm node for a while, and we’ve been able to track some occasional error log messages reporting the arrival of an invalid message on gossipsub topics such as voluntary_exit and attester_slashing.

Without any way of knowing which node and which client broadcast the message first, we see that erigon/caplin has been forwarding these messages without any type of validation. Sharing them over its gossipsub topic mesh connections impacts its peer score, which generally ends up triggering sudden PRUNEs from all the gossipsub meshes (we already experienced a similar issue when running our tool Hermes, and we documented it in this post).

This might not seem like a severe problem, but Caplin’s connectivity could be impacted by this, leaving it without any stable connections in the gossipsub meshes.

Do you have any idea how a lightweight message verification could be implemented in the Caplin client?
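
To make the question concrete, here is a minimal sketch of what a cheap pre-forwarding check for the voluntary_exit topic might look like. It loosely follows the non-signature conditions the consensus spec places on voluntary exits. All names (`Validator`, `VoluntaryExit`, `CheckVoluntaryExit`, `Verdict`) are hypothetical and are not Caplin's actual API; signature verification and the "first exit seen for this validator" check are deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// FarFutureEpoch mirrors the consensus-spec FAR_FUTURE_EPOCH constant (2^64 - 1).
const FarFutureEpoch = ^uint64(0)

// ShardCommitteePeriod mirrors the mainnet SHARD_COMMITTEE_PERIOD (in epochs).
const ShardCommitteePeriod = 256

// Validator is a hypothetical, trimmed-down view of a validator record.
type Validator struct {
	ActivationEpoch uint64
	ExitEpoch       uint64
}

// VoluntaryExit is a hypothetical, trimmed-down gossip payload.
type VoluntaryExit struct {
	Epoch          uint64
	ValidatorIndex uint64
}

// Verdict is a hypothetical stand-in for a gossipsub validation result.
type Verdict int

const (
	Accept Verdict = iota // safe to forward to mesh peers
	Reject                // drop the message, do not forward
)

// CheckVoluntaryExit runs the cheap state checks before a message is forwarded.
func CheckVoluntaryExit(exit VoluntaryExit, validators []Validator, currentEpoch uint64) (Verdict, error) {
	if exit.ValidatorIndex >= uint64(len(validators)) {
		return Reject, errors.New("unknown validator index")
	}
	v := validators[exit.ValidatorIndex]
	// Active means: activation_epoch <= current_epoch < exit_epoch.
	if !(v.ActivationEpoch <= currentEpoch && currentEpoch < v.ExitEpoch) {
		return Reject, errors.New("non-active validator cannot exit")
	}
	// An exit must not have been initiated already.
	if v.ExitEpoch != FarFutureEpoch {
		return Reject, errors.New("validator has already initiated exit")
	}
	// The exit must not be scheduled for a future epoch.
	if currentEpoch < exit.Epoch {
		return Reject, errors.New("exit is not yet valid")
	}
	// The validator must have been active for long enough.
	if currentEpoch < v.ActivationEpoch+ShardCommitteePeriod {
		return Reject, errors.New("validator has not been active long enough")
	}
	return Accept, nil
}

func main() {
	validators := []Validator{
		{ActivationEpoch: 0, ExitEpoch: FarFutureEpoch},              // active
		{ActivationEpoch: FarFutureEpoch, ExitEpoch: FarFutureEpoch}, // never activated
	}
	for idx := range validators {
		verdict, err := CheckVoluntaryExit(VoluntaryExit{Epoch: 900, ValidatorIndex: uint64(idx)}, validators, 1000)
		fmt.Println(idx, verdict == Accept, err)
	}
}
```

A check of this shape only needs the validator's activation and exit epochs, so it could run before the message is handed back to the gossipsub router, without a full state transition.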

System information

Erigon version: erigon/caplin

Chain/Network: Mainnet

Expected behaviour

Having invalid messages around is not an expected scenario. However, we can see that on some occasions, they do happen. Ideally, the Erigon/Caplin software should filter them out, as sharing them back to its directly connected peers can jeopardize its mesh connectivity.

Actual behaviour

The measured errors look like this:

time="2024-05-16 12:35:01" level=debug msg="Gossip message was rejected" agent="erigon/caplin" error="non-active validator cannot exit" gossipScore=-6182.725625534806 multiaddress="/ip4/<hidden-just-in-case>/tcp/55742" peerID=16Uiu2HAkzNLy2S3voLw3CFxET1kXYSZVLV6QwkHuP3RaDdGJSk2E prefix=sync topic="/eth2/6a95a1a9/voluntary_exit/ssz_snappy"
time="2024-05-16 12:35:01" level=debug msg="Gossip message was rejected" agent="erigon/caplin" error="non-active validator cannot exit" gossipScore=-6182.725625534806 multiaddress="/ip4/<hidden-just-in-case>/tcp/55742" peerID=16Uiu2HAkzNLy2S3voLw3CFxET1kXYSZVLV6QwkHuP3RaDdGJSk2E prefix=sync topic="/eth2/6a95a1a9/voluntary_exit/ssz_snappy"
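
For anyone who wants to tally these rejections per agent or topic, the key=value layout of the lines above can be split with a small helper. This is only a sketch that assumes the layout shown above; `parseLogLine` is a hypothetical helper, not part of Prysm, and it does not handle escaped quotes inside values.

```go
package main

import "fmt"

// parseLogLine splits a key=value log line into a map.
// Values may be bare (up to the next space) or double-quoted.
func parseLogLine(line string) map[string]string {
	fields := make(map[string]string)
	i := 0
	for i < len(line) {
		// Skip separating spaces.
		for i < len(line) && line[i] == ' ' {
			i++
		}
		// Read the key up to '='.
		start := i
		for i < len(line) && line[i] != '=' && line[i] != ' ' {
			i++
		}
		if i >= len(line) || line[i] != '=' {
			continue // bare word without a value: ignore it
		}
		key := line[start:i]
		i++ // skip '='
		var val string
		if i < len(line) && line[i] == '"' {
			i++ // skip opening quote
			start = i
			for i < len(line) && line[i] != '"' {
				i++
			}
			val = line[start:i]
			if i < len(line) {
				i++ // skip closing quote
			}
		} else {
			start = i
			for i < len(line) && line[i] != ' ' {
				i++
			}
			val = line[start:i]
		}
		fields[key] = val
	}
	return fields
}

func main() {
	line := `time="2024-05-16 12:35:01" level=debug msg="Gossip message was rejected" agent="erigon/caplin" error="non-active validator cannot exit" gossipScore=-6182.725625534806 prefix=sync topic="/eth2/6a95a1a9/voluntary_exit/ssz_snappy"`
	f := parseLogLine(line)
	fmt.Println(f["agent"], f["topic"], f["error"])
}
```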

Steps to reproduce the behaviour

Unfortunately, to recreate the behaviour, we would have to propagate an invalid message back into the network, which would generate invalid traffic there.

Giulio2002 commented 1 week ago

This was fixed in the most recent version. Most nodes are still running the previous version (2.59), which had multiple gossip bugs and is very buggy from a P2P perspective. I just expect the other nodes to ban us. Maybe in a future version we can add the version to it.