probe-lab / hermes

A Gossipsub listener and tracer.

Trace gossipsub scores #10

Closed cortze closed 7 months ago

cortze commented 8 months ago

Description

The current tracing module for the libp2p host and the GossipSub events covers most of the interactions between Hermes and remote peers in the Ethereum network. One relevant piece is still missing: to debug at a finer granularity, we need to see how GossipSub events (mostly PRUNES/DISCONNECTIONS) relate to the PeerScore of each peer per topic.

Thus, this PR adds a periodic export of PeerScore snapshots. This feature was already present in the Lotus + TraceCatcher combo but hasn't landed in Hermes yet.
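As a rough illustration of the idea, here is a minimal sketch of a periodic score export. It is not the code in this PR: the `PeerScore` and `TopicScore` types are local stand-ins modelled on the fields visible in the score output further down this thread, `flattenSnapshot` and `exportLoop` are hypothetical names, and the peer ID and score values in `main` are made-up placeholders.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// TopicScore is a local stand-in (assumption) for the per-topic score fields.
type TopicScore struct {
	TimeInMesh               time.Duration
	FirstMessageDeliveries   float64
	MeshMessageDeliveries    float64
	InvalidMessageDeliveries float64
}

// PeerScore is a local stand-in (assumption) for one peer's score snapshot.
type PeerScore struct {
	Score              float64
	AppSpecificScore   float64
	IPColocationFactor float64
	BehaviourPenalty   float64
	Topics             map[string]TopicScore
}

// flattenSnapshot converts one peer's score into a generic map that a
// tracer could serialize as a single event.
func flattenSnapshot(peerID string, s PeerScore) map[string]any {
	topics := make([]map[string]any, 0, len(s.Topics))
	for name, ts := range s.Topics {
		topics = append(topics, map[string]any{
			"Topic":                    name,
			"TimeInMesh":               ts.TimeInMesh.String(),
			"FirstMessageDeliveries":   ts.FirstMessageDeliveries,
			"MeshMessageDeliveries":    ts.MeshMessageDeliveries,
			"InvalidMessageDeliveries": ts.InvalidMessageDeliveries,
		})
	}
	return map[string]any{
		"PeerID":             peerID,
		"Score":              s.Score,
		"AppSpecificScore":   s.AppSpecificScore,
		"IPColocationFactor": s.IPColocationFactor,
		"BehaviourPenalty":   s.BehaviourPenalty,
		"Topics":             topics,
	}
}

// exportLoop calls snapshot once per period and passes each flattened
// peer score to emit, until ctx is cancelled.
func exportLoop(ctx context.Context, period time.Duration,
	snapshot func() map[string]PeerScore, emit func(map[string]any)) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for id, s := range snapshot() {
				emit(flattenSnapshot(id, s))
			}
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 350*time.Millisecond)
	defer cancel()

	// Placeholder data source; a real tracer would read live scores here.
	snapshot := func() map[string]PeerScore {
		return map[string]PeerScore{
			"peer-A": {
				Score: 1.5,
				Topics: map[string]TopicScore{
					"/eth2/69ae0e99/beacon_block/ssz_snappy": {TimeInMesh: 79 * time.Second},
				},
			},
		}
	}
	exportLoop(ctx, 100*time.Millisecond, snapshot, func(ev map[string]any) {
		fmt.Println("peer_id:", ev["PeerID"], ev)
	})
}
```

Keeping the snapshot source and the emitter as plain function parameters means the loop can be exercised without a running libp2p host.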

Tasks

cortze commented 8 months ago

Interestingly, the tests pass when I run them locally:

(trace-gossipsub-scores)$ go test ./tele
PASS
ok      github.com/probe-lab/hermes/tele        0.004s

I haven't changed the existing tests or the functions they cover. I'll look into it further.

guillaumemichel commented 8 months ago

It seems that the tests are failing on the 32-bit architecture, which is weird. I restarted the jobs; let's see if they pass.

If the problem persists, we can get rid of 32-bit tests.

cortze commented 7 months ago

Update

There have been some minor changes to the structure of the tool to make the PeerScoring work:

Previously, the tool would hide some ForkVersion and ForkDigest errors when its values did not match those of the trusted node, because it would update its BeaconStatus to the node's. This made the tool subscribe to the gossip topics of the wrong ForkDigest.

The tool reported some decoding errors when reading AggregateAndProofs from the beacon_block topic; this is now fixed.

The combination of these three points was a bit tedious to troubleshoot. The incorrect fork left the tool with no peers in the mesh when testing locally with a Holesky node, keeping 0 connections open on the advertised network. The wrong decoding of the gossipsub topic made the service restart, so it kept no scores, and scoring was not configured at the topic level either.
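To make the ForkDigest issue concrete, here is a minimal sketch of surfacing a mismatch as an error instead of silently adopting the trusted node's value. This is an illustration under assumptions, not the change made in this PR: `checkForkDigest` and `beaconBlockTopic` are hypothetical helpers, and the all-zero digest in `main` is an arbitrary placeholder. The topic layout follows the topic name visible in the score output in this thread.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// checkForkDigest (hypothetical helper) returns an error when the locally
// configured fork digest differs from the one reported by the trusted node.
func checkForkDigest(local, trusted [4]byte) error {
	if local != trusted {
		return fmt.Errorf("fork digest mismatch: local %s, trusted node %s",
			hex.EncodeToString(local[:]), hex.EncodeToString(trusted[:]))
	}
	return nil
}

// beaconBlockTopic (hypothetical helper) builds the beacon_block topic name
// for a fork digest: /eth2/<fork_digest>/beacon_block/ssz_snappy.
func beaconBlockTopic(forkDigest [4]byte) string {
	return "/eth2/" + hex.EncodeToString(forkDigest[:]) + "/beacon_block/ssz_snappy"
}

func main() {
	trusted := [4]byte{0x69, 0xae, 0x0e, 0x99} // digest seen in the output in this thread
	local := [4]byte{0x00, 0x00, 0x00, 0x00}   // arbitrary placeholder for a wrong config

	if err := checkForkDigest(local, trusted); err != nil {
		// Subscribing with the wrong digest would join a topic no peer is on.
		fmt.Println("refusing to subscribe:", err)
		fmt.Println("would have subscribed to:", beaconBlockTopic(local))
	}
	fmt.Println("expected topic:", beaconBlockTopic(trusted))
}
```

Because the digest is part of the topic name, a wrong value produces a valid-looking subscription with an empty mesh, which matches the symptom described above.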

There are significant changes in the code (apologies for that, I wasn't expecting so many extra changes), but the last commit already connects correctly to the local Prysm node on Holesky and reports some initial peer scores for the beacon_block topic:

peer_id: 16Uiu2HAmGkRcFqRnh4pN66UTeoRBp7uMsfp385Cx4oxAUwN2qbNC map[AppSpecificScore:0 BehaviourPenalty:0 IPColocationFactor:0 PeerID:16Uiu2HAmGkRcFqRnh4pN66UTeoRBp7uMsfp385Cx4oxAUwN2qbNC Score:2.503214489295642 Topics:[<nil> map[FirstMessageDeliveries:2.929018111619552 InvalidMessageDeliveries:0 MeshMessageDeliveries:4.671362011321691 TimeInMesh:1m18.999537812s Topic:/eth2/69ae0e99/beacon_block/ssz_snappy]]]
peer_id: 16Uiu2HAm3HNpFPrpq4FshxjS44YFXp8948hyqZBxQNN6FfecEQ3J map[AppSpecificScore:0 BehaviourPenalty:0 IPColocationFactor:0 PeerID:16Uiu2HAm3HNpFPrpq4FshxjS44YFXp8948hyqZBxQNN6FfecEQ3J Score:0.13333333333333333 Topics:[<nil> map[FirstMessageDeliveries:0 InvalidMessageDeliveries:0 MeshMessageDeliveries:0 TimeInMesh:1m5.699377854s Topic:/eth2/69ae0e99/beacon_block/ssz_snappy]]]

Let me know if there is any change we would like to apply to the code itself, its formatting, or its style. I haven't been able to fix the auto-formatting in VS Code, although I can manually leave those lines out :)