Closed cortze closed 7 months ago
Interestingly, the tests pass when I run them locally:
(trace-gossipsub-scores)$ go test ./tele
PASS
ok github.com/probe-lab/hermes/tele 0.004s
I haven't even changed the existing tests or the functions they test. I'll look into it further.
It seems that the tests are failing due to the 32-bit architecture, which is weird. I restarted the jobs; let's see if they pass.
If the problem persists, we can get rid of 32-bit tests.
There have been some minor changes to the structure of the tool to make the PeerScoring work:
- ForkDigest and the ForkVersion for the given network and for the current epoch -> https://github.com/probe-lab/hermes/pull/10/commits/8f7e039739064b0625cc8ba38b5de1403bbd1735 . Previously, the tool would hide some ForkVersion and ForkDigest errors when they did not match those of the trusted node, as it would update its BeaconStatus to that of the node. This made the tool subscribe to the gossip topics of the wrong ForkDigest.
- ForkVersion at the PubSub level allows the messages to be parsed correctly according to the current fork -> https://github.com/probe-lab/hermes/pull/10/commits/a1bf61b60e15890ae64249d22dcdf2eb6e7aa4f9
- The tool reported some Decoding Errors when reading AggregateAndProofs from the beacon_block topic; this is now fixed.
The combination of these three points was a bit tedious to troubleshoot: an incorrect fork left the tool with no peers in the mesh when testing it locally with a Holesky node, keeping 0 connections open on the advertised network. The wrong decoding of the gossipsub topic was causing the service to restart, so it kept no scores, and scoring was not configured at the topic level either.
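To illustrate why a wrong ForkVersion ends up on different gossip topics, here is a minimal sketch, assuming the phase 0 consensus-spec definition of the fork digest (the first 4 bytes of the SSZ root of the fork version and the genesis validators root). The function names `computeForkDigest` and `gossipTopic` are hypothetical and are not taken from the Hermes code; the inputs in `main` are placeholders, not real network values.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// computeForkDigest (hypothetical helper) returns the first 4 bytes of the
// SSZ root of ForkData{current_version, genesis_validators_root}.
// For a two-field container the root is sha256(chunk0 || chunk1), where the
// 4-byte version is right-padded with zeros to a 32-byte chunk.
func computeForkDigest(forkVersion [4]byte, genesisValidatorsRoot [32]byte) [4]byte {
	var buf [64]byte
	copy(buf[:4], forkVersion[:])
	copy(buf[32:], genesisValidatorsRoot[:])
	root := sha256.Sum256(buf[:])

	var digest [4]byte
	copy(digest[:], root[:4])
	return digest
}

// gossipTopic (hypothetical helper) builds a topic name in the same shape as
// the ones in the logs, e.g. /eth2/69ae0e99/beacon_block/ssz_snappy.
func gossipTopic(digest [4]byte, name string) string {
	return fmt.Sprintf("/eth2/%s/%s/ssz_snappy", hex.EncodeToString(digest[:]), name)
}

func main() {
	// Placeholder genesis validators root, NOT a real network value.
	var root [32]byte
	root[0] = 0x01

	// Two different fork versions for the same network.
	a := computeForkDigest([4]byte{0x01, 0x00, 0x00, 0x00}, root)
	b := computeForkDigest([4]byte{0x02, 0x00, 0x00, 0x00}, root)

	// The two topic names differ, so nodes on different forks never share a mesh.
	fmt.Println(gossipTopic(a, "beacon_block"))
	fmt.Println(gossipTopic(b, "beacon_block"))
}
```

Because the digest is part of the topic name, a node that subscribes with a stale ForkVersion joins topics that no peer on the current fork publishes to, which would be consistent with the empty mesh described above.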
There are significant changes in the code (apologies for that, I wasn't expecting so many extra changes), but this last commit already connects correctly to the local Prysm node in holesky, and reports some initial peer scores at the beacon_block topic:
peer_id: 16Uiu2HAmGkRcFqRnh4pN66UTeoRBp7uMsfp385Cx4oxAUwN2qbNC map[AppSpecificScore:0 BehaviourPenalty:0 IPColocationFactor:0 PeerID:16Uiu2HAmGkRcFqRnh4pN66UTeoRBp7uMsfp385Cx4oxAUwN2qbNC Score:2.503214489295642 Topics:[<nil> map[FirstMessageDeliveries:2.929018111619552 InvalidMessageDeliveries:0 MeshMessageDeliveries:4.671362011321691 TimeInMesh:1m18.999537812s Topic:/eth2/69ae0e99/beacon_block/ssz_snappy]]]
peer_id: 16Uiu2HAm3HNpFPrpq4FshxjS44YFXp8948hyqZBxQNN6FfecEQ3J map[AppSpecificScore:0 BehaviourPenalty:0 IPColocationFactor:0 PeerID:16Uiu2HAm3HNpFPrpq4FshxjS44YFXp8948hyqZBxQNN6FfecEQ3J Score:0.13333333333333333 Topics:[<nil> map[FirstMessageDeliveries:0 InvalidMessageDeliveries:0 MeshMessageDeliveries:0 TimeInMesh:1m5.699377854s Topic:/eth2/69ae0e99/beacon_block/ssz_snappy]]]
Let me know if there is any change we would like to apply to the code itself, its formatting, or its style (I haven't been able to fix the auto-fmt of VS Code, although I can manually leave those lines out :)
Description
The current tracing module of the Libp2p host and the GossipSub events includes most of the interactions between Hermes and remote peers in the Ethereum network. There is still a relevant point missing to debug, with higher resolution, how GossipSub events (mostly PRUNES/DISCONNECTIONS) relate to the PeerScore of each peer per topic.
Thus, this PR aims to add the periodic export of the PeerScores snapshots. This feature was already present in the Lotus + TraceCatcher combo but hasn't landed on Hermes yet.
Tasks
- PeerScores (per peer and per topic), and flush them as traces
- peerscore traces are getting correctly flushed and tracked at the AWS Kinesis instance