libp2p / go-libp2p-pubsub

The PubSub implementation for go-libp2p
https://github.com/libp2p/specs/tree/master/pubsub
Other
321 stars 185 forks source link

Signing policy + optional Signature, From and Seqno #359

Closed protolambda closed 4 years ago

protolambda commented 4 years ago

This PR proposes new non-breaking PubSub options, to force stricter validation (avoid hypothethical network split), and avoid privacy problems in Eth2.

Why

Privacy

The current gossip message ID is purely based on a hash of the contents, but it is still wrapped in a protobuf that carries From, Seqno and Signature. The From and Seqno affect privacy: we don't need, or want, the original source of the message to be known. Currently, I believe that if messages are not re-published, but propagated, that at least in the Go implementation these details remain in the gossip message.

While From is problematic (and previously known to be, just not fixed by anyone), Seqno alone is also problematic, since (in Go at least) it is initialized as nanosecond time of the node, and then only increments by 1. Because of the slow non-random increase on top of a big number, it's effectively a unique identifier of the origin, embedded in every message. This could be used to quickly correlate messages, and narrow down which validators (based on message contents) run on which nodes.

Network split

The "Signature" is not really used, and empty. However, the Go implementation seems to validate it anyway, if it is non-empty. Now other gossip implementations don't use it at all, or have a stalled PR open that implements similar behavior. In our case, the signature is dangerous, because it can make different nodes mislike eachother:

Changes

Loosely based on discussion with @raulk:

Any feedback welcome, I can make changes, or change the approach.

protolambda commented 4 years ago

Some concerns:

vyzo commented 4 years ago

Hrm, the test passes on travis and wfm; maybe there is some non-determinism.

vyzo commented 4 years ago

cc @raulk

protolambda commented 4 years ago

The test that fails locally:

=== RUN   TestGossipsubDirectPeers
    TestGossipsubDirectPeers: gossipsub_test.go:1139: expected a connection between direct peers
--- FAIL: TestGossipsubDirectPeers (2.01s)

gossipsub_test.go:1139 and context:

    connect(t, h[0], h[1])
    connect(t, h[0], h[2])

    // verify that the direct peers connected
    time.Sleep(2 * time.Second)
    if len(h[1].Network().ConnsToPeer(h[2].ID())) == 0 {
        t.Fatal("expected a connection between direct peers")
    }

Looks like it's a timing thing that misses, and unrelated to this PR.

Edit: increasing the two sleep statements before expectations to 10s worked. Flaky test.

protolambda commented 4 years ago

I think this is where things go wrong with the flaky test: https://github.com/libp2p/go-libp2p-pubsub/blob/aabbdb1143e1f75c7fd897ff93de2c37114502f1/gossipsub.go#L1529 New go routines are started to make connections, and the connections are not awaited (no waitgroup). At the same time, maybe that is desirable, to not halt the heartbeat loop. Waiting for it in a test is not ideal though. And I wonder what happens next heartbeat, does it just repeatedly try to connect? Is that what the ticking is for?\

Edit: if 1 tick is 1 heartbeat is 1 second, then 2 ticks to try 2nd connect attempt will be just enough or not, depending on go routine order. And the first attempt gs.heartbeatTicks%gs.directConnectTicks == 0 with heartbeatTicks = 0 may be missing for other reasons, requiring the 2nd to pass for the test to pass.

vyzo commented 4 years ago

Yeah, we can't block the event loop. It retries every few ticks, with an initial spawn.

vyzo commented 4 years ago

So, regardless of the very interesting issues you raise, codewise this is ready to be merged.

vyzo commented 4 years ago

I am going to merge it but not yet tag a release.

jrhea commented 4 years ago

Since those still send the "From" and "Seqno" fields. @jrhea is logging that data of different Eth2 clients (4 different gossipsub implementations, 5 if you count lodestar) on Altona testnet. I am curious what the current observed behavior tells us.

With respect to the seqno...not only can it be used to fingerprint nodes, but the fact that it is incremented with each new gossip message authored allows attackers to approximate how many validators a node is running (in the case of eth2).