spacemeshos / go-spacemesh

Go Implementation of the Spacemesh protocol full node. 💾⏰💪
https://spacemesh.io
MIT License
760 stars 211 forks source link

compact encoding for hare preround message #5606

Open dshulyak opened 9 months ago

dshulyak commented 9 months ago

hare preround message includes every proposal that is eligible for rewards. it references it by specifying first 20 bytes of blake3 hash of the full proposal. with 50 proposals per layer the message is around 1KB, not including vrf and regular signature. with 500 proposals per layer it grows to 10KB per preround message. number of preround messages are bounded to the size of hare committee, right now we are running it with 400 participants, but aim to increase it to 800.

with 10KB per preround message every peer has to download 4MB of unique data within 12s, in practice it downloads some duplicates because of the redundancy on gossip layer. the other part of the problem is that it scales with number of connections that node maintains. if node has 1000 connections, it can expect to be requested 4GB of data within 12s. eventually, if not now, this will be unsustainable to keep such number of connections.

compact encoding protocol

the reason why we keep 20 bytes is to prevent collisions, as otherwise votes are ambiguous and cannot be counted properly. moreover adversary can create collisions intentionally if identifier will be kept short to 2 or 4 bytes.

the proposal is to introduce compact encoding that is not open to dos attack, maintain low rate of accidental collisions, and in case of collisions provide a safe fallback. it can reduce traffic either x5 or x10, depending on the option that we choose.

protocol for encoder:

short_hash is a cheap hashing function, such as siphash or halfsiphash https://github.com/veorq/SipHash/blob/master/halfsiphash.c. we may decide how much we want to truncate output.

protocol for decoder:

we need to handle two edge cases due to possible collisions:

dshulyak commented 9 months ago

i introduced label https://github.com/spacemeshos/go-spacemesh/labels/draft to signal that the discussion about this only ready to be implemented yet.

acud commented 5 months ago

@dshulyak I have several questions arising from reading the related materials:

dshulyak commented 5 months ago

i think new type []CompactID is superior, each opaque []byte type will have to be prefixed with compact integer, one more byte or 50% increase per id. one more option is to use bit packing and then do variable length encoding for proposals if we know that there are collisions.


regarding whole siphash thing, in original idea i wanted it to be keyed by vrf (truncated to whatever is the size of the key). however there is also this "extension" that needs to be implemented.

how about even simpler approach. preround message is encoded with 2/4 bytes, simply 2/4 first bytes of the proposal hash without any additional randomness. message is signed using full proposal hash, as discussed previously, so that signature verification is always executed on correct message.but in case of ambiguity, where we cannot deterministically decode the message for signature verification, we will do one more round trip and download message with full ids from the peer that sent message with small ids. so if adversary pulls out an attack, we potentially will have to download 10%-25% more data, and we also know how to disconnect a peer that intentionally sends ambiguous messages

and in this case there seems to be no difference between truncating original blake3 hash to 2 or 4 bytes or using siphash. as ambiguous proposals will always have to be requested from the peer that sent message with them.

with this extension the decoding protocol is like this:

one more thing is that hare gossip handler right is in synchronous mode, but it will have to be switched into a mode that spawns goroutine per request in gossip libp2p.


there was a question about breaking protocol changes and upgrade path, it should be as follows:

mathcrypto commented 4 months ago

After discussion, we arrived to the following conclusion:

Handling Collisions:

When collisions (same truncated ID for different proposals) are detected:

Verification Process: