sync: handle cross-chain data efficiently

countvonzero commented 1 year ago

Description

problem

capturing relevant slack discussion: @dshulyak

the goal behind this genesis id prefix was to prevent downloading the whole chain of objects,
and instead invalidate an object that was signed for a different network immediately
but it doesn't work this way :slightly_smiling_face: the whole chain of objects is still downloaded.
it recovers the wrong public key, and will fail when comparing the initial atx (because the golden atx will be different).
so this whole scheme doesn't serve its purpose

in short: the genesis id prefix in the signature was meant to let a node invalidate an object signed for a different network immediately, without downloading its whole chain of objects. however, in the current sync flow, the whole chain of objects is still downloaded.

@tal-m's response:

the main purpose of the genesis id is preventing cross-chain attacks.  
Optimizing for sync efficiency wasn't actually one of our original goals. 
I think it might be better to handle that at the gossip layer.

An honest node shouldn't gossip messages it hasn't verified. 
So in your example, the "bad chain" download could only be caused by a malicious peer 
(an honest peer would have verified the message first and reached the failing golden ATX).
If we recover a new public key (for which we have no verified ATX), I think the order of sync 
requests should be to ask for the original ATX first, rather than download recursively backwards. 
This would prevent the kind of attack you describe, even with a malicious peer. 
It can also be a lot more efficient, since we won't need many rounds of requests:
if the peer knows that all ATXs for a given pk must be synced, it can send them in the proper order.

mitigation

from @tal-m

When you receive an ATX from which you extract an unknown public key, instead of asking for the 
previous ATX (and recursively working backwards), ask immediately for the first ATX (which an honest 
peer must have, if they validated the ATX they are sending you).

In general, I think that wherever it's possible it's better to sync forwards rather than backwards --- 
i.e., find the first point at which you're missing data from your peer, and ask them to start syncing from there.
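
A minimal Go sketch of the "sync forwards" idea, under the assumption that the peer can describe a smesher's ATX chain as an oldest-first list of IDs. `firstMissing`, the string IDs, and the `have` set are hypothetical and only illustrate finding the first gap; none of this is go-spacemesh code.

```go
package main

// firstMissing returns the index of the first ID in the peer's oldest-first
// chain that is absent from the local store, or len(peerChain) if nothing is
// missing. Hypothetical helper: a forward sync would start requesting data
// from peerChain[firstMissing(...)] onwards instead of walking backwards.
func firstMissing(peerChain []string, have map[string]bool) int {
	for i, id := range peerChain {
		if !have[id] {
			return i
		}
	}
	return len(peerChain)
}
```

With a peer chain of `a0, a1, a2, a3` and a local store holding `a0` and `a1`, the request would start at index 2, so a single round can cover everything that is missing.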

from @noamnelke

I generally agree that syncing forward is easier than backwards, except when this isn’t implemented 
and needs to be spec’ed :wink:
Even if crafting two ATXs that resolve to the same public key on two chains is possible, it’s not as easy 
as retransmitting existing ATXs and the only damage it can do is force people who are directly connected
to you and syncing to have to fetch more messages before discovering that you lied. 
So I don’t think it’s a problem worth fixing right now.

We might need to ask for the smesher’s first ATX anyway (to get their POPS-VRF index) but I don’t see 
how it makes any difference for this attack - the adversary can just as easily make one of the
colliding ATXs the first.

from @noamnelke, on how to detect cross-chain data with 2 ATXs

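// both ATXs are signed with the same key on the chain with genesis_id
msg = scale(atx)
signature = sign(privkey, genesis_id || msg)
// extracting under a different genesis id (genesis_id2) recovers a wrong key
pubkey = extract(genesis_id2 || msg, signature)

msg2 = scale(atx2)
signature2 = sign(privkey, genesis_id || msg2)
pubkey2 = extract(genesis_id2 || msg2, signature2)
// pubkey and pubkey2 are expected to differ, which is what reveals cross-chain data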

implementation plan

current behavior

- ATX: we don't find out that an ATX is invalid until we trace all the way back to the golden ATX
- ballot: we only check whether its ATX is eligible in the later part of the flow
- sync: fails when it fails to fetch ATXs, or times out fetching ballots/blocks

desired behavior

gossip
- ATX: when we extract a NodeID/public key that we don't know about, ask peers for the first ATX of that NodeID and compare the extracted public keys before syncing its referenced ATXs recursively.
- ballot: move the ATX NodeID check up, before fetching data

sync: do not fail sync when fetching data (ATXs/ballots/blocks) fails
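
A rough Go sketch of the gossip-side ATX check described above, using simplified stand-in types. `NodeID`, `ATX`, `FirstATXFunc`, `ErrCrossChain`, and `checkUnknownSmesher` are all hypothetical names for illustration and not go-spacemesh APIs. The `Smesher` field stands for whatever key the local node extracts from the signature under its own genesis id.

```go
package main

import "errors"

// NodeID and ATX are simplified, hypothetical stand-ins.
type NodeID string

type ATX struct {
	ID      string
	PrevID  string // empty for a smesher's first ATX
	Smesher NodeID // key extracted locally from the ATX signature
}

// FirstATXFunc is a hypothetical peer request: "send me the first ATX of id".
type FirstATXFunc func(id NodeID) (*ATX, error)

var ErrCrossChain = errors.New("first ATX does not match extracted NodeID")

// checkUnknownSmesher runs before any recursive fetch of referenced ATXs.
// If the extracted NodeID has no verified ATX locally, it asks the peer for
// that NodeID's first ATX and compares the extracted keys.
func checkUnknownSmesher(fetchFirst FirstATXFunc, known map[NodeID]bool, atx *ATX) error {
	if known[atx.Smesher] {
		return nil // already have a verified ATX for this key
	}
	first, err := fetchFirst(atx.Smesher)
	if err != nil {
		return err
	}
	// The reply must really be a first ATX and must resolve to the same key.
	if first.PrevID != "" || first.Smesher != atx.Smesher {
		return ErrCrossChain
	}
	return nil
}
```

In this sketch a mismatch is detected after one extra request, rather than after tracing the whole chain back to the golden ATX.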

dshulyak commented 1 year ago

with canonical ed25519 it will be handled super efficiently: unlike with key extraction, the signature will fail to verify if the message was signed with a different prefix.

dshulyak commented 1 year ago

switched to ed25519, any cross-chain data is invalidated immediately

spacemeshos / go-spacemesh