Fork resolution failure when using truncated chain

madadam commented 3 years ago

Currently only the elders have the full chain (going all the way to the genesis key). Other nodes have only the last couple of keys (often only the very last key). This can cause problems on fork resolution where some nodes might end up permanently out of sync with the rest of the section. Consider this example:

The current chain is A->B->C
A new node joins and gets only the latest key C
A fork happens and is resolved, so the new chain becomes:
```
A->B->C
   |
   +->D
```
Where D is now the last key.
The elders send a Sync message to the new node which contains a truncated chain B->D (so the last key and its parent, which is usually enough to update the node)
The new node knows only C so it bounced the message back as untrusted
The elders have no way to extend the proof which would make it trusted by the new node so they discard the bounce
Thus the new node becomes permanently out if sync because it never progressed from the key C

NOTE: the reason why B->D cannot be extended to include C is that if we extended it to:

B->D
|
+->C

then it would be impossible to verify it if the node known only C as anyone can create a key that signs C and pretend it's the real chain:

X->Y
|
+->C

Idea for a fix: don't truncate chain

Probably the simplest way to fix this is to never truncate the chain and also always provide the full chain to new nodes via the NodeApproval message. Doing this it would always be possible to find a common ancestor of both the latest key and the key the new node known.

Alternative idea 1: fallback to the full chain

If we assume the genesis key is implicitly trusted by everyone, we can always fall back to sending the full chain in case the proof cannot be extended otherwise. We would just have to make sure everyone known the genesis key. So either hardcode it into the source code, pass it via the config or obtain it by some other means.

Alternative idea 2: bidirectional chain

Modify the chain so that every block contains also a signature of the parent key created with the child key. That way the chain is verifiable in both directions and so it's always possible to extend the proof.

dirvine commented 3 years ago

I think

The elders have no way to extend the proof which would make it trusted by the new node so they discard the bounce

Is not correct. Any Elder will have the chain from Genesis to their own "leaf" and can provide full history from Genesis to now. So regardless they can provide that when needed to cover any blip. I cannot see a way that a node who gets "C" above wont be fixed quickly with LM as they will have an invalid key and the Elders will have to supply full chain to them.

Unless I am missing a point (perhaps this is a discourse one)

madadam commented 3 years ago

Yes, providing a full chain is one of the solutions I mentioned (the alternative idea 1). The only issue with it is that we would have to make sure every node always knows the genesis key, otherwise they would have no way of knowing the provided full chain is the real deal.

dirvine commented 3 years ago

Absolutely, the Genesis key has to be known to all nodes, at least until we can do snapshots etc. but Initially we need to probably hard code the genesis key (for safety).

madadam commented 3 years ago

OK, that should be easy then. Maybe better than hardcoding it would be to pass it in via the config file? So we don't have to compile two separate binaries - one for the genesis node and one for the other nodes?

dirvine commented 3 years ago

Yes, I thought about config file. The worry would be that file being targetted by malicious folk. There is a chance though we write it out on the first startup (so hard coded again I suppose).

maidsafe / sn_routing