ledgerloops / strategy-pit

Testing ground for LedgerLoop strategies
Apache License 2.0

timestamping #27

Closed michielbdejong closed 2 months ago

michielbdejong commented 2 months ago

Another approach to timestamping all-or-nothing lifts, like those in MyCHIPs, might be for everybody to timestamp everybody else's promise. You would then get something like a three-phase commit:

phase 1: promises
phase 2: timestamping of the promises
phase 3: commit

My gut feeling says this cannot be a solution to the Two Generals' Problem, although maybe we can use eventual execution to still make it work.

Suppose the messages between nodes are guaranteed to eventually arrive, if you just keep resending them often enough.

Then if everybody wants the lift to go ahead, eventually the nodes will all know that (even though none of the nodes will know when this state will have been reached).

And if everybody receives all signatures before the timeout, and they all send out a message about that, then eventually that message will also reach every node.

Also, if one node hasn't seen all promises before the deadline, then they can roll back and get on with their lives.
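As a minimal sketch of the phase-1 rule, assuming each node records the local time at which it received each promise (the names `Promise` and `phase1_outcome` are hypothetical, not part of LedgerLoops), a node could decide locally like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Promise:
    node: str           # hypothetical participant identifier
    received_at: float  # local receive time, in seconds


def phase1_outcome(participants: set[str], promises: list[Promise], deadline: float) -> str:
    """Return 'proceed' if every participant's promise arrived by the deadline, else 'rollback'."""
    on_time = {p.node for p in promises if p.received_at <= deadline}
    # A missing or late promise is treated the same as a "no": roll back and move on.
    return "proceed" if participants.issubset(on_time) else "rollback"
```

Because only arrival before the deadline matters, a node never needs to hear an explicit "no" in phase 1.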

The biggest hassle arises if only some nodes receive all promises before the deadline: then phase 2 can take a long time.

So in phase 1 it's enough if nodes only send a "yes", leaving the "no" to the timeout, although a "no" message can obviously be useful to speed up rejection before the deadline.

But in phase 2 it's important that each node sends either a "yes" or a "no". If nodes were to send only "yes" messages in phase 2, then if one node doesn't send one, the other nodes would not know how long to wait.
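A minimal sketch of the phase-2 tally, assuming every participant sends an explicit vote and messages eventually arrive (`Vote` and `phase2_decision` are hypothetical names): any "no" aborts, a full set of "yes" votes commits, and anything else means keep waiting and resending.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


def phase2_decision(participants: set[str], votes: dict[str, Vote]) -> str:
    """Decide phase 2 from explicit per-node votes.

    Returns 'abort' on any NO from a participant, 'commit' once every
    participant has voted YES, and 'pending' otherwise. Assuming eventual
    delivery, every participant's vote eventually arrives, so 'pending'
    eventually resolves without needing a separate timeout.
    """
    if any(votes.get(p) is Vote.NO for p in participants):
        return "abort"
    if participants.issubset(votes):
        return "commit"
    return "pending"
```

This is where the explicit "no" matters: with "yes"-only messages, a missing vote would be indistinguishable from a slow one.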

michielbdejong commented 2 months ago

This algorithm would help speed up successful cases, but when a node gets disconnected, it could take a long time before that is detected. So you would need some sort of no-news-is-bad-news mechanism, perhaps with regular ping-pong messages.
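A minimal sketch of such a no-news-is-bad-news mechanism, assuming each node tracks when it last received a pong from each peer and treats silence longer than a hypothetical `max_silence` threshold as a lost peer (`PeerMonitor` is an illustrative name, not an existing API):

```python
class PeerMonitor:
    """Treat a peer as lost if no pong has arrived within `max_silence` seconds."""

    def __init__(self, peers: list[str], max_silence: float, now: float):
        self.max_silence = max_silence
        # Start every peer's clock at construction time.
        self.last_pong = {p: now for p in peers}

    def on_pong(self, peer: str, now: float) -> None:
        self.last_pong[peer] = now

    def lost_peers(self, now: float) -> set[str]:
        return {p for p, t in self.last_pong.items() if now - t > self.max_silence}
```

Time is passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to test.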

michielbdejong commented 2 months ago

Something that could work is if all nodes broadcast / gossip their intent to participate, and then keep track of network failures. If any link in the network fails before completion was communicated over it in both directions, call off the deal.
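A minimal sketch of the link-tracking rule, assuming links are undirected pairs of nodes and that "completion communicated in both directions" means a completion message was seen from each end to the other (`LinkTracker` is a hypothetical name):

```python
class LinkTracker:
    def __init__(self, links: list[tuple[str, str]]):
        # Links are treated as undirected.
        self.links = {frozenset(link) for link in links}
        self.completion_seen: set[tuple[str, str]] = set()  # directed (sender, receiver)
        self.called_off = False

    def on_completion(self, sender: str, receiver: str) -> None:
        self.completion_seen.add((sender, receiver))

    def _completed_both_ways(self, link: frozenset) -> bool:
        a, b = tuple(link)
        return (a, b) in self.completion_seen and (b, a) in self.completion_seen

    def on_link_failure(self, a: str, b: str) -> bool:
        """Record a failed link; return True if the deal is (now) called off."""
        link = frozenset((a, b))
        if link in self.links and not self._completed_both_ways(link):
            self.called_off = True
        return self.called_off
```

A failure on a link that has already carried completion both ways is harmless; a failure anywhere else calls the deal off.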

michielbdejong commented 2 months ago

Achieving common knowledge in the face of arbitrary communication failures is impossible, but assuming communication failures are always temporary, we could have a system where n nodes

michielbdejong commented 2 months ago

I think this protocol would avoid the Two Generals' Problem of never reaching common knowledge by letting communication failure influence the result of the decision: communication failure counts as agreement failure. Both sides can detect a network split because pings and pongs no longer arrive in time. When this happens, everybody knows the agreement is off.

I think this will work both for all-to-all and for ring communication.

I think all-or-nothing commit is, in many situations, preferable to staggered timeouts with connector risk.

I think I'll therefore make it the default Loop Resolution mechanism for LedgerLoops.

I should write something about this on https://ledgerloops.com/

michielbdejong commented 2 months ago

I thought about this some more. The trouble with collaboratively detecting communication breakdown is that a malicious or malfunctioning node could forward the ping messages but drop the payload messages. To mitigate that, message forwarding should use encryption, so that a forwarding node cannot tell pings apart from payload messages. The version I came up with now does reveal the loop length; maybe that can be fixed in a later version. It would work as follows:

michielbdejong commented 2 months ago

Two possible remedies:

michielbdejong commented 2 months ago

Single-Direction Lift Execution

The contract keeps being sent along the ring, with signatures or vetoes being added. As soon as there is either one veto or a full set of signatures, send back a signed ACK. Once you have received a signed ACK from each participating node, stop sending.
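A simplified simulation of single-direction lift execution, under loud assumptions: no messages are lost, each node's sign/veto choice is given up front in a hypothetical `will_sign` map, and ACKs are modeled as travelling around the ring with the contract rather than as separate messages. `run_ring` and `max_hops` are illustrative names; `max_hops` is only a loop guard.

```python
def run_ring(nodes: list[str], will_sign: dict[str, bool], max_hops: int = 1000):
    """Circulate the contract until every node has ACKed the final state.

    Returns (outcome, hops, acks).
    """
    signatures: set[str] = set()
    vetoes: set[str] = set()
    acks: set[str] = set()
    hops = 0
    while len(acks) < len(nodes) and hops < max_hops:
        node = nodes[hops % len(nodes)]
        decided = bool(vetoes) or len(signatures) == len(nodes)
        if not decided:
            # Node adds either its signature or its veto to the contract.
            (signatures if will_sign[node] else vetoes).add(node)
            decided = bool(vetoes) or len(signatures) == len(nodes)
        if decided:
            acks.add(node)  # node has seen the final state and signs an ACK
        hops += 1
    outcome = "commit" if not vetoes and len(signatures) == len(nodes) else "abort"
    return outcome, hops, acks
```

With three nodes that all sign, this takes 5 hops to commit; if the second node vetoes, it takes 4 hops to abort. Every hop depends on the next node forwarding, which is exactly the multi-hop fragility discussed here.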

Still, this relies on multi-hop messaging, which means one node's money can be tied up by another node's failure to forward messages.

And so you would need a timeout, and so you would get stuck in the Two Generals' Problem.

michielbdejong commented 2 months ago

mesh comms

A simpler situation is what MyCHIPs is working on: one-hop communication between all nodes. Each node can then just broadcast its hop signature to every other node until it gets an ACK, and punish any node that contradicts itself. In case of communication failure, the majority decides. But if the group is split into three small groups, none of them will reach a majority. Hm, how would you solve that?
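A small sketch of the majority rule and the three-way split problem, assuming "majority" means strictly more than half of all participants (`deciding_group` is a hypothetical helper):

```python
from typing import Optional


def deciding_group(groups: list[set[str]]) -> Optional[set[str]]:
    """Return the partition group holding a strict majority, or None if no group does."""
    total = sum(len(g) for g in groups)
    for g in groups:
        if 2 * len(g) > total:
            return g
    return None
```

Nine nodes split 5/4 leave the five-node group able to decide, but a 3/3/3 split returns `None`: no group can reach finality on its own.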

See also https://github.com/gotchoices/ChipNet/issues/5

michielbdejong commented 2 months ago

OK, so here's another idea (I think): What if everybody keeps sending short-lived promises to everybody, and then maybe gradually increases their validity as the outcome becomes clear. I think this would be similar to how rounds of blockchain confirmation work.
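One way to picture gradually increasing validity, assuming a hypothetical doubling rule with a cap: each round in which a node has seen everyone's promises, it doubles the validity of its next promise; as soon as a round is incomplete, it stops renewing and lets its earlier promises lapse. The growth factor and cap are illustrative, not part of any existing protocol.

```python
def next_validity(current: float, all_promises_seen: bool,
                  growth: float = 2.0, cap: float = 86400.0) -> float:
    """Validity in seconds for the node's next promise (0.0 means: stop renewing)."""
    if not all_promises_seen:
        return 0.0
    return min(current * growth, cap)


def validity_schedule(start: float, rounds_seen: list[bool]) -> list[float]:
    """Apply next_validity round by round, stopping once renewal stops."""
    v, out = start, []
    for seen in rounds_seen:
        v = next_validity(v, seen)
        out.append(v)
        if v == 0.0:
            break
    return out
```

Starting from 10 seconds, three complete rounds give 20, 40, then 80 seconds of validity, loosely like confidence growing with each blockchain confirmation.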

michielbdejong commented 2 months ago

Hm, I think a perfect solution doesn't exist. There are always uncertainties and failure modes, and all-or-nothing finality in the sense of common knowledge is probably fundamentally unachievable.

Another compromise might be to say that nodes that get separated from the initiator will have their money tied up until that communication is re-established.

In that sense it's all not really different from the please-revert messages I came up with in 2016.

If a node is just after a link that is down, it can send out please-revert messages; if it's just before one, it can step up and volunteer to take the hit, risking that the other side of the network split did a final commit.
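A minimal sketch of who does what around a down link in a ring, assuming the ring is given as an ordered list of node names and the down link as a (before, after) pair in ring order (`split_roles` and the role strings are hypothetical):

```python
def split_roles(ring: list[str], down_link: tuple[str, str]) -> dict[str, str]:
    """Assign roles to the two nodes on either side of a failed ring link."""
    before, after = down_link
    i = ring.index(before)
    if ring[(i + 1) % len(ring)] != after:
        raise ValueError("down_link must join adjacent nodes in ring order")
    return {
        before: "volunteer",      # just before the break: may take the hit
        after: "please-revert",   # just after the break: asks others to revert
    }
```

Only the two nodes touching the broken link get a role, which is what keeps the risk local.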

Then, at least, the risk stays local.

This thought process does highlight that ring communication is brittle and easy to attack. Mesh communication is better, and it only reveals the number of participants in a cycle, which is probably not a big deal.

So I do see the point of MyCHIPs opting for mesh communication and then simply using a majority commit. The communication network then needs to be pretty severely damaged before it stops a lift from reaching finality.

michielbdejong commented 2 months ago

So I think communication failure before any node sees a fully signed contract can still be dealt with (just roll back on both sides of the network split). But if one side has already reached finality and the network then splits, preventing common knowledge, there is no good way to deal with that. So the best remedy, if you want all-or-nothing finality, is mesh communication instead of ring communication. I'll explore this in ledgerloops/ledgerloops#36