leapdao / leap-node

LeapDAO validation node
https://docs.leapdao.org
Mozilla Public License 2.0
15 stars 11 forks source link

Get rid of CheckBridge and forked Tendermint #357

Closed troggy closed 4 years ago

troggy commented 4 years ago

Bounty

We are using tendermint fork with extra home-grown ABCI called CheckBridge. It is supposed to be called every 32 blocks and return 1, otherwise the proposal is deferred (consensus engine is stopped). On a leap-node side we use checkBridge handler to ensure there is a period submitted. This is the ultimate reason for all the clusterfuck — ensure that period is submitted.

~Three~ One problem with this approach:

  1. We need forked tendermint, which makes sync with upstream harder.
  2. ~It won't be working right because of the CAS — there is no period every 32nd block anymore with CAS — it is submitted later, once there are enough period votes.~ Solved in #358
  3. ~As of now it doesn't seem to work yet the networks are working.~ Solved in #358

This bounty attempts to address the problem purely on ABCI app side, so we can use vanilla tendermint.

Scope

Deliverables

Gain for the project

Publicly visible delivery

writeup for newsletter

Roles

bounty gardener: @troggy / 90 DAI bounty worker: name / share bounty reviewer: name / share

johannbarbie commented 4 years ago

how to prevent that tendermint runs too far ahead of period submission?

sunify commented 4 years ago

Why it is bad? (asking for a friend)

troggy commented 4 years ago

how to prevent that tendermint runs too far ahead of period submission?

I can imagine it is possible to have temporary gaps in the period list. We are not really chaining periods anyway — prevHash in Period object is not used anyhow. Or am I missing something?

johannbarbie commented 4 years ago

Why it is bad?

johannbarbie commented 4 years ago

prevHash in Period object is not used anyhow

pinkiebell commented 4 years ago

I think a threshold of max. 3 pending periods cause no problems?

As of now it doesn't seem to work yet the networks are working.

That's odd. 🤔 The last time I had the leap-node running, I had those checkBridge call in the debug logs.

troggy commented 4 years ago

ok, we keep prevHash :) doesn't really stops us from having some temporary gaps

troggy commented 4 years ago

also, @sunify and me discussed an option to "suspend" the app if there is no period landed after 32 blocks. Similar like we do with CheckBridge, but done on the app side: just need to stop proxying txs to Tendermint RPC.

troggy commented 4 years ago

unordered submission problems

@johannbarbie you mean tracking of nonce?

johannbarbie commented 4 years ago

unordered submission problems

@johannbarbie you mean tracking of nonce?

i was only referring to prevHash

pinkiebell commented 4 years ago

also, @sunify and me discussed an option to "suspend" the app if there is no period landed after > Similar like we do with CheckBridge, but done on the app side: just need to stop proxying txs to Tendermint RPC.

What if a malicious- or bad node sends to the tendermint daemon? 🙂

Edit: I think implementing a method to completely halt the consensus and/or mempool engine would work

troggy commented 4 years ago

to completely halt the consensus

that's what we doing with CheckBridge. @pinkiebell Do you know how to do that without changing Tendermint code?

pinkiebell commented 4 years ago

to completely halt the consensus

that's what we doing with CheckBridge. @pinkiebell Do you know how to do that without changing Tendermint code?

Sadly, no idea. you can check the ABCI endpoints but I'm sure something like that isn't there. Another idea without running a fork is to kill the tendermint daemon and restart once periods looking fine 😁

troggy commented 4 years ago

It won't be working right because of the CAS

maybe I've over-exaggerated — checkBridge checks not the current, but the previous submission which should happen at that point even with CAS.

troggy commented 4 years ago

unordered submission problems

I think this is solved, if all the validators keep a pendingPeriods list — proposer can just use the hash of the last period as a prevHash.

johannbarbie commented 4 years ago

overall gut feeling:

:)

troggy commented 4 years ago

can't construct proofs fast exit handler can become veeery slow

Both of these are relevant right now as well: if there is no period and checkBridge stops the consensus you can't construct proofs as well.

troggy commented 4 years ago

super scared that it will break in unforeseen ways

Me too. I also scared that the node is working in unforeseen way 😄

troggy commented 4 years ago

I see two quite separate concerns here in this issue:

The bounty definition as it is now addresses only the first concern assuming the second one is not needed and should be removed. While it is still not clear should we allow the chain to run once there are missed period submissions, I will split this bounty in two:

  1. More reliable period tracking (like described in this bounty). We can keep patched Tendermint and checkBridge feature.
  2. Replace checkBridge hack with something on the app side (this is to be defined) or remove the hack completely. Switch to upstream vanilla Tendermint.
troggy commented 4 years ago

closing in favour of https://github.com/leapdao/leap-node/issues/410