omgnetwork / elixir-omg

OMG-Network repository of Watcher and Watcher Info
https://omg.network
Apache License 2.0
213 stars 59 forks source link

Race condition in `OMG.State` in when persistng exits from mempooled-outputs #613

Open pdobacz opened 5 years ago

pdobacz commented 5 years ago

It is unlikely to be exploitable or even affect us in real-world operation of the child chain and watcher but is an issue intrinsic to OMG.State making its behaviors surprising or unreliable, if analyzed in isolation. And since this is our workhorse module, it'd be good to keep it crystal clear, instead of relying on usage patterns that we currently employ.

For a synopsis of the issue see this branch with red tests exciting the incorrect behavior: 613-state_exit_utxos_race

Why this won't affect us? The argument depends whether we're in the Child chain server or Watcher:

1/ Child chain server - the behavior is related to exiting (State.exit_utxos) an output of a transaction in the child chain's mempool. Such exit is impossible in practice, since the transaction's inclusion cannot be proven in the contract. 2/ Watcher - the behavior is related to recognizing exit finalizations (again State.exit_utxos), which by are always processed after the respective block had been validated and processed. This time it is the RootChainCoordinator and it's syncing rules that have our back.

The problem lies around the rather rudimentary way of figuring out the db_updates around here. Instead, we should be tracking the UTXO-set diff more explicitly, alongside pending_txs, so that exit_utxos can throw in its 2 cents and prohibit the "popping back" of exited utxos.

InoMurko commented 4 years ago

is this still relevant @pdobacz if not, pls close?

pdobacz commented 4 years ago

It might've been resolved by the lazy-loading of UTXOs.

Unfortunately 613-state_exit_utxos_race is no longer out there (I shouldn't have been lazy to push a synopsis to a branch and not describe it here :man_facepalming: ).

IIRC, the issue was this:

As mentioned above, this would never happen, but from PoV of OMG.State module behavior, should have been fixed.

Now, the above issue existed when UTXO set was loaded eagerly. Lazy loading brought in some changes. Maybe let me loop in @pnowosie to look into this, you have more fluency in the lazy-utxo behavior than me? How will things behave?

All this discussion aside, I'm not sure whether the only improvement possible here would not be making the OMG.State contract stricter ("only ever exit_utxos that you know existed") combined with crashing if this is not respected, assuming it's unrecoverable error and bug.

InoMurko commented 4 years ago

Sounds good! @pdobacz @pnowosie asses the ticket and let us know if it's a mainnet blocker.

pdobacz commented 4 years ago

We'll asses and re

if it's a mainnet blocker.

see original post. Unless anyone can think of a way this can happen - the problematic condition can't occur with current applications of OMG.State. All of its clients respect the strict contract mentioned, despite this contract not being imposed and explicit.

That is, I don't think this is a blocker.