Apply changes to the state asynchronously?

Erigara commented 3 months ago

How things work right now:

Leader gathers transactions for the block from the queue
Verify transactions signatures
Execute transactions and categories it as either valid or invalid
Include transactions + execution result in the block
Broadcast block to other peers in the network
Validator and ProxyTail receives block
Verify block header: height, view change index, merkle root hash, transaction signatures
Re-execute block transactions and check if they were categorized correctly
If valid send signature to ProxyTail
On enough sigantures block is committed by proxy tail and commit message is broadcasted to every node
Block is committed and written to kura block store
On committing block triggers are executed
New round begins

How things would work:

Leader gathers transactions for the block from the queue
Verify transactions signatures
Include transactions in the block (so deterministic order is established)
Broadcast block to other peers in the network
Validator and ProxyTail receives block
Verify block header: height, view change index, merkle root hash, transaction signatures
If valid send signature to ProxyTail
On enough sigantures block is committed by proxy tail and commit message is broadcasted to every node
Block is committed and written to kura block store
Job is created to try execute every transaction and process triggers
New round begins
(In separate thread) Job is processed, transactions and triggers are executed, state is updated

Since there is an assumption that iroha is deterministic my idea is postpone execution of transaction until block is committed.

So that sumeragi is no longer responsible for executing transactions but became more like transactions ordering service.

Benefits:

separate concerns ordering transactions and executing them
should help with problem of heavy blocks which put sumeragi into infinite loop because verification of signatures and block header take smaller amount of time that executing every transaction
iroha would be able to catch up on updates after pick load decreased

Caveats with this approach:

it looks like some parameters have to be updated synchronously to take effect on the next block. This parameters include sumeragi parameters (block/commit time, block size) and maybe some others which are required to verify transaction or block header
due to updates happen asynchronously client have to wait for event like block applied to see changes of the block (client have to wait for this event even now)

mversic commented 3 months ago

Amazing suggestion. Sounds like the way it should have been all along.

it looks like some parameters have to be updated synchronously to take effect on the next block. This parameters include sumeragi parameters (block/commit time, block size) and maybe some others which are required to verify transaction or block header

If this is an issue, we can introduce some sort of checkpoints and synchronize the network on them

due to updates happen asynchronously client have to wait for event like block applied to see changes of the block (client have to wait for this event even now)

if we have this now, why mention it?

Erigara commented 3 months ago

if we have this now, why mention it?

Delay might be higher and previously we had strict order of events like that:

Block 1 committed
Block 1 applied
Block 2 committed
Block 2 applied
Block 3 committed
Block 3 applied
Block 4 committed
Block 4 applied

But after the change order of event might be different:

Block 1 committed
Block 2 committed
Block 1 applied
Block 3 committed
Block 2 applied
Block 3 applied
Block 4 committed
Block 4 applied

mversic commented 3 months ago

Delay might be higher and previously we had strict order of events like that:

I don't think the delay will be higher if properly implemented. Sure there will be some reordering of events

Mingela commented 3 months ago

Sounds great, I think it'd require introducing some more event types (e.g. committed, finalized) so that a client would understand what they should be waiting for (validation or WSV affection), since many client interactions would be keen on sequential processing/requesting.

The other complication would be related to the error handling, i.e. when one happens during the state transition application. How would the peer realize whether it's a local or a global problem. How to handle the situation when a block cannot be applied but further blocks may depend on those changes etc.

Also it might increase either disk space or p2p communications (how much?) since the state of a block (validated/applied) should be tracked as well. What if a peer crashes when the block hasn't been applied yet etc etc.

Erigara commented 3 months ago

I think it'd require introducing some more event types

Yes, currently we already have block separate block statuses (committed and applied), we should do the same thing for transaction statuses (committed, applied/rejected).

mversic commented 3 months ago

I think it'd require introducing some more event types

Yes, currently we already have block separate block statuses (committed and applied), we should do the same thing for transaction statuses (committed, applied/rejected).

Transactions don't need it because it's implied in the BlockStatus::Applied event. The user can figure out which transaction belongs to which block by comparing block height of the 2 events

Erigara commented 3 months ago

Also it might increase either disk space or p2p communications (how much?) since the state of a block (validated/applied) should be tracked as well.

Not sure why this would increase p2p or disk since single source of truth is block store.

What if a peer crashes when the block hasn't been applied yet etc etc.

In this case block store would be used as write ahead log essentially, state would be loaded from snapshot at certain height and further changes would be derived from blocks in the block store.

How to handle the situation when a block cannot be applied but further blocks may depend on those changes etc.

Let's assume that there is transactions A and B where B depend on results of A. If client submitted B without waiting confirmation of A and A is rejected there is two options:

B would be rejected as well
B would be applied but result of this application is kinda undefined.

But due to iroha being deterministic (ideally) result would be the same on all correct peers, so this is rather business logic problem rather than consensus one.

If client wants to submit B on successful application of they should either wait for proper event and send only after or move this logic on chain (i.e. B checks that it's prerequisites are satisfied).

Erigara commented 3 months ago

Also it might increase either disk space or p2p communications (how much?) since the state of a block (validated/applied) should be tracked as well.

Not sure why this would increase p2p or disk since single source of truth is block store.

Actually i think that's true that we need track state of the block if we want to produce events like block applied, transaction applied only once.

Erigara commented 3 months ago

The other complication would be related to the error handling, i.e. when one happens during the state transition application. How would the peer realize whether it's a local or a global problem.

This one is tricky. In ideal world iroha should be deterministic so that correct nodes behave in the same way.

Let's consider what would happen with current iroha if there is some bug which manifest itself only on f peers (they complain that leader is incorrectly categorize transaction).

Eventually they would end up observing peers so that other peers will continue building chain but this peers would stop accepting new blocks.

If number is higher or equal to f + 1 than iroha only blocks that are consider valid by both sets would be committed.

If number of faults is higher or equal to 2f + 1 than this peers would take over the blockchain because they would commit block not invalid for other peers.

With the suggested approach nodes would diverge silently (to consensus protocol) in case of bug which leads to non deterministic behavior across nodes. So this might be dealbreaker.

Mingela commented 3 months ago

Let's assume that there is transactions A and B where B depend on results of A. If client submitted B without waiting confirmation of A and A is rejected there is two options:

B would be rejected as well

B would be applied but result of this application is kinda undefined.

I guess it should be rejected then (if A is a precondition for B, e.g. register an entity B references etc)

Mingela commented 3 months ago

With the suggested approach nodes would diverge silently (to consensus protocol) in case of bug which leads to non deterministic behavior across nodes. So this might be dealbreaker.

I guess such drastic scenarios are pretty possible now as well. I'd consider proper adjustments for the consensus for public mode, for private one it does not seem severe considering higher security and trust factors.

Erigara commented 3 months ago

I guess it should be rejected then (if A is a precondition for B, e.g. register an entity B references etc)

I mean that it would be rejected if precondition is not satisfied (like entity is not registered), but some cases maybe less obvious like send 10 roses (in B) after receiving 100 roses (in A), be might succeed if account have enough roses even if transfer of 100 roses failed.

Erigara commented 3 months ago

With the suggested approach nodes would diverge silently (to consensus protocol) in case of bug which leads to non deterministic behavior across nodes.

I think our current implementation is also prone to this kind of errors, but to lesser extent. Like if there is some non-deterministic function/data structure is used but it doesn't affect if transaction is rejected or accepted.

Also how we execute triggers right now is similar to the proposed change, we don't check that trigger execution result is the same across all peers.

s8sato commented 3 months ago

The points of contention appeared to me to be two:

Removing "prediction of execution result" from blocks. This is for an optimization by reducing executions experienced by a block from three to one
Relaxing the order of applying the last block then committing the next block. This is for an optimization by allowing the next round to, thanks to 1, proceed up to commit without waiting for the last block to be applied

I agree with 1. It should be fine as long as the state is deterministic when assuming no misbehaving peers. It would not be a new problem with this issue that faulty peers accumulate and the network becomes unhealthy. We already have event recommendations that only determines the order of trigger executions. It does not predict but does guarantee the success or failure of the executions, because being ordered means being deterministic. It should be enough to agree on such a guarantee.

Regarding 2, there seems to be questions:

How do we compose event recommendations of the next block without applying the last block?
How do we apply synchronously only those transactions that affect the consensus?
I'd suggest checking state.height() before spawning threads to apply blocks so that they are in order

Erigara commented 3 months ago

We already have event recommendations that only determines the order of trigger executions.

Afaik event recommendations rn are not used at all. We assume that due to iroha being deterministic and executing transactions in order each node would produce the same events and thus the same order of trigger execution is produced.

Erigara commented 3 months ago

I'd suggest checking state.height() before spawning threads to apply blocks so that they are in order

We should definitely apply blocks in order we can think of block on commit being queued to be processed and applied.

The intention is to separate processes of agreeing on transaction set and order and executing them.

s8sato commented 3 months ago

How do we apply synchronously only those transactions that affect the consensus?

Now my remaining concern is this, which is already included in your first description

Erigara commented 3 months ago

Now my remaining concern is this

Yes, in theory we could have partially async execution of blocks where some parameters are updated on commit, but we have to search the block to find such isi. Situation get even more complicated if we consider that parameter updated could be done in smart-contract or trigger.

mversic commented 3 months ago

Situation get even more complicated if we consider that parameter updated could be done in smart-contract or trigger.

this is impossible, unless we apply all triggers/smart-contracts synchronously. We can consider disallowing executing SetParameter from smart contracts

Erigara commented 3 months ago

Checked Tendermint consensus, by looking at it's interface to interact with blockchain (ABCI) it uses synchronous block and transaction execution. Moreover it's block header might include application state hash to insure that all honest nodes are on the same page.

Also i've noticed that they put consensus critical parameters separately in the block, this change might help us as well.

Erigara commented 2 months ago

Executing the transactions in the execution_payload updates the global state. All clients re-execute the transactions in the execution_payload to ensure the new state matches that in the new block state_root field. This is how clients can tell that a new block is valid and safe to add to their blockchain.

Etherium as well uses sync block execution (link).

Erigara commented 2 months ago

After some thought and discussion with @SamHSmith i think that it may be too risky to introduce this change.

Things i am worried about:

If at some point block validity going to depend on transaction execution (e.g. global block limit) this automatically stop to work.
Hard to spot issues with non determinism:
- right now it's pretty simple to modify block to put hash of the state in the header
More complex execution logic because some things should be applied immediately

mversic commented 2 months ago

I'm not sure this ticket is feasible. I think we should close it, but we can put it in the backlog for further consideration

hyperledger / iroha

Apply changes to the state asynchronously? #4733