rchain / rchip-proposals

Where RChain improvement proposals can be submitted
Apache License 2.0
8 stars 5 forks source link

RCHIP-02: Block merging #23

Open tgrospic opened 4 years ago

tgrospic commented 4 years ago

This proposal is written from the instructions and conversations with L.G. Meredith and Mike Stay.

Introduction

When Rholang executes, it's changing the state of the tuple space. These changes are written as event logs in the block.

In order to run multiple blocks in parallel and merge the state, we need a way to detect conflicting changes, apply only non-conflicting changes and mark the ones that are denied (discarded).

From Casper perspective this means using multiple parent blocks for justification. The current solution is inefficient because it's running replay for merging deploys multiple times.
When base parent (candidate for finalization) is selected, deploys from all other parents are played/replayed again on top of the base parent.

This proposal is to improve the process of block merging by detecting conflicting changes in parent blocks and instead of replaying deploys to just apply changes from event logs.
This doesn't mean we don't need to replay the block. We must replay the block to check that event logs are correct but this needs to be done only once.

Design

In order to detect conflicts between blocks, we have to find two descendants of a finalized block that used a common name in an incompatible way. This involves walking up the dag of blocks and looking at the event logs. The proposal is to keep in memory for every block the conflict set. Conflict set include all the events of the current block and all the events in the block's ancestors, back to the latest finalized blocks.

When proposing a block that's a descendant of multiple parents, we check to see if any element of one parent's conflict set conflicts with an element of any other parent's conflict set using MergeabilityRules.scala (rules in spreadsheet). If so, we have to figure out whether they can both be run or if we have to choose just one of the deploys.

The elements of the conflict set have references to the blocks in which the channels were used. As blocks get finalized, we compute new conflict sets from old ones by taking the union of the parent conflict sets minus the uses in the newly finalized blocks.

In a situation when conflicts are detected we pick a winner deploy. That means to track two more states for a deploy, conflicted (when there are multiple options and it's not yet finalized which one wins) and denied when a different conflicting deploy won. Deploy already have these states: waiting to be added to a block, in a block that has been sent, got some blocks in response to that block, finalized. Block data must be extended to include status of denied deploys. [?] Other deploy statuses can be resolved locally and do not have to be explicitly written in a block.

Changes in denied deploys will not be applied to the finalized state as though the deploy never happened. Deployer should not be charged for this deploy nor validator will get rewards.

Examples

         LFB       
        / | \      
       /  |  \     
      /   |   \    
     /    |    \   
    B1    B2    B3 
     \   /      /  
      \ /      /   
      B4      /    
        \    /     
         \  /      
          B5       

If we have this situation and we want to create block B5. Conflict set for B4 will already be calculated and will contain event logs from B1 and B2. So to calculate conflict set for B5 we need union of B3 and B4. We do this when new block is created or replayed.   The second case is when the LFB (last finalized block) make progress, in which case we remove event logs from conflict set for the LFB and all its ancestors who were in the conflict set.  

Related activities / obstacles

Running multiple deploys in parallel should be Rholang superpower but currently it's limited because of RuntimeManager which has global lock and singleton instance of DebruijnInterpreter that has global cost state.

To address these issues RuntimeManager should be refactored.
https://rchain.atlassian.net/browse/RCHAIN-4025

On the RSpace level it should be investigated if there is something preventing parallel execution. HotStore must be isolated for each execution instance.

"In terms of Git operations"

It can be helpful to think about block merge similar to Git operations we are already familiar with. In the next table are assumptions or relations of RChain terms to Git terms.

RChain Git
Block Pull Request
Deploy Commit
Event log (COMMs) Changed text (lines)
Block merge Fast-forward merge / rebase
Finalized blocks Master branch
Block finalization Merging to master branch

Each block is like a pull-request trying to be merged to master branch. Master branch represents unchangeable history or a chain of finalized blocks. Blocks that have been selected with most of the stake decided by Casper consensus.

When multiple PRs are competing to be merged to master branch and they don't have conflicting commits, they can be applied to master branch in any order in the same way fast-forward merge is done with Git.

So complication is when merging blocks contain event log in which channels are used in a conflicting way. In Git terms that means commits from two different PRs are changing the same line. How to resolve this conflict is the main point of block merging. With Git we need to provide conflict resolution which commit will win or some mixture of changes.

In block merge, when conflicting deploys are detected, one deploy will be selected as a winner and the rest will be marked as denied. Keeping track of changes (event logs) and which deploys are denied is what constitutes conflict set.

Information about denied deploys is not necessary to be stored in the merge block but is can be useful when browsing history to have this information without recreating conflict set to find out the winner.
This is not special situation when it would be useful to persist data in the block, protected by the signature. E.g. invalid blocks are also stored in dag storage to optimize calculation each time this info is necessary.
It should also be investigated if signing only block metadata (deploys signatures w/o deploys data and event logs) could give us additional benefits like quicker download and validation of block dag when node is catching up.

References

Mergeability rules spreadsheet
https://docs.google.com/spreadsheets/d/1pABqArF9e8HRTO9zSefp93mIVUm91avekeDgqSEw0R8

Epic ticket
https://rchain.atlassian.net/browse/RCHAIN-1517

Block merging - trie merge
https://rchain.atlassian.net/wiki/spaces/CORE/pages/739278849/Block+merging+-+trie+merge

Block merging - conflict resolution - peek
https://rchain.atlassian.net/wiki/spaces/CORE/pages/739803157/Block+merging+-+conflict+resolution+-+peek

sjalq commented 4 years ago

Would it be possible to include in the block, the state elements retrieved and the state elements updated by the transactions in a block?

This way only the header of blocks needs to be checked and any header that retrieves or updates state that was updated by another block need not be evaluated to reject it or find conflicts. When merging parallel blocks the updated and retrieved state lists can be merged. Occasionally a block can be assumed to merge all underlying blocks and it can be market as such, making it a block that cannot have any parallel blocks. On top of there blocks parallel blocks can then again be built until it becomes more efficient to rather merge them and restart the process.

This has the added effect that, when it becomes viable, zero knowledge proofs for each block can be constructed such that full nodes would not need to re-evaluate all past blocks.

tgrospic commented 4 years ago

I've updated description with similarities of merging with Git. I hope this will be useful to clarify the whole process.

Would it be possible to include in the block, the state elements retrieved and the state elements updated by the transactions in a block?

What we already save in the block is Event log which stores all COMM events (state changes) created by the deploys in the block. Deploy can be seen as a transaction for which we are detecting conflicts and it makes this minimal unit that can be denied (marked non-mergable).

With Last finalized state feature we will be able to download finalized state and start executing blocks on top of that block. But this doesn't mean we don't need to replay the block. Only having changes in the event logs are not enough to reconstruct the "next" state. The reason is that event logs only contain the hashes of the state (like a memory reference) and not the state itself. But with Last finalized state we will be able to download state (cold store) so "replay" of finalized blocks can be optimized to take changes and download cold store part, and just apply it without execution. Validators still need to replay the block.

So another question unrelated to block merging is block replay (as a validator). Is it necessary to replay exactly the same code or just verify and apply changes? This is still in research and introduction of behavioral types should make this possible. But not with the same event logs we have today.

Zero knowledge proofs are mostly considered because they offer privacy with respect to calculated data. This is not needed here so maybe it will just introduce unnecessary complexity. Greg was talking about this in Casper sessions (sorry I couldn't find it now) how additional computation to verify the changes should not overweight normal Rholang execution.

dckc commented 4 years ago

@tgrospic writes:

I've updated description with similarities of merging with Git. I hope this will be useful to clarify the whole process.

You updated the description after it was approved? That seems awkward. Is there an amendment process? Or can you at least cite support from those who made the approval decision for the update?

Also: the Mergeability rules spreadsheet is a "normative" reference, right? A change to the spreadsheet would have an impact visible from tests and from blockchain operation, right? Would you please attach a copy as of when the decision was made? We don't want the spreadsheet changing out from under this decision. Ah... perhaps the b0e49ab3e hash in the link to MergeabilityRules.scala suffices.

In the future, I suggest you clarify materials included-by-reference in the proposal from informative links.

tgrospic commented 4 years ago

Thanks @dckc for the review!

Work on block merge started long time a ago so I put label Approved because decision to work on this feature had already been done.

Instead of just updating existing ticket on Jire I decided to create RCHIP to put all information that I currently have. We still didn't decided how to organize files or the format of files in this repo to store RCHIP items. This can be another issue to discuss when we all have better understanding what functionalities can be used on GitHub.

With the Mergeability rules, as always the devil is in the details. This code is used only in tests and two tests which check all rules are ignored because they take to much memory and time. Currently I don't know much more than that. https://github.com/rchain/rchain/blob/b137bfb90/casper/src/test/scala/coop/rchain/casper/batch1/MultiParentCasperMergeSpec.scala#L94-L100

When I'll have more info I'll update references and explain it more in details.

SteveHenley commented 4 years ago

@dckc Please refer to the Log: Tech Governance, date 2020-06-25, agenda item #4 regarding the approval of RCHIP-02: Block Merging and all subsequence RCHIPs and RCHIP issues.

zsluedem commented 3 years ago

detect conflicts

In order to solve the conflicts in high performance, we need to detect conflicts when blocks are added.

Data Structure for conflicts deploy for each branch(below is the cases with 3 branches, not really related with the dag example below) branch branch conflict deploys
branchA branchB conflictDeployList
branchC branchB conflictDeployList
branchA branchC conflictDeployList

merge two state

For merging two conflict state: Let's take the smallest case

         ?(k,l)
    /            \
B2(x,y)     B3(m,n)
   \             /
          B1

Supposed that

  1. B2 contains x,y two deploys
  2. B3 contains m,n two deploys
  3. y conflicts with n, and y is the winner

The question is how we can merge B2 and B3.

We need to revert history changes in B2's n. and merge B2 state with B3 state.

We need to store additional information for each deploys.

In order to revert changes for a conflict deploy, we might need to store HotStoreAction

Since the node already got the EventLog of m in B3, the channelsHash should be the path in the Trie. Then maybe we can apply these path hashes to the trie if the conflicts detection can exclude the data(or continuation) on the same channel. Otherwise, we need HotStoreAction

Additional questions:

  1. Should we include info about reject deploy in the merging blocks? In standup, @tgrospic and @leithaus agreed that We need to change the block message structure which can shows the inject deploy info
tgrospic commented 3 years ago

@zsluedem I updated 4th paragraph in Design section accordingly.

tgrospic commented 3 years ago
  1. B2 contains x,y two deploys
  2. B3 contains m,n two deploys
  3. y conflicts with n, and y is the winner

The question is how we can merge B2 and B3.

First we choose main parent between B2 and B3. Let's say B2 is the main parent, this means x and y are already included in the state and we need to merge B3. n is denied as conflicting and m must be applied to the state. We have EventLog for m inside the block but also we have stored data (cold store) when B3 is replayed. This should be enough to just call HistoryRepository.process with the data from EventLog and data (leafs) will be stored in replay with transformAndStore.

zsluedem commented 3 years ago

I started to think that EventLog could not play an important part in merging two states. EventLog: represents the intermediate hashes happened in rholang execution State: represents whatever(data, continuation and etc.) left in the tuple space

Logically, I don't think EventLog can get what the state want(items left in tuplespace).

zsluedem commented 3 years ago

After some investigations, I think currently rnode got the mergibilityRules for Rholang ONLY. But rnode also needs the MergeableRules for Rspace. That's what rnode are missing right now.(Or we can say that replaying the merging block on the main parent is our MergeableRules for Rspacenow which is very low speed when merging two long chain)

zsluedem commented 3 years ago

I started to think that EventLog could not play an important part in merging two states. EventLog: represents the intermediate hashes happened in rholang execution State: represents whatever(data, continuation and etc.) left in the tuple space

Logically, I don't think EventLog can get what the state want(items left in tuplespace).

After discussions with @tgrospic, I am convinced that we can get the state in each channel in the EventLog based on the post state in the block. With the state, we can merge two states.

After some investigations, I think currently rnode got the mergibilityRules for Rholang ONLY. But rnode also needs the MergeableRules for Rspace. That's what rnode are missing right now.(Or we can say that replaying the merging block on the main parent is our MergeableRules for Rspacenow which is very low speed when merging two long chain)

After the discover above, the rspace mergibilityRules should be simple (It was a little bit over-thinking in my mind before)

tgrospic commented 3 years ago

I am convinced that we can get the state in each channel in the EventLog based on the post state in the block. With the state, we can merge two states.

@zsluedem that's is correct. Hashes in event log can be reconstructed as channels with the use of the tuple space state which post-state hash in the block is pointing to.

zsluedem commented 3 years ago

I am convinced that we can get the state in each channel in the EventLog based on the post state in the block. With the state, we can merge two states.

@zsluedem that's is correct. Hashes in event log can be reconstructed as channels with the use of the tuple space state which post-state hash in the block is pointing to.

Is that what replay is doing?

zsluedem commented 3 years ago

Just want to share the conversation between Mike and @zsluedem here.

Will:

Hello Mike. Sorry to bother you again. I am still reading your mergeable table. In this case !C 4!! It would be like the below

    //  Mergeable case
    //                    MergedBlock()
    //           /                          \
    //          /                            \
    // B2 ->Rho("contract @0(@0) = { 0 }")   B3 ->Rho("@0!!(1)")
    //           \                          /
    //            \                        /
    //                  BaseBlock  -> Rho("for (@1 <- @0) { 0 } | @0!(0)")

BaseBlock contains the first letter ! and 4 in the tuplespace. Then B2 got C which match previous ! in base block and B3 got !! which match previous 4 I dont have problem with the example above. My question is that the case !C 4!! can also be re-ordered what is in the tuplespace which can cause a conflict case below.

    // Conflict
    // !C 4!!
    //                    MergedBlock
    //           /                          \
    //          /                            \
    // B2 ->Rho("contract @0(id) = { 0 } | @0!(0)")   B3 ->Rho("@0!!(1)")
    //           \                          /
    //            \                        /
    //                  BaseBlock  -> Rho("for (@1 <- @0) { 0 }")

Is that correct?

Mike:

!C 4!! does indeed indicate that the tuplespace contains a send and a receive, but they can't both be on the same channel. Otherwise, they would have interacted when one or the other was added. Since the parts of the base block being interacted with have to be on different channels, the case you're worried about can't occur.

So the base block looks like

for (@1 <- @0) { 0 } | @2!(0)

Then a contract is added in one block that interacts with the send and a replicated send in the other block that interacts with the for. So block 2 adds

contract @2(x) = { 0 }

while block 3 adds

@0!!(1)

Will: Sorry for the late reply.

!C 4!! does indeed indicate that the tuplespace contains a send and a receive, but they can't both be on the same channel

In the case of !C 4!!, I think is is possible that they on the same channel with not-successful match on consume side. Like my example demonstrate above -> for (@1 <- @0) { 0 } | @0!(0) this is a contract which happened on the same channel 0 but the receive doesn't match the send which both this send and receive can exist in the tuple space. That's my thoughts on your table https://docs.google.com/spreadsheets/d/1pABqArF9e8HRTO9zSefp93mIVUm91avekeDgqSEw0R8/edit#gid=0 . When I was reading the case 4! !4, your descriptions say that Both have their matches, fine choice. which can be also happened in !C 4!!.

Mike: I see, something like

for (@{x /\ Int} <- @0) { 0 } | @0!("hi")

Yes, that's possible; sorry I forgot about that case in my explanation.

B2 = contract @0(_) = { 0 }
B3 = @0!!(1)

"Both have their matches, fine choice": B2 consumes "hi", B3 interacts with the linear for.

BUT! then the contract and the replicated send form an infinite loop. So unless they're on different channels, the blocks can't be merged. Someone should review the whole spreadsheet again with that kind of example in mind.

That said, we don't have an encoding of each cell of the spreadsheet. Instead, Arturo came up with a small piece of code for determining whether two blocks are mergeable. I don't remember where it is, though. Sorry.

Will: But the contract and the replicated send can also be not-match which would not leads to infinite loop.

That said, we don't have an encoding of each cell of the spreadsheet

What do you mean encoding here? I don't understand.

Instead, Arturo came up with a small piece of code for determining whether two blocks are mergeable.

I think you are talking about this https://github.com/rchain/rchain/blob/dev/casper/src/main/scala/coop/rchain/casper/EstimatorHelper.scala and https://github.com/rchain/rchain/blob/dev/casper/src/main/scala/coop/rchain/casper/TuplespaceEvent.scala

Mike: I mean that there are not separate pieces of code for each cell in the spreadsheet. Instead, there's a single rule Arturo came up with that handles all the cases. I don't know if the rule is correct, but it's what is being used.

would not lead to an infinite loop

Yes. If the channels are the same, the problem is a lot more complex than if the channels are different.

Will: If the channels are different, no matter what operations happen on these channels, they are all mergeable. We only care about the operation on the same channel, right?

Mike: You are right. It looks like the spreadsheet has errors. I don't know if Arturo's code does.

zsluedem commented 3 years ago
    *               MergedBlock
    *             /            \
    *           b4 d8(Main)    b3 d7
    *       /         \    /       \
    *       b1  d1      b6 d3      b5 d6
    *       |  d2       |  d4      |  d5
    *         \        |          /
    *                baseBlock
    *

b1 contains d1, d2 b6 contains d3, d4 b5 contains d6, d5 b4 contains d8 b3 contains b3

b4 merged b1(Main) and b6 d3 conflict d1 d3 rejected b3 merged b6(Main) and b5 d5 conflict d3 d5 rejected

b4 applied d1 d2 d4 reject d4 b3 applied d3 d4 d6 reject d5

which deploys should we include in MergedBlock? mergingDeploys = d7 d6 (d5?) mainDeploys = d1 d2 d4 d8

detect conflict deploys between two branches mainDeploys and mergingDeploys

bitcard commented 3 years ago

mark