pokt-network / pocket

Official implementation of the Pocket Network Protocol v1
https://pokt.network
MIT License

[Utility] Address the discrepancy between the state trees and txIndexer #875

Closed h5law closed 11 months ago

h5law commented 12 months ago

Objective

The state trees are the "source of truth" and as such should be used for verification purposes. Currently, when attempting to add a transaction to the mempool, we check whether it has already been committed to state and indexed. This check is important; however, it is performed against the TxIndexer rather than the Transactions state tree.

See: utility/transaction.go

// HandleTransaction implements the exposed functionality of the shared utilityModule interface.
func (u *utilityModule) HandleTransaction(txProtoBytes []byte) error {
    txHash := coreTypes.TxHash(txProtoBytes)

    // Is the tx already in the mempool (in memory)?
    if u.mempool.Contains(txHash) {
        return coreTypes.ErrDuplicateTransaction()
    }
-    // Is the tx already committed & indexed (on disk)?
-    if txExists, err := u.GetBus().GetPersistenceModule().TransactionExists(txHash); err != nil {
-        return err
-    } else if txExists {
-        return coreTypes.ErrTransactionAlreadyCommitted()
-    }
+    // Is the tx included in the Transactions state tree?
+    txHashBz, err := hex.DecodeString(txHash)
+    if err != nil {
+        return err
+    }
+    txRoot, txNodeStore := u.GetBus().GetTreeStore().GetTree(trees.TransactionTreeName)
+    txTree := smt.ImportSparseMerkleTree(txNodeStore, sha256.New(), txRoot)
+    txProof, err := txTree.Prove(txHashBz)
+    if err != nil {
+        return err
+    }
+    if valid := smt.VerifyProof(txProof, txRoot, txHashBz, txProtoBytes, txTree.Spec()); valid {
+        return coreTypes.ErrTransactionAlreadyCommitted()
+    }
+    // Validate that the Transactions tree was used to calculate the state (root) hash
+    stateHash, rootNodeStore := u.GetBus().GetTreeStore().GetTree(trees.RootTreeName)
+    rootTree := smt.ImportSparseMerkleTree(rootNodeStore, sha256.New(), stateHash)
+    rootProof, err := rootTree.Prove([]byte(trees.TransactionTreeName))
+    if err != nil {
+        return err
+    }
+    if valid := smt.VerifyProof(rootProof, stateHash, []byte(trees.TransactionTreeName), txRoot, rootTree.Spec()); !valid {
+        return errors.New("malicious transactions tree provided - not part of previous state hash")
+    }

    // Can the tx be decoded?
    tx := &coreTypes.Transaction{}
@@ -20,6 +38,7 @@ func (u *utilityModule) HandleTransaction(txProtoBytes []byte) error {
        return coreTypes.ErrProtoUnmarshal(err)
    }
    // Does the tx pass basic validation?
    if err := tx.ValidateBasic(); err != nil {
        return err
    }

    // Store the tx in the mempool
    return u.mempool.AddTx(txProtoBytes)
}

By using the Transactions state tree in persistence/trees/trees.go we can guarantee the inclusion or exclusion of any element. There may be instances where the TxIndexer does not correctly reflect the values in the state tree, which could allow invalid transactions into the mempool - or keep valid transactions out of it.

By relying on a secondary data structure that is not the source of truth, we open up potential avenues for the system to be gamed. Transitioning towards using the state trees for proofs and verification will result in a stronger and more secure network.

Origin Document

https://www.notion.so/pocketnetwork/2023-06-30-33d9d44c9432406cb0126a1e40bf8cbf?pvs=4#95d544c38bb14d3d806fb1ea6a043b07

Protocol Hour: 30/06/2023


Goals

Deliverable

Non-goals / Non-deliverables

General issue deliverables

Testing Methodology


Creator: @h5law Co-Owners: @Olshansk

h5law commented 12 months ago

@dylanlott Wdyt of this method?

Olshansk commented 12 months ago

@h5law It's much easier to understand the difference proposed if you represent it as a diff instead of two long code snippets. Wdyt of:

// HandleTransaction implements the exposed functionality of the shared utilityModule interface.
func (u *utilityModule) HandleTransaction(txProtoBytes []byte) error {
    txHash := coreTypes.TxHash(txProtoBytes)

    // Is the tx already in the mempool (in memory)?
    if u.mempool.Contains(txHash) {
        return coreTypes.ErrDuplicateTransaction()
    }
-    // Is the tx already committed & indexed (on disk)?
-    if txExists, err := u.GetBus().GetPersistenceModule().TransactionExists(txHash); err != nil {
-        return err
-    } else if txExists {
-        return coreTypes.ErrTransactionAlreadyCommitted()
-    }
+    // Is the tx included in the Transactions state tree?
+    txHashBz, err := hex.DecodeString(txHash)
+    if err != nil {
+        return err
+    }
+    txRoot, txNodeStore := u.GetBus().GetTreeStore().GetTree(trees.TransactionTreeName)
+    txTree := smt.ImportSparseMerkleTree(txNodeStore, sha256.New(), txRoot)
+    txProof, err := txTree.Prove(txHashBz)
+    if err != nil {
+        return err
+    }
+    if valid := smt.VerifyProof(txProof, txRoot, txHashBz, txProtoBytes, txTree.Spec()); valid {
+        return coreTypes.ErrTransactionAlreadyCommitted()
+    }
+    // Validate that the Transactions tree was used to calculate the state (root) hash
+    stateHash, rootNodeStore := u.GetBus().GetTreeStore().GetTree(trees.RootTreeName)
+    rootTree := smt.ImportSparseMerkleTree(rootNodeStore, sha256.New(), stateHash)
+    rootProof, err := rootTree.Prove([]byte(trees.TransactionTreeName))
+    if err != nil {
+        return err
+    }
+    if valid := smt.VerifyProof(rootProof, stateHash, []byte(trees.TransactionTreeName), txRoot, rootTree.Spec()); !valid {
+        return errors.New("malicious transactions tree provided - not part of previous state hash")
+    }

    // Can the tx be decoded?
    tx := &coreTypes.Transaction{}
@@ -20,6 +38,7 @@ func (u *utilityModule) HandleTransaction(txProtoBytes []byte) error {
        return coreTypes.ErrProtoUnmarshal(err)
    }
    // Does the tx pass basic validation?
    if err := tx.ValidateBasic(); err != nil {
        return err
    }

    // Store the tx in the mempool
    return u.mempool.AddTx(txProtoBytes)
}

A few concerns I have with the proposed approach:

  1. Decoupling responsibility: Utility is responsible for the utility of the Pocket blockchain (i.e. relays). This exposes persistence (i.e. integrity) business logic outside of it. Example approach to overcome this: Just update the implementation of TransactionExists and keep the utility code the same

  2. Performance: The point of an indexer (i.e. indexers in general) is to build something that is a view into the database but operates more efficiently. This new approach will lead to a performance overhead in the future, but it's too early to tell whether that's an issue, and there's no reason to optimize prematurely. What I'd like to know is: why have a tx indexer in the first place then?

  3. References: Could you please find how Tendermint does it and link to it? We can learn a lot from them and I want to see if they check the trees on every tx handle as well.

h5law commented 12 months ago

@Olshansk I do want to note that although I cannot find how tendermint/cometbft does this, I am still a firm believer that leveraging the state trees where possible will add a strong layer of security to the network. As you mentioned, our TxIndexer is currently being used as a source of truth, and it's not even a true representation of the state tree (I mean it is not the database underlying the tree). Just as we previously spoke about separating commitment and retrieval, I think we should revisit what we are doing here. We are not retrieving the tx; we are checking that it has been committed to state, and for that we need a proof. We can look into optimisations around our proving times, but personally I think this is the right call. If we don't do this, I think we must re-evaluate our position on our sources of truth, as the trees are currently merely hashing functions for us.

It's much easier to understand the difference proposed if you represent it as a diff

Updated the original issue with the diff 👍🏼

Just update the implementation of TransactionExists and keep the utility code the same

Yeah, this was more what I was thinking. By no means should the code I wrote above be the final implementation; I mentioned putting this in a helper function, which would probably best sit inside persistence.
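
For illustration only, here is a rough sketch of what such a persistence-side helper could look like, reusing the same tree-store and smt calls as the diff above. The name TransactionExistsInStateTree and its exact signature are hypothetical, not existing code:

// Hypothetical helper inside persistence: utility keeps calling a single
// "does this tx exist?" method, but the answer comes from the Transactions
// state tree rather than the TxIndexer.
func (p *persistenceModule) TransactionExistsInStateTree(txHash string, txProtoBytes []byte) (bool, error) {
    txHashBz, err := hex.DecodeString(txHash)
    if err != nil {
        return false, err
    }
    // Lazily import the Transactions tree from the tree store (as in the proposed diff).
    root, nodeStore := p.GetBus().GetTreeStore().GetTree(trees.TransactionTreeName)
    tree := smt.ImportSparseMerkleTree(nodeStore, sha256.New(), root)
    proof, err := tree.Prove(txHashBz)
    if err != nil {
        return false, err
    }
    // A valid membership proof means the tx has already been committed to state.
    return smt.VerifyProof(proof, root, txHashBz, txProtoBytes, tree.Spec()), nil
}

Utility would then keep its existing "does this tx exist?" call path and never touch SMT internals directly, which also addresses the decoupling concern above.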

This new approach will lead to a performance overhead in the future

This is something we can benchmark and come to a separate conclusion about. I am probably with you in that the trees may be slower, but as the indexer grows in size the tree traversal may become quicker; I'm unsure. Ultimately this is not a performance fix though; it is more about our chain's validity.
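
To make the benchmarking concrete, here is a minimal sketch of a Go benchmark comparing the two membership checks using the standard testing.B pattern. The fixture and lookup helpers (setupCommittedTxHashes, lookupViaIndexer, proveViaTransactionsTree) are placeholders for whatever test scaffolding we end up with, not existing functions:

func BenchmarkTxMembershipCheck(b *testing.B) {
    // Hypothetical fixture: a set of tx hashes already committed to state.
    txHashes := setupCommittedTxHashes(b)

    b.Run("tx_indexer_lookup", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            if err := lookupViaIndexer(txHashes[i%len(txHashes)]); err != nil {
                b.Fatal(err)
            }
        }
    })

    b.Run("transactions_tree_proof", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            if err := proveViaTransactionsTree(txHashes[i%len(txHashes)]); err != nil {
                b.Fatal(err)
            }
        }
    })
}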

why have a tx indexer in the first place then?

Good question. Our current use of the TxIndexer goes against our saying of "The trees are the only source of truth"; in reality this amounts to "The trees are the source of the state hash and not used to determine the truth at all, for that we have the TxIndexer, PostgresDB and BlockStore...", which I am not a fan of.

Currently my knowledge of where else the TxIndexer is being used is lacking, but here I am firmly against its use. I cannot say whether we should keep it or remove it entirely; that discussion would be better placed in a separate issue once there is more context around how it is currently being used and whether there is a better way.

As we now maintain the SMT, we can always look into native support for things like batch proof generation and ics23 commitment proof types, which could speed up proving times considerably.

Could you please find how Tendermint does it and link to it

I am currently searching through cometbft/cometbft, cosmos/cosmos-sdk and cosmos/gaia and having a really hard time searching the codebases for any logic related to this stuff. Once I manage to find it I will update here.

dylanlott commented 12 months ago

@dylanlott Wdyt of this method?

I agree that the TxIndexer obscures the source-of-truth aspect of the TreeStore. It currently causes an impedance mismatch in transaction handling between components in the Persistence module, as well as a layer of asynchronous indirection when creating a proposal block: the HandleTransaction function updates both Postgres and the TxIndexer's stores, while the subsequent TreeStore reads happen afterwards in a different module. This indirection also creates the false positives and false negatives in transaction validation between the two stores.

If they occurred in the same module, block proposal would be simpler to reason about in the Unit of Work. It would start a savepoint, reap the mempool, insert all the transactions into the trees, build the index from what went into the trees, then store the full block in the blockstore, and if all that succeeds, it would update Postgres through the persistenceRWContext. This approach would treat Postgres as a reflection of the TreeStore and ensure that the trees are the ultimate source of truth, since nothing can get into Postgres without first getting into the trees.
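
Roughly, the commit path being described could look like the sketch below; the unit-of-work method names and signatures are illustrative only, not the current interfaces:

// Illustrative ordering only: nothing reaches the BlockStore or Postgres unless
// it first went through the trees, keeping the TreeStore the source of truth.
func (u *unitOfWork) commitBlock(reapedTxs [][]byte) error {
    if err := u.beginSavepoint(); err != nil {
        return err
    }
    // 1. Insert the reaped mempool txs into the state trees.
    stateHash, err := u.treeStore.Update(reapedTxs)
    if err != nil {
        return u.rollbackToSavepoint(err)
    }
    // 2. Build the tx index from exactly what went into the trees.
    if err := u.indexTransactions(reapedTxs); err != nil {
        return u.rollbackToSavepoint(err)
    }
    // 3. Store the full block in the block store.
    if err := u.blockStore.StoreBlock(stateHash, reapedTxs); err != nil {
        return u.rollbackToSavepoint(err)
    }
    // 4. Only then reflect the result into Postgres via the persistenceRWContext.
    return u.persistenceRWContext.Commit(stateHash)
}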

Current Design

classDiagram
    PersistenceModule <|-- TreeStore
    PersistenceModule <|-- BlockStore
    PersistenceModule <|-- TxIndexer
    class TreeStore{
        kvstore.KVStore
        Update()
    }
    class BlockStore{
        kvstore.KVStore
        StoreBlock()
    }
    class TxIndexer{
        kvstore.KVStore
        IndexTransaction()
    }

Possible Alternative

classDiagram
    PersistenceModule <|-- TreeStore
    PersistenceModule <|-- BlockStore
    class TreeStore{
        kvstore.KVStore
        IndexTransaction()
    }
    class BlockStore{
        kvstore.KVStore
        StoreBlock()
    }

I know that your deliverable explicitly says that removing the TxIndexer is a non-goal but I'm not so sure about that after spending the morning thinking about this.

Olshansk commented 12 months ago

@h5law

By no means should the code I wrote above be the final implementation; I mentioned putting this in a helper function, which would probably best sit inside persistence.

Cool, just add those qualifiers in the future. It'll help out potential readers joining the conversation or looking back at history.

This is something we can benchmark and come to a separate conclusion about.

Make one of the deliverables in this ticket to create an M5 or M6 documenting why benchmarking this is important, and leave the appropriate comment.

The trees are the source of the state hash and not used to determine the truth at all, for that we have the TxIndexer, PostgresDB and BlockStore..." which I am not a fan of.

Aligned 🤝

I am currently searching through cometbft/cometbft, cosmos/cosmos-sdk and cosmos/gaia and having a really hard time searching the codebases for any logic related to this stuff. Once I manage to find it I will update here.

Just to help you get started, I suggest looking at func (blockExec *BlockExecutor) CreateProposalBlock( in state/execution.go. I haven't had a chance to dive deep, but I believe the downstream calls of the business logic here will uncover a lot of answers:

// CreateProposalBlock calls state.MakeBlock with evidence from the evpool
// and txs from the mempool. The max bytes must be big enough to fit the commit.
// The block space is first allocated to outstanding evidence.
// The rest is given to txs, up to the max gas.
//
// Contract: application will not return more bytes than are sent over the wire.
func (blockExec *BlockExecutor) CreateProposalBlock(
    ctx context.Context,
    height int64,
    state State,
    lastExtCommit *types.ExtendedCommit,
    proposerAddr []byte,
) (*types.Block, error) {

    maxBytes := state.ConsensusParams.Block.MaxBytes
    emptyMaxBytes := maxBytes == -1
    if emptyMaxBytes {
        maxBytes = int64(types.MaxBlockSizeBytes)
    }

    maxGas := state.ConsensusParams.Block.MaxGas

    evidence, evSize := blockExec.evpool.PendingEvidence(state.ConsensusParams.Evidence.MaxBytes)

    // Fetch a limited amount of valid txs
    maxDataBytes := types.MaxDataBytes(maxBytes, evSize, state.Validators.Size())
    maxReapBytes := maxDataBytes
    if emptyMaxBytes {
        maxReapBytes = -1
    }

    txs := blockExec.mempool.ReapMaxBytesMaxGas(maxReapBytes, maxGas)
    commit := lastExtCommit.ToCommit()
    block := state.MakeBlock(height, txs, commit, evidence, proposerAddr)
    rpp, err := blockExec.proxyApp.PrepareProposal(
        ctx,
        &abci.RequestPrepareProposal{
            MaxTxBytes:         maxDataBytes,
            Txs:                block.Txs.ToSliceOfBytes(),
            LocalLastCommit:    buildExtendedCommitInfo(lastExtCommit, blockExec.store, state.InitialHeight, state.ConsensusParams.ABCI),
            Misbehavior:        block.Evidence.Evidence.ToABCI(),
            Height:             block.Height,
            Time:               block.Time,
            NextValidatorsHash: block.NextValidatorsHash,
            ProposerAddress:    block.ProposerAddress,
        },
    )
    if err != nil {
        // The App MUST ensure that only valid (and hence 'processable') transactions
        // enter the mempool. Hence, at this point, we can't have any non-processable
        // transaction causing an error.
        //
        // Also, the App can simply skip any transaction that could cause any kind of trouble.
        // Either way, we cannot recover in a meaningful way, unless we skip proposing
        // this block, repair what caused the error and try again. Hence, we return an
        // error for now (the production code calling this function is expected to panic).
        return nil, err
    }

    txl := types.ToTxs(rpp.Txs)
    if err := txl.Validate(maxDataBytes); err != nil {
        return nil, err
    }

    return state.MakeBlock(height, txl, commit, evidence, proposerAddr), nil
}

I know that your deliverable explicitly says that removing the TxIndexer is a non-goal but I'm not so sure about that after spending the morning thinking about this.

@dylanlott I'd be okay with removing the TxIndexer given these additions:

  1. This does not get in the way of the savepoint / rollback work (we finish that off first)
  2. We add some benchmark unit tests (see https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go) as part of this deliverable.
  3. SMT proof verification is abstracted from the utility module

Olshansk commented 12 months ago

FYI @adshmh. This issue has a lot of context you might find of value

dylanlott commented 12 months ago

@dylanlott I'd be okay with removing the TxIndexer given these additions: This does not get in the way of the savepoint / rollback work (we finish that off first)

Aligned 🤝 I filed a draft ADR that elaborates, but I agree: this isn't necessary for completing savepoints and rollbacks right now, and it should be put in its proper place in the priority queue before it's tackled. It is likely best categorized as tech debt.

Olshansk commented 11 months ago

Making a reminder for us to transfer the notes from today's protocol hour to this issue when someone has bandwidth. Lots of good discussion. The audio recording is in the Notion as well.