pokt-network / poktroll

The official Shannon upgrade implementation of the Pocket Network Protocol implemented using the Cosmos SDK
MIT License
16 stars 8 forks source link

[Bug] UnbondSupplier with 0 balance causes consensus failure #841

Closed okdas closed 2 weeks ago

okdas commented 1 month ago

Objective

There's an issue on main branch that can cause a chain halt.

Origin Document

10:29PM INF Stored block hash at height 70 EndBlock=SessionModuleEndBlock module=x/session
10:29PM INF found 0 expiring claims at block height 70 method=SettlePendingClaims module=x/tokenomics
10:29PM INF settled 0 and expired 0 claims at block height 70 method=SettlePendingClaims module=x/tokenomics
10:29PM INF settled 0 claims and expired 0 claims method=EndBlocker module=x/tokenomics
10:29PM INF successfully updated the relay mining difficulty for 0 services method=EndBlocker module=x/tokenomics
10:29PM ERR could not send 0upokt coins from module supplier to account pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj due to 0upokt: invalid coins method=UnbondSupplier module=x/supplier
10:29PM ERR CONSENSUS FAILURE!!! err="runtime error: invalid memory address or nil pointer dereference" module=consensus stack="goroutine 191 [running]:\nruntime/debug.Stack()\n\t/opt/homebrew/Cellar/go/1.23.1/libexec/src/runtime/debug/stack.go:26 +0x64\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:801 +0x4c\npanic({0x3ea9f40?, 0x7079230?})\n\t/opt/homebrew/Cellar/go/1.23.1/libexec/src/runtime/panic.go:785 +0xf0\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock.func1()\n\t/Users/dk/gopath/pkg/mod/github.com/cosmos/cosmos-sdk@v0.50.9/baseapp/abci.go:860 +0x124\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock(0x4000298d88, 0x4002d7f680)\n\t/Users/dk/gopath/pkg/mod/github.com/cosmos/cosmos-sdk@v0.50.9/baseapp/abci.go:892 +0x374\ngithub.com/cosmos/cosmos-sdk/server.cometABCIWrapper.FinalizeBlock({{0xffff72f58d10, 0x4000a5d308}}, {0x524a248, 0x713f6e0}, 0x4002d7f680)\n\t/Users/dk/gopath/pkg/mod/github.com/cosmos/cosmos-sdk@v0.50.9/server/cmt_abci.go:44 +0x54\ngithub.com/cometbft/cometbft/abci/client.(*localClient).FinalizeBlock(0x400327b3e0, {0x524a248, 0x713f6e0}, 0x4002d7f680)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/abci/client/local_client.go:185 +0xf8\ngithub.com/cometbft/cometbft/proxy.(*appConnConsensus).FinalizeBlock(0x4003231518, {0x524a248, 0x713f6e0}, 0x4002d7f680)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/proxy/app_conn.go:104 +0x1d0\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).applyBlock(_, {{{0xb, 0x0}, {0x4001cabd19, 0x7}}, {0x4001cabd50, 0x8}, 0x1, 0x45, {{0x400321ed40, ...}, ...}, ...}, ...)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/state/execution.go:224 +0x3c0\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).ApplyVerifiedBlock(_, {{{0xb, 0x0}, {0x4001cabd19, 0x7}}, {0x4001cabd50, 0x8}, 0x1, 0x45, {{0x400321ed40, ...}, ...}, ...}, ...)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/state/execution.go:202 +0xd8\ngithub.com/cometbft/cometbft/consensus.(*State).finalizeCommit(0x40013ca008, 0x46)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:1772 +0xd50\ngithub.com/cometbft/cometbft/consensus.(*State).tryFinalizeCommit(0x40013ca008, 0x46)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:1682 +0x2c0\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit.func1()\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:1617 +0xb8\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit(0x40013ca008, 0x46, 0x0)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:1655 +0xd90\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0x40013ca008, 0x40011e1520, {0x0, 0x0})\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:2335 +0x26c0\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0x40013ca008, 0x40011e1520, {0x0, 0x0})\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:2067 +0x50\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0x40013ca008, {{0x5213ae0, 0x4001e14030}, {0x0, 0x0}})\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:929 +0x5c0\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0x40013ca008, 0x0)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:856 +0x5fc\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 1\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:398 +0x1e4\n"
10:29PM INF service stop impl=baseWAL module=consensus msg="Stopping baseWAL service" wal=/root/.poktroll/data/cs.wal/wal
10:29PM INF service stop impl=Group module=consensus msg="Stopping Group service" wal=/root/.poktroll/data/cs.wal/wal

Steps to reproduce

  1. Start Localnet.
  2. Execute many requests: hey -n 10000 -c 2 -m POST -H "Content-Type: application/json" -d '{"method":"eth_blockNumber","params":[],"id":1,"jsonrpc":"2.0"}' http://localhost:42069/anvil.
  3. Look at the logs as supplier balance drains. When it hits 0 the exception will appear.

Goals

Deliverables

Non-goals / Non-deliverables

General deliverables


Creator: @okdas Co-Owners: TBD

okdas commented 1 month ago

Seems to be a similar but a different issue.


12:17AM INF Stored block hash at height 650 EndBlock=SessionModuleEndBlock module=x/session
12:17AM INF found 1 expiring claims at block height 650 method=SettlePendingClaims module=x/tokenomics
12:17AM INF claim does not require proof due to claimed amount (78750upokt) being less than the threshold (20000000upokt) and random sample (0.75) being greater than probability (0.25) method=proofRequirementForClaim module=server
12:17AM INF Claim by supplier pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj IS WITHIN LIMITS of servicing application pokt1mrqt5f7qh8uxs27cjm9t7v9e74a9vvdnq5jva4. Max claimable amount >= Claim amount: 6666601310upokt >= 78750 application=pokt1mrqt5f7qh8uxs27cjm9t7v9e74a9vvdnq5jva4 claim_settlement_upokt=78750 helper=ensureClaimAmountLimits method=ProcessTokenLogicModules module=x/tokenomics num_claim_compute_units=1875 num_relays=1875 service_id=anvil session_id=6b8e37440824da10f528cc54fb4766bfbcc760bf1f16433a05124a3689c18db0 supplier_operator=pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
12:17AM INF About to start processing TLMs for (1875) compute units, equal to (78750upokt) claimed actual_settlement_upokt=78750upokt application=pokt1mrqt5f7qh8uxs27cjm9t7v9e74a9vvdnq5jva4 claim_settlement_upokt=78750 method=ProcessTokenLogicModules module=x/tokenomics num_claim_compute_units=1875 num_relays=1875 service_id=anvil session_id=6b8e37440824da10f528cc54fb4766bfbcc760bf1f16433a05124a3689c18db0 supplier_operator=pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
12:17AM INF Starting TLM processing: "TLMRelayBurnEqualsMint" actual_settlement_upokt=78750upokt application=pokt1mrqt5f7qh8uxs27cjm9t7v9e74a9vvdnq5jva4 claim_settlement_upokt=78750 method=ProcessTokenLogicModules module=x/tokenomics num_claim_compute_units=1875 num_relays=1875 service_id=anvil session_id=6b8e37440824da10f528cc54fb4766bfbcc760bf1f16433a05124a3689c18db0 supplier_operator=pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
12:17AM INF sent 78750upokt from the supplier module to the supplier shareholder with address "pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj" method=distributeSupplierRewardsToShareHolders module=x/tokenomics
12:17AM INF distributed 78750 uPOKT to supplier "pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj" shareholders method=distributeSupplierRewardsToShareHolders module=x/tokenomics
12:17AM ERR error processing token logic modules for claim "6b8e37440824da10f528cc54fb4766bfbcc760bf1f16433a05124a3689c18db0": TLM "TLMRelayBurnEqualsMint": burning 78750upokt from the application module account: spendable balance 19662upokt is smaller than 78750upokt: insufficient funds [cosmos/cosmos-sdk@v0.50.9/x/bank/keeper/send.go:278]: failed to burn uPOKT from application module account [/Users/dk/gopath/pkg/mod/cosmossdk.io/errors@v1.0.1/errors.go:155]: failed to process TLM [/Users/dk/gopath/pkg/mod/cosmossdk.io/errors@v1.0.1/errors.go:155] module=server num_claim_compute_units=1875 num_relays_in_session_tree=1875 proof_requirement=NOT_REQUIRED session_id=6b8e37440824da10f528cc54fb4766bfbcc760bf1f16433a05124a3689c18db0 supplier_operator_address=pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
12:17AM ERR could not settle pending claims due to error TLM "TLMRelayBurnEqualsMint": burning 78750upokt from the application module account: spendable balance 19662upokt is smaller than 78750upokt: insufficient funds [cosmos/cosmos-sdk@v0.50.9/x/bank/keeper/send.go:278]: failed to burn uPOKT from application module account [/Users/dk/gopath/pkg/mod/cosmossdk.io/errors@v1.0.1/errors.go:155]: failed to process TLM [/Users/dk/gopath/pkg/mod/cosmossdk.io/errors@v1.0.1/errors.go:155] method=EndBlocker module=x/tokenomics
12:17AM INF Closing application.db module=server
12:17AM INF Closing snapshots/metadata.db module=server
okdas commented 1 month ago

Replicated to gather logs.

Note:

appgate.txt relayminer.txt validator.txt

okdas commented 1 month ago

Another similar but different issue:


7:47PM INF Stored block hash at height 590 EndBlock=SessionModuleEndBlock module=x/session
7:47PM INF found 1 expiring claims at block height 590 method=SettlePendingClaims module=x/tokenomics
7:47PM INF claim requires proof due to random sample (0.11) being less than or equal to probability (0.25) method=proofRequirementForClaim module=server
7:47PM INF claim expired due to PROOF_MISSING module=server num_claim_compute_units=1871 num_relays_in_session_tree=1871 proof_requirement=PROBABILISTIC session_id=25a2006d70a05959561149ae36edb13396115ddf20db0a11c2ac83e5aee2fd33 supplier_operator_address=pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
7:47PM INF deleted claim with primary key 25a2006d70a05959561149ae36edb13396115ddf20db0a11c2ac83e5aee2fd33/pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj for supplier pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj and session 25a2006d70a05959561149ae36edb13396115ddf20db0a11c2ac83e5aee2fd33 method=RemoveClaim module=x/proof
7:47PM ERR error slashing supplier pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj: spendable balance 1000068upokt is smaller than 320000000upokt: insufficient funds module=server num_claim_compute_units=1871 num_relays_in_session_tree=1871 proof_requirement=PROBABILISTIC session_id=25a2006d70a05959561149ae36edb13396115ddf20db0a11c2ac83e5aee2fd33 supplier_operator_address=pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
7:47PM ERR could not settle pending claims due to error spendable balance 1000068upokt is smaller than 320000000upokt: insufficient funds [cosmos/cosmos-sdk@v0.50.9/x/bank/keeper/send.go:278] method=EndBlocker module=x/tokenomics
7:47PM ERR CONSENSUS FAILURE!!! err="runtime error: invalid memory address or nil pointer dereference" module=consensus stack="goroutine 191 [running]:\nruntime/debug.Stack()\n\t/opt/homebrew/Cellar/go/1.23.1/libexec/src/runtime/debug/stack.go:26 +0x64\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:801 +0x4c\npanic({0x3ea9f40?, 0x7079210?})\n\t/opt/homebrew/Cellar/go/1.23.1/libexec/src/runtime/panic.go:785 +0xf0\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock.func1()\n\t/Users/dk/gopath/pkg/mod/github.com/cosmos/cosmos-sdk@v0.50.9/baseapp/abci.go:860 +0x124\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock(0x40011b2b48, 0x400527b2c0)\n\t/Users/dk/gopath/pkg/mod/github.com/cosmos/cosmos-sdk@v0.50.9/baseapp/abci.go:892 +0x374\ngithub.com/cosmos/cosmos-sdk/server.cometABCIWrapper.FinalizeBlock({{0xffff5048a0f8, 0x4000bc0008}}, {0x524a288, 0x713f6c0}, 0x400527b2c0)\n\t/Users/dk/gopath/pkg/mod/github.com/cosmos/cosmos-sdk@v0.50.9/server/cmt_abci.go:44 +0x54\ngithub.com/cometbft/cometbft/abci/client.(*localClient).FinalizeBlock(0x4001f1cd80, {0x524a288, 0x713f6c0}, 0x400527b2c0)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/abci/client/local_client.go:185 +0xf8\ngithub.com/cometbft/cometbft/proxy.(*appConnConsensus).FinalizeBlock(0x4001382d80, {0x524a288, 0x713f6c0}, 0x400527b2c0)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/proxy/app_conn.go:104 +0x1d0\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).applyBlock(_, {{{0xb, 0x0}, {0x4002c060f9, 0x7}}, {0x4002c06b80, 0x8}, 0x1, 0x24d, {{0x4006be47c0, ...}, ...}, ...}, ...)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/state/execution.go:224 +0x3c0\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).ApplyVerifiedBlock(_, {{{0xb, 0x0}, {0x4002c060f9, 0x7}}, {0x4002c06b80, 0x8}, 0x1, 0x24d, {{0x4006be47c0, ...}, ...}, ...}, ...)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/state/execution.go:202 +0xd8\ngithub.com/cometbft/cometbft/consensus.(*State).finalizeCommit(0x40012dbc08, 0x24e)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:1772 +0xd50\ngithub.com/cometbft/cometbft/consensus.(*State).tryFinalizeCommit(0x40012dbc08, 0x24e)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:1682 +0x2c0\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit.func1()\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:1617 +0xb8\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit(0x40012dbc08, 0x24e, 0x0)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:1655 +0xd90\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0x40012dbc08, 0x4004f58270, {0x0, 0x0})\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:2335 +0x26c0\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0x40012dbc08, 0x4004f58270, {0x0, 0x0})\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:2067 +0x50\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0x40012dbc08, {{0x5213b20, 0x4005033df8}, {0x0, 0x0}})\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:929 +0x5c0\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0x40012dbc08, 0x0)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:856 +0x5fc\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 1\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:398 +0x1e4\n"
7:47PM INF service stop impl=baseWAL module=consensus msg="Stopping baseWAL service" wal=/root/.poktroll/data/cs.wal/wal
7:47PM INF service stop impl=Group module=consensus msg="Stopping Group service" wal=/root/.poktroll/data/cs.wal/wal

Attaching relayminer logs - though nothing there seems relevant.

relayminer_3.txt

bryanchriswhite commented 1 month ago

@okdas Regarding the origin document in the description: it seems to me that what's happening is the supplier module account has a 0 balance (as opposed to any individual supplier).

  1. It's not yet clear to me how the supplier module account balance is reaching zero unless something is wrong with minting or distribution. Perhaps the supplier's address is not included in the rev_share_percent (see our docs; note the warning)?
  2. It seems like the issue happens when a supplier tries to unbond while the supplier module account balance is 0. It's interesting that it seems like the stake amount that unbonding is trying to send back to the supplier is also 0. I'm not sure how a supplier could wind up being staked with 0 stake but that would be consistent with the lack of tokens in the supplier module account. :thinking:
bryanchriswhite commented 1 month ago

Perhaps the supplier's address is not included in the rev_share_percent (see our docs; note the warning)

This 👆 doesn't seem to be the case, given what's in config.yml. :thinking:

Could it be that the supplier is being slashed to 0?

okdas commented 1 month ago

Could it be that the supplier is being slashed to 0?

Just replicated and checked the balances (apps, suppliers, modules) - they didn't seem to go to 0. It's possible they might have and the RPC only provides a presentation of the data from the previous (correct, working) block. 🤔

okdas commented 1 month ago

Just replicated this on main branch, the bug is still active:


{"level":"error","module":"server","session_id":"cdc88b54d7daec21b72d6eb64434a132b633f426478b3a9150e50d9b72d2ace3","supplier_operator_address":"pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj","num_claim_compute_units":894,"num_relays_in_session_tree":894,"num_estimated_compute_units":894,"claimed_upokt":"37548upokt","proof_requirement":"NOT_REQUIRED","time":"2024-10-15T01:24:06Z","message":"error processing token logic modules for claim \"cdc88b54d7daec21b72d6eb64434a132b633f426478b3a9150e50d9b72d2ace3\": TLM \"TLMRelayBurnEqualsMint\": burning 37548upokt from the application module account: spendable balance 28650upokt is smaller than 37548upokt: insufficient funds [cosmos/cosmos-sdk@v0.50.9/x/bank/keeper/send.go:278]: failed to burn uPOKT from application module account [/Users/dk/go/pkg/mod/cosmossdk.io/errors@v1.0.1/errors.go:155]: failed to process TLM [/Users/dk/go/pkg/mod/cosmossdk.io/errors@v1.0.1/errors.go:155]"}
{"level":"error","module":"server","module":"x/tokenomics","method":"EndBlocker","time":"2024-10-15T01:24:06Z","message":"could not settle pending claims due to error TLM \"TLMRelayBurnEqualsMint\": burning 37548upokt from the application module account: spendable balance 28650upokt is smaller than 37548upokt: insufficient funds [cosmos/cosmos-sdk@v0.50.9/x/bank/keeper/send.go:278]: failed to burn uPOKT from application module account [/Users/dk/go/pkg/mod/cosmossdk.io/errors@v1.0.1/errors.go:155]: failed to process TLM [/Users/dk/go/pkg/mod/cosmossdk.io/errors@v1.0.1/errors.go:155]"}
{"level":"error","module":"server","module":"consensus","err":"runtime error: invalid memory address or nil pointer dereference","stack":"goroutine 195 [running]:\nruntime/debug.Stack()\n\t/opt/homebrew/Cellar/go/1.23.2/libexec/src/runtime/debug/stack.go:26 +0x64\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:801 +0x4c\npanic({0x3f0b5a0?, 0x710a990?})\n\t/opt/homebrew/Cellar/go/1.23.2/libexec/src/runtime/panic.go:785 +0xf0\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock.func1()\n\t/Users/dk/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.50.9/baseapp/abci.go:860 +0x124\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock(0x4000384fc8, 0x400507ba40)\n\t/Users/dk/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.50.9/baseapp/abci.go:892 +0x374\ngithub.com/cosmos/cosmos-sdk/server.cometABCIWrapper.FinalizeBlock({{0xffff3a72c660, 0x4001c0b908}}, {0x52b6368, 0x71d17e0}, 0x400507ba40)\n\t/Users/dk/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.50.9/server/cmt_abci.go:44 +0x54\ngithub.com/cometbft/cometbft/abci/client.(*localClient).FinalizeBlock(0x40012898c0, {0x52b6368, 0x71d17e0}, 0x400507ba40)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/abci/client/local_client.go:185 +0xf8\ngithub.com/cometbft/cometbft/proxy.(*appConnConsensus).FinalizeBlock(0x4000d00018, {0x52b6368, 0x71d17e0}, 0x400507ba40)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/proxy/app_conn.go:104 +0x1d0\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).applyBlock(_, {{{0xb, 0x0}, {0x4001a73a08, 0x7}}, {0x4001a73a30, 0x8}, 0x1, 0xb3, {{0x40045e1640, ...}, ...}, ...}, ...)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/state/execution.go:224 +0x3c0\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).ApplyVerifiedBlock(_, {{{0xb, 0x0}, {0x4001a73a08, 0x7}}, {0x4001a73a30, 0x8}, 0x1, 0xb3, {{0x40045e1640, ...}, ...}, ...}, ...)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/state/execution.go:202 +0xd8\ngithub.com/cometbft/cometbft/consensus.(*State).finalizeCommit(0x4001b00008, 0xb4)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:1772 +0xd50\ngithub.com/cometbft/cometbft/consensus.(*State).tryFinalizeCommit(0x4001b00008, 0xb4)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:1682 +0x2c0\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit.func1()\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:1617 +0xb8\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit(0x4001b00008, 0xb4, 0x0)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:1655 +0xd90\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0x4001b00008, 0x4006d8d110, {0x0, 0x0})\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:2335 +0x26c0\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0x4001b00008, 0x4006d8d110, {0x0, 0x0})\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:2067 +0x50\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0x4001b00008, {{0x527f8c0, 0x40052ba518}, {0x0, 0x0}})\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:929 +0x5c0\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0x4001b00008, 0x0)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:856 +0x5fc\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 1\n\t/Users/dk/go/pkg/mod/github.com/cometbft/cometbft@v0.38.10/consensus/state.go:398 +0x1e4\n","time":"2024-10-15T01:24:06Z","message":"CONSENSUS FAILURE!!!"}
{"level":"info","module":"server","module":"consensus","wal":"/root/.poktroll/data/cs.wal/wal","msg":"Stopping baseWAL service","impl":"baseWAL","time":"2024-10-15T01:24:06Z","message":"service stop"}
okdas commented 3 weeks ago

I was working on a service that consumes events and there were a couple of events I randomly caught before the node died:


EventDataNewBlock: Block 6fdf1ec07a57e68138457d91391d764f5a95e30dc8b0c06dd10a02081a2bad9d - Height: 110 
Attribute: Key=claim, Value={"supplier_operator_address":"pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj","session_header":{"application_address":"pokt1mrqt5f7qh8uxs27cjm9t7v9e74a9vvdnq5jva4","service_id":"anvil","session_id":"0f33ddca0526c75ad52a107cb9a2188f9312e123752896be75cd0d310f63a397","session_start_block_height":"91","session_end_block_height":"100"},"root_hash":"EjMu2HvB00rrqkZfcm069xE+wyxetUQs09C30yvK40cAAAAAAAAAYwAAAAAAAABj"}
Attribute: Key=claimed_upokt, Value={"denom":"upokt","amount":"4158"}
Attribute: Key=expiration_reason, Value="PROOF_MISSING"
Attribute: Key=num_claimed_compute_units, Value="99"
Attribute: Key=num_estimated_compute_units, Value="99"
Attribute: Key=num_relays, Value="99"
Attribute: Key=mode, Value="EndBlock"
^ poktroll.tokenomics.EventClaimExpired
Attribute: Key=num_expired_claims, Value="1"
Attribute: Key=slashing_amount, Value={"denom":"upokt","amount":"1000068"}
Attribute: Key=supplier_operator_addr, Value="pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj"
Attribute: Key=mode, Value="EndBlock"
^: poktroll.tokenomics.EventSupplierSlashed
EventDataNewBlock: Block fd552373a5131034c9767e5f5a8d2407e4ad53ceaa3d6473bad3c167464576fc - Height: 111 

119 was the last height before the validator had a nil pointer reference:


{"level":"info","module":"server","module":"state","height":119,"block_app_hash":"7CC5EE44F5C6CBB9E5FA4B51B61A9C610DB86614599C8CEA2BF6FA567A97A872","time":"2024-10-18T22:23:15Z","message":"committed state"}
{"level":"info","module":"server","module":"txindex","height":119,"time":"2024-10-18T22:23:15Z","message":"indexed block events"}
{"level":"info","module":"server","module":"consensus","dur":1979.909217,"height":120,"round":0,"step":"RoundStepNewHeight","time":"2024-10-18T22:23:17Z","message":"Timed out"}
{"level":"info","module":"server","module":"consensus","proposal":"Proposal{120/0 (09B71CA43583505BD48E198076DE9F7562674DDB56A6BB78FF38D4475352F896:1:6ECA0745CAFE, -1) B3135B062E2C @ 2024-10-18T22:23:17.175477048Z}","proposer":"CD585EC8B5BA4408462CB95B7B3937E11135D887","time":"2024-10-18T22:23:17Z","message":"received proposal"}
{"level":"info","module":"server","module":"consensus","height":120,"hash":"09B71CA43583505BD48E198076DE9F7562674DDB56A6BB78FF38D4475352F896","time":"2024-10-18T22:23:17Z","message":"received complete proposal block"}
{"level":"info","module":"server","module":"consensus","height":120,"hash":"09B71CA43583505BD48E198076DE9F7562674DDB56A6BB78FF38D4475352F896","root":"36B2306696BDE03FF0B117A62E14DB625F87F49F4CC69347D0EBBB9A3E2B2D69","num_txs":0,"time":"2024-10-18T22:23:17Z","message":"finalizing commit of block"}
{"level":"info","module":"server","module":"x/session","EndBlock":"SessionModuleEndBlock","time":"2024-10-18T22:23:17Z","message":"Stored block hash at height 120"}
{"level":"info","module":"server","module":"x/tokenomics","method":"SettlePendingClaims","time":"2024-10-18T22:23:17Z","message":"found 0 expiring claims at block height 120"}
{"level":"info","module":"server","module":"x/tokenomics","method":"SettlePendingClaims","time":"2024-10-18T22:23:17Z","message":"settled 0 and expired 0 claims at block height 120"}
{"level":"info","module":"server","module":"x/tokenomics","method":"EndBlocker","time":"2024-10-18T22:23:17Z","message":"settled 0 claims and expired 0 claims"}
{"level":"info","module":"server","module":"x/tokenomics","method":"EndBlocker","time":"2024-10-18T22:23:17Z","message":"successfully updated the relay mining difficulty for 0 services"}
{"level":"error","module":"server","module":"x/supplier","method":"UnbondSupplier","time":"2024-10-18T22:23:17Z","message":"could not send 0upokt coins from module supplier to account pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj due to 0upokt: invalid coins"}
Olshansk commented 3 weeks ago

@red-0ne Could you provide an update on whether this is fixed or not?

red-0ne commented 3 weeks ago

@red-0ne Could you provide an update on whether this is fixed or not?

This will be investigated this week. Was planning to have Path integrated first, but I think I'll do without it.

Olshansk commented 3 weeks ago

@red-0ne Let's prioritize this as P0 because it's a MUCH smaller effort and is blocking @okdas + launch to Beta TestNet.

okdas commented 2 weeks ago

This issue has been addressed - thank you @red-0ne