near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0
2.31k stars 615 forks source link

feat: add possibility to save trie nodes while re-applying blocks #11811

Closed Trisfald closed 1 month ago

Trisfald commented 1 month ago

Adds a new argument to apply and apply range commands to save computed trie nodes into hot or cold storage. Normally, this operation should be idempotent, but it can also be used to reconstruct missing trie nodes.

Trisfald commented 1 month ago

Tested with the following procedure:

  1. Verify that

    curl -X POST http://35.204.169.85:3030 \
        -H "Content-Type: application/json" \
        -d '
        { "id": "dontcare", "jsonrpc": "2.0", "method": "query", "params": { "account_id": "b001b461c65aca5968a0afab3302a5387d128178c99ff5b2592796963407560a", "block_id": 109913260, "request_type": "view_account" } }'

    returns an error

  2. Stop neard

  3. Verify that

    ./neard view-state -t cold view-trie --shard-id 2 --shard-version 1 --max-depth 1000 --hash 36SkUU8tgetUtVL2a5JPwKB6F29yKBFjF5PFukZ8HRFH --from b001aea591ef68681e59a4149b1ab8bc56d8f22e34be24 --to b001c0de4c6929c5289b65044249830466ffea27680bc1 --format pretty --record-type account

    Gives MissingTrieValue error

  4. Run

    ./neard view-state -t cold --readwrite apply-range --start-index 109913255 --end-index 109913260 --shard-id 2 --storage trie-free --save-trie cold sequential
  5. Now step 3 doesn't fail anymore

  6. Start neard

  7. Step 1 returns the same result as read RPC

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 49.62963% with 136 lines in your changes missing coverage. Please review.

Project coverage is 71.68%. Comparing base (55d0df0) to head (4ca19f4).

Files Patch % Lines
core/store/src/db/recoverydb.rs 67.36% 47 Missing :warning:
tools/state-viewer/src/commands.rs 18.86% 42 Missing and 1 partial :warning:
tools/state-viewer/src/cli.rs 0.00% 30 Missing :warning:
tools/state-viewer/src/apply_chain_range.rs 72.97% 8 Missing and 2 partials :warning:
core/store/src/lib.rs 0.00% 6 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #11811 +/- ## ========================================== - Coverage 71.72% 71.68% -0.05% ========================================== Files 803 804 +1 Lines 163763 163996 +233 Branches 163763 163996 +233 ========================================== + Hits 117466 117564 +98 - Misses 41231 41362 +131 - Partials 5066 5070 +4 ``` | [Flag](https://app.codecov.io/gh/near/nearcore/pull/11811/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | Coverage Δ | | |---|---|---| | [backward-compatibility](https://app.codecov.io/gh/near/nearcore/pull/11811/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `0.23% <0.00%> (-0.01%)` | :arrow_down: | | [db-migration](https://app.codecov.io/gh/near/nearcore/pull/11811/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `0.23% <0.00%> (-0.01%)` | :arrow_down: | | [genesis-check](https://app.codecov.io/gh/near/nearcore/pull/11811/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `1.34% <0.00%> (-0.01%)` | :arrow_down: | | [integration-tests](https://app.codecov.io/gh/near/nearcore/pull/11811/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `37.75% <0.00%> (-0.10%)` | :arrow_down: | | [linux](https://app.codecov.io/gh/near/nearcore/pull/11811/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `71.45% <49.62%> (-0.04%)` | :arrow_down: | | [linux-nightly](https://app.codecov.io/gh/near/nearcore/pull/11811/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `71.27% <49.62%> (-0.04%)` | :arrow_down: | | [macos](https://app.codecov.io/gh/near/nearcore/pull/11811/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `54.31% <49.62%> (-0.01%)` | :arrow_down: | | [pytests](https://app.codecov.io/gh/near/nearcore/pull/11811/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `1.61% <0.00%> (-0.01%)` | :arrow_down: | | [sanity-checks](https://app.codecov.io/gh/near/nearcore/pull/11811/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `1.41% <0.00%> (-0.01%)` | :arrow_down: | | [unittests](https://app.codecov.io/gh/near/nearcore/pull/11811/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `66.12% <49.62%> (-0.03%)` | :arrow_down: | | [upgradability](https://app.codecov.io/gh/near/nearcore/pull/11811/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `0.28% <0.00%> (-0.01%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Trisfald commented 1 month ago

I'm very confused :)

You're storing the trie changes (DBCol::TrieChanges) but as far as I know this column is only used for garbage collection and cold migration. In particular I don't think it's used for answering rpc or viewing state with neard subcommands. I thought you'll be storing the trie nodes directly into the DBCol::State column which is where I believe the data is missing. That being said it seems like it worked somehow :) Can you have a look into how this works and explain it briefly for me?

Yes, I'm saving the trie changes explicitly although I believe it isn't strictly required. From my understanding trie changes isn't the only column touched by this section of code https://github.com/near/nearcore/blob/master/chain/chain/src/store/mod.rs#L2533-L2550, for example insertions into DBCol::State and changes to memtries.

Trisfald commented 1 month ago

Updated PR:

TODO:

Trisfald commented 1 month ago

On the positive side: recovered data is still present after leaving the node running over the weekend. Regenerating trie nodes is idempotent.

Not so positive: apply-range does not work at all for many ranges of blocks after 1st and 2nd resharding. The reason is that "missing trie values" break the apply stage of transactions. We'll need to proceed step by step and redo resharding before replaying blocks.

Overall I haven't found any inherent issue with this PR