near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0

[Project tracking] Remove obsolete storage optimisations #11912

Closed: pugachAG closed this issue 3 weeks ago

pugachAG commented 1 month ago

With memtrie enabled, some of the storage optimisations originally introduced to reduce read latency for disk tries become irrelevant.

Shard cache

Shard cache (see TrieCachingStorage) is an LRU-style in-memory cache for State. With memtrie everything is already read from memory, so the shard cache doesn't improve latency. Removing it would slightly improve storage read latency (we avoid an additional LRU map lookup) and reduce memory usage (up to 3GB per shard). We still want to keep the shard cache when memtrie is not enabled, for example on RPC nodes.

Issue: #11913
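
For illustration, a minimal sketch (hypothetical types, not the actual TrieCachingStorage or memtrie APIs) of why the extra cache lookup becomes pure overhead once memtrie serves every read from memory:

    use std::collections::HashMap;

    // Simplified stand-ins for nearcore types; hypothetical, for illustration only.
    type StateKey = Vec<u8>;
    type StateValue = Vec<u8>;

    /// Rough model of the disk-trie read path: check the per-shard in-memory
    /// cache first (the real one is LRU-bounded), fall back to disk on a miss.
    struct DiskTrieStorage {
        shard_cache: HashMap<StateKey, StateValue>,
        disk: HashMap<StateKey, StateValue>, // stands in for RocksDB
    }

    impl DiskTrieStorage {
        fn get(&mut self, key: &StateKey) -> Option<StateValue> {
            if let Some(value) = self.shard_cache.get(key) {
                return Some(value.clone()); // cache hit avoids a disk read
            }
            let value = self.disk.get(key).cloned()?;
            self.shard_cache.insert(key.clone(), value.clone());
            Some(value)
        }
    }

    /// With memtrie the whole trie already lives in memory, so an extra cache
    /// in front of it only adds a lookup and keeps duplicate copies of values
    /// around (the "up to 3GB per shard" mentioned above).
    struct MemTrieStorage {
        memtrie: HashMap<StateKey, StateValue>,
    }

    impl MemTrieStorage {
        fn get(&self, key: &StateKey) -> Option<StateValue> {
            self.memtrie.get(key).cloned()
        }
    }

    fn main() {
        let state: HashMap<StateKey, StateValue> =
            [(b"alice.near".to_vec(), b"balance=100".to_vec())].into_iter().collect();
        let mut disk_storage = DiskTrieStorage { shard_cache: HashMap::new(), disk: state.clone() };
        let mem_storage = MemTrieStorage { memtrie: state };
        let key = b"alice.near".to_vec();
        assert_eq!(disk_storage.get(&key), mem_storage.get(&key));
    }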

Prefetcher

Prefetcher was introduced to enable reading State in parallel with receipt execution. Its main purpose was to hide disk read latency, which is no longer an issue with memtrie, so it can be disabled. Disabling prefetching also reduces memory usage.
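
A rough sketch of the idea, with hypothetical types and no relation to the actual nearcore prefetcher API: a background thread stages values for keys that queued receipts are expected to touch, which only pays off when the underlying read has real IO latency:

    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};
    use std::thread;

    // Hypothetical stand-ins, not the actual nearcore prefetcher types.
    type StateKey = Vec<u8>;
    type StateValue = Vec<u8>;

    /// Stage values for keys the queued receipts are expected to touch, on a
    /// background thread, so receipt execution finds them already in memory.
    fn prefetch(
        keys: Vec<StateKey>,
        disk: Arc<HashMap<StateKey, StateValue>>,
        staging: Arc<Mutex<HashMap<StateKey, StateValue>>>,
    ) -> thread::JoinHandle<()> {
        thread::spawn(move || {
            for key in keys {
                // With disk tries this read has IO latency worth hiding behind
                // receipt execution; with memtrie it is already a memory
                // lookup, so running it in parallel buys nothing.
                if let Some(value) = disk.get(&key) {
                    staging.lock().unwrap().insert(key, value.clone());
                }
            }
        })
    }

    fn main() {
        let disk: Arc<HashMap<StateKey, StateValue>> =
            Arc::new([(b"token.sweat".to_vec(), b"ft_balance".to_vec())].into_iter().collect());
        let staging = Arc::new(Mutex::new(HashMap::new()));

        // Kick off prefetching, then "execute" the receipt: the value is staged.
        prefetch(vec![b"token.sweat".to_vec()], disk.clone(), staging.clone())
            .join()
            .unwrap();
        assert!(staging.lock().unwrap().contains_key(b"token.sweat".as_slice()));
    }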

Flat storage deltas

Currently we store flat storage deltas in memory to make reads faster. Flat storage is not used at all when memtrie is enabled, so we no longer need to do that. In practice this wouldn't reduce memory usage by much, since we only store deltas for blocks between the chain head and the last final block, so this has lower priority.
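
A hedged sketch of what is held in memory here (hypothetical types, not the actual FlatStorage implementation): one delta map per block between the last final block and the chain head, replayed on top of the final-block values when reading. Since only a handful of blocks are non-final at any time, the memory involved is small:

    use std::collections::HashMap;

    type BlockHash = u64; // stand-in for CryptoHash
    type StateKey = Vec<u8>;
    type StateValue = Vec<u8>;

    /// Rough model of flat storage: values as of the last final block, plus one
    /// in-memory delta per not-yet-final block up to the chain head.
    struct FlatStorage {
        final_block_values: HashMap<StateKey, StateValue>,
        // None marks a deletion in that block's delta.
        deltas: Vec<(BlockHash, HashMap<StateKey, Option<StateValue>>)>,
    }

    impl FlatStorage {
        /// Read a key as of the chain head by replaying the per-block deltas
        /// (newest first) before falling back to the final-block state.
        fn get(&self, key: &StateKey) -> Option<StateValue> {
            for (_block, delta) in self.deltas.iter().rev() {
                if let Some(change) = delta.get(key) {
                    return change.clone();
                }
            }
            self.final_block_values.get(key).cloned()
        }
    }

    fn main() {
        let mut final_values = HashMap::new();
        final_values.insert(b"k".to_vec(), b"old".to_vec());
        let mut delta = HashMap::new();
        delta.insert(b"k".to_vec(), Some(b"new".to_vec()));
        let fs = FlatStorage { final_block_values: final_values, deltas: vec![(1, delta)] };
        assert_eq!(fs.get(&b"k".to_vec()), Some(b"new".to_vec()));
    }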

pugachAG commented 1 month ago

Disabling prefetcher experiment

Setup

Two identical n2d-highmem-16 mainnet nodes tracking all shards, with memtrie enabled via the following config:

    "load_mem_tries_for_shards": [],
    "load_mem_tries_for_tracked_shards": true,

We compare the near_applying_chunks_time metric between the control node (pugachag-dev) and the experiment node (pugachag-dev-exp), tracking the experiment / control ratio of apply chunk latency.
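
For concreteness, the tracked value is just the ratio of the two latencies expressed as a percentage (a trivial sketch with made-up numbers):

    /// Experiment / control ratio as a percentage: 100% means no difference,
    /// values below 100% mean the experiment node applies chunks faster.
    fn ratio_percent(experiment_latency_ms: f64, control_latency_ms: f64) -> f64 {
        experiment_latency_ms / control_latency_ms * 100.0
    }

    fn main() {
        // Made-up numbers: a ratio of ~65% corresponds to a ~35% latency reduction.
        let ratio = ratio_percent(6.5, 10.0);
        assert!((ratio - 65.0).abs() < 1e-9);
    }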

First we run both nodes with prefetching enabled to make sure that the tracked values are close to 100%.

[Screenshot 2024-08-13 at 18:04:04]

Then we disable the prefetcher in the config:

    "enable_receipt_prefetching": false, 
    "sweat_prefetch_receivers": [], 
    "sweat_prefetch_senders": [], 
    "claim_sweat_prefetch_config": [], 
    "kaiching_prefetch_config": [],

and confirm that via the near_prefetch_staged_slots metric.

Results

Average apply chunk latency improves by 30-40%:

[Screenshot 2024-08-13 at 18:59:14]

p95 latency also improves significantly:

[Screenshot 2024-08-13 at 19:00:57]

p99 latency shows slight improvements in most cases:

[Screenshot 2024-08-13 at 19:01:57]
bowenwang1996 commented 1 month ago

@pugachAG should we also remove the accounting cache (chunk cache) as part of this effort?

pugachAG commented 1 month ago

@bowenwang1996 removing the accounting cache is a bit more challenging since it affects gas costs. Not having it at all would make storage operations prohibitively expensive (cc @Longarithm to confirm). Alternatively, we could stop charging TTN costs for reads and writes, in which case the accounting cache would no longer be needed, but with stateless validation we still need it to account for each read's contribution to the state witness storage proof size. Also, at this point the accounting cache doesn't seem to contribute much to apply chunk latency: see retroactively_account here.
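
To make the gas-cost coupling concrete, here is a rough sketch with hypothetical constants and types (not the actual runtime code): the first touch of a trie node within a chunk is charged the full touching-trie-node (TTN) cost and, under stateless validation, is also what grows the state witness, while repeated touches are charged a much cheaper cached-read cost.

    use std::collections::HashSet;

    type NodeHash = u64; // stand-in for the trie node hash

    // Hypothetical gas constants, only to illustrate the order-of-magnitude gap.
    const TTN_COST: u64 = 16_000_000_000; // first (uncached) touch of a trie node
    const CACHED_READ_COST: u64 = 2_000_000_000; // repeated touch within the chunk

    /// Rough model of the accounting cache: it remembers which nodes were
    /// already touched while applying the current chunk, so repeated touches
    /// are charged the cheap cost, and only the first touch needs to be
    /// recorded in the state witness storage proof.
    struct AccountingCache {
        touched: HashSet<NodeHash>,
        witness_size: usize,
    }

    impl AccountingCache {
        fn charge_node(&mut self, node: NodeHash, node_serialized_len: usize) -> u64 {
            if self.touched.insert(node) {
                self.witness_size += node_serialized_len; // contributes to the proof
                TTN_COST
            } else {
                CACHED_READ_COST
            }
        }
    }

    fn main() {
        let mut cache = AccountingCache { touched: HashSet::new(), witness_size: 0 };
        let first = cache.charge_node(42, 500);
        let repeat = cache.charge_node(42, 500);
        assert!(first > repeat);
        assert_eq!(cache.witness_size, 500); // second touch did not grow the witness
    }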