Runtime: enable key prefix scan

robert-zaremba commented 9 months ago

Objective

Currently, the NEAR runtime only exposes key value operations to the smart contracts, without being able to do native key prefix scan.

Motivation

Doing a prefix scan, or scan from a particular key index is an often use case in many smart contracts:

query tokens with pagination (from token_id, limit)
query any storage object with pagination (from object_id, limit): proposals, votes, posts, comments, ...

Having an efficient prefix scan API is important to many use cases that involve complex queries within smart contract, or cross contract.

Existing solutions (and their limitations)

NEAR SDK

The workaround is to implement additional data structures on top of the key value operations. NEAR SDK does it by providing TreeMap, which implements AVL Tree. That introduces a lot of burden on the storage level (unnecessary storage bloat and many indirect operations), because we implement a tree (smart contract AVL container) on top of a tree (Merkle Tree) on top of a tree (KV databases are usually implemented as a sorted b-trees under the hood). Moreover, AVL, while being technically elegant may introduce many expensive reorganizations (certain smart contract operations may be way more expensive than others).

We could make it 3-4 times more gas efficient if we can remove that extra overhead.

Indexer

Another alternative is to use indexer. This is often a preferred solution for a high performance DAPPs. However it won't work for smart contracts, which will need to read a sequence of data (internally or doing cross contract queries).

Proposed solutions

The NEAR storage is organized in a Merkle Tree. That structure provides unprecedented benefit: sorted key prefix scan.

The motivation to not expose key prefix scan was to be more independent from the underlying store (Merkle Tree) and have a flexibility to change it in the future, without being blocked by the compatibility with the existing runtime API.

If we want to abstract from the Merkle Tree, then we can overcome the limitation by providing a layer to directly query KV database (without going through the Merkle Tree or any future abstraction) . Dominant KV databases already provide key prefix scans.

Let's use this Issue to discuss potential NEP.

frol commented 9 months ago

Iteration over storage keys was part of the initial nearcore implementation, but then it was deprecated due to security considerations:

robert-zaremba commented 8 months ago

The tree growth is a classic problem and was well studied in Ethereum world. There are already good solutions for that, starting from separating store for queries, storage cost adjustments and few other patterns that go-ethereum did in last 2 years, key compression etc...

In Cosmos we had an interesting project for the tree compression, which didn't reach implementation phase due to other priorities.

Classic hash maps may have similar problems and were a source of few exploits.

robert-zaremba commented 8 months ago

TL;DR: I don't see how we solved the problem. Instead we implement a tree on top of a tree on top of a tree to solve the iteration use case. This adds a lot of round trips and often kills the native cache.

All state of the art solutions for store commitment include a tree, at least at the contract level (see also Verkle Trees).

robert-zaremba commented 4 months ago

Moreover, all good DBs that I tested for blockchain store backend (and I tested many in Cosmos), provide prefix scan functionality.

near / NEPs