near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0
2.33k stars 629 forks source link

Idea: change the encoding of the `TrieKey` to move the `AccountId` to before the column id. #12173

Open nagisa opened 1 month ago

nagisa commented 1 month ago

Description

https://near.zulipchat.com/#narrow/stream/295306-contract-runtime/topic/Experiment.3A.20move.20account_id.20to.20the.20beginning.20of.20encoded.20T.2E.2E.2E/near/472728401 records some investigations that have occurred here.

In summary changing the encoding of the TrieKeys as follows:

\x00{account_id} => \xff{account_id}\x00
\x01{account_id} => \xff{account_id}\x01
\x02{account_id} => \xff{account_id}\x02
\x09{account_id} => \xff{account_id}\x09

With this the trie structure changes such that the trie nodes encoding the sometimes quite large in size account_id are shared between the distinct columns related to accounts. In particular a contract function call will always touch columns 0 (Account) and 2 (AccessKey), so the state witness ends up generally encoding two full copies of the trie structure for each transaction signer.

Accounts that are called currently end up accessing all four columns, so they end up encoding 4 copies of the trie for account_id.

Moving to account_id-first encoding reduces this duplication substantially, as the account_id part of the TrieKey is then shared across any number of columns.


This is largely pretty difficult task as it changes the entire trie structure and requires a significant migration.

nagisa commented 1 month ago

An alternative encoding could be {0xff-account_id.len()}{account_id} -- this then allows all sorts of code to quickly jump to the byte offset that stores the column ID. Since the account_id length is limited to 64 that would still allow other columns to keep the column-id-first encoding (so long as there are fewer than 190 such columns.)