Closed arnetheduck closed 1 month ago
This does not work well when there are gaps among keys, e.g., due to []
values which are not supported inside MPT (setting a key to []
is equivalent to deleting the key from the MPT, as in, the key is not existing at all instead of being set to an empty value).
See https://github.com/ethereum/consensus-specs/pull/3885 which now finally disallows this for transactions, but there may still be other uses for having gaps in the keys.
if the gap case is intentionally unsupported, there should be an assert / comment / log to make this case explicit. the old logic (before ordered trie) was working correctly with present gaps.
Root encoding is on the hot path for block verification both in the consensus (when syncing) and execution clients and oddly constitutes a significant part of resource usage even though it is not that much work.
While the trie code is capable of producing a transaction root and similar feats, it turns out that it is quite inefficient - even for small work loads.
This PR brings in a helper for the specific use case of building tries of lists of values whose key is the RLP-encoded index of the item.
As it happens, such keys follow a particular structure where items end up "almost" sorted, with the exception for the item at index 0 which gets encoded as
[0x80]
, ie the empty list, thus moving it to a new location.Armed with this knowledge and the understanding that inserting ordered items into a trie easily can be done with a simple recursion, this PR brings a ~100x improvement in CPU usage (360ms vs 33s) and a ~50x reduction in memory usage (70mb vs >3gb!) for the simple test of encoding 1000000 keys.
In part, the memory usage reduction is due to a trick where the hash of the item is computed as the item is being added instead of storing it in the value.
There are further reductions possible such as maintaining a hasher per level instead of storing hash values as well as using a direct-to-hash rlp encoder.