paritytech / substrate-archive

Blockchain Indexing Engine
GNU General Public License v3.0

Speed up Storage Indexing #52

Closed insipx closed 4 years ago

insipx commented 4 years ago

There are some optimizations that could be made to speed up the current storage indexing.

Ideas:

From RocksDB Wiki:

There is a lot of complexity in the underlying RocksDB implementation to lookup a key. The complexity results in a lot of computational overhead, mainly due to cache misses when probing bloom filters, virtual function call dispatches, key comparisons and IO. Users that need to lookup many keys in order to process an application level request end up calling Get() in a loop to read the required KVs. By providing a MultiGet() API that accepts a batch of keys, it is possible for RocksDB to make the lookup more CPU efficient by reducing the number of virtual function calls and pipelining cache misses. Furthermore, latency can be reduced by doing IO in parallel.

Let us consider the case of a workload with good locality of reference. Successive point lookups in such a workload are likely to repeatedly access the same SST files and index/data blocks. For such workloads, MultiGet provides the following optimizations -

https://github.com/facebook/rocksdb/wiki/MultiGet-Performance

This seems to fit the use case of storage queries particularly well: we first collect a large set of keys, and then look up the data for each key individually.

Blocked on:

insipx commented 4 years ago

Solved by re-executing the block and obtaining the storage changes that way. We no longer query RocksDB for stored changes. This proved to be a faster way to index storage, and less invasive than modifying the RocksDB stack starting with rust-rocksdb. It was also unclear whether MultiGet would actually be faster than individual Get() calls in a loop.