We've tried using Apache DataFusion for the new query engine. On average it is slower than DuckDB, but it's much more hackable and extensible, which may pay off for our use case. At the very least, it lets us build expression trees directly instead of assembling an SQL string and parsing it back.
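A minimal sketch of what that looks like with the DataFrame API (the `events` table, file path, and column names are illustrative; exact signatures vary a bit across DataFusion versions):

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    // Register a parquet file as a table (path is illustrative).
    ctx.register_parquet("events", "data/events.parquet", ParquetReadOptions::default())
        .await?;

    // The query is built as an expression tree directly: no SQL string,
    // no parse/plan round-trip.
    let df = ctx
        .table("events")
        .await?
        .filter(col("amount").gt(lit(100)))?
        .select(vec![col("user_id"), col("amount")])?;

    df.show().await?;
    Ok(())
}
```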
Looking at how query performance can be optimized, there are several approaches to try:
[x] Define strict schema for the parquet files
[x] Stream query results instead of full-batch processing
[x] Try different approaches for cross-table lookups
[ ] Use bloom filters to speed up queries over sparse data (see the sketch after this list)
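Not tried yet, but enabling bloom filters at write time via the parquet crate's `WriterProperties` should be enough to experiment with; newer DataFusion releases appear to consult them on the read side (there is a `datafusion.execution.parquet.bloom_filter_on_read` config option, assuming our version has it). A sketch, with illustrative schema and path:

```rust
use std::{fs::File, sync::Arc};

use datafusion::arrow::array::Int64Array;
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::parquet::arrow::ArrowWriter;
use datafusion::parquet::file::properties::WriterProperties;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![
        Field::new("user_id", DataType::Int64, false),
    ]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int64Array::from(vec![1, 2, 3]))],
    )?;

    // Bloom filters let point lookups skip row groups that cannot contain
    // the probed value -- exactly the sparse-data case we care about.
    let props = WriterProperties::builder()
        .set_bloom_filter_enabled(true)
        .build();

    let file = File::create("data/events.parquet")?;
    let mut writer = ArrowWriter::try_new(file, schema, Some(props))?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```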
DataFusion uses the Tokio scheduler to parallelize CPU-intensive tasks, which is a questionable design decision: it doesn't fully utilize the CPU, probably idling on some IO operations.
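The one knob worth trying here is the partition count, which controls how many Tokio tasks a plan fans out into. A sketch, assuming `SessionContext::new_with_config` (older releases call it `with_config`) and 16 as a stand-in for the physical core count:

```rust
use datafusion::prelude::*;

fn main() {
    // DataFusion splits work across `target_partitions` Tokio tasks;
    // raising it toward the physical core count is the main lever we
    // have over CPU utilization.
    let config = SessionConfig::new().with_target_partitions(16);
    let ctx = SessionContext::new_with_config(config);
    let _ = ctx; // ... register tables and run queries as usual.
}
```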
I couldn't find a way to provide a sort order; however, their presentation suggests that it should help optimize queries.
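For the record, the SQL DDL path does seem to accept a `WITH ORDER` clause when registering external tables, though that brings back the SQL strings we wanted to avoid. A hedged sketch (untested whether the optimizer actually exploits it; names and path are illustrative):

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    // Declare the files as sorted by `ts` when registering the table.
    ctx.sql(
        "CREATE EXTERNAL TABLE events (ts BIGINT, user_id BIGINT) \
         STORED AS PARQUET \
         WITH ORDER (ts ASC) \
         LOCATION 'data/events.parquet'",
    )
    .await?;
    Ok(())
}
```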
The logical plan optimizer is far from perfect. For example, it doesn't eliminate unions of sets with their own subsets, so such a query ends up scanning the table twice. As a result, we have to build queries carefully on the client side (see the sketch below).
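Concretely, the pattern to avoid looks like this (table and column names are illustrative): one branch of the union is strictly contained in the other, yet the plan still scans the table once per branch, so we collapse it before handing the query to DataFusion.

```rust
use datafusion::prelude::*;

async fn recent_events(ctx: &SessionContext) -> datafusion::error::Result<DataFrame> {
    let all_recent = ctx.table("events").await?.filter(col("ts").gt(lit(0)))?;
    let subset = ctx
        .table("events")
        .await?
        .filter(col("ts").gt(lit(0)).and(col("amount").gt(lit(100))))?;

    // Naive plan: `subset` is contained in `all_recent`, but the logical
    // optimizer does not eliminate it, so `events` is scanned twice.
    let twice_scanned = all_recent.clone().union_distinct(subset)?;
    let _ = twice_scanned;

    // Client-side rewrite: the union of a set with its own subset is the
    // set itself, so keep only the superset branch -- one scan.
    all_recent.distinct()
}
```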
Providing a schema doesn't speed things up (for some reason it even makes things slower, by ~8% according to micro-benchmarks).
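For reference, this is how we pass it, via the `ParquetReadOptions` builder (the `schema` method may be version-dependent; field names are illustrative):

```rust
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();

    // The strict schema we expect the files to match.
    let schema = Schema::new(vec![
        Field::new("user_id", DataType::Int64, false),
        Field::new("amount", DataType::Int64, true),
    ]);

    // Supplying the schema up front skips inference over the file footers.
    ctx.register_parquet(
        "events",
        "data/events.parquet",
        ParquetReadOptions::default().schema(&schema),
    )
    .await?;
    Ok(())
}
```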