questdb / roadmap

QuestDB Public Roadmap

Sub-partitioning by symbols #45

Open puzpuzpuz opened 1 year ago

puzpuzpuz commented 1 year ago

QuestDB partitions time-series data by time. However, when a table holds a large amount of data spread across many symbols (that is, many individual time series), queries such as SAMPLE BY or LATEST BY for a specific symbol can be slow. Adding a second partitioning strategy based on one or more symbol columns would boost query performance significantly: the data for a given symbol would effectively form its own virtual table and could be accessed much faster than it is today.

This feature would support WAL tables only.

As an example, users could partition a table by time, such as by hour, as well as by the product_id column. Then, if they run a SAMPLE BY query for a single product_id, only the sub-partition for that product_id would be accessed by the database, yielding a significant query speed improvement over index-based access on a non-sub-partitioned table.
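To make the pruning idea concrete, here is a minimal Python sketch. It is not QuestDB code and does not reflect how the feature would be implemented; the `SubPartitionedTable` class, its methods, and the sample symbols are all hypothetical. It only models a table keyed first by hour and then by product_id, and shows that an hourly average for one product_id reads only that product's rows.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour (the time partition key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


class SubPartitionedTable:
    """Hypothetical in-memory model: hour partition -> product_id sub-partition -> rows."""

    def __init__(self):
        self.partitions = defaultdict(lambda: defaultdict(list))

    def insert(self, ts, product_id, price):
        self.partitions[hour_bucket(ts)][product_id].append((ts, price))

    def sample_by_hour(self, product_id):
        """Average price per hour for one product_id, plus the number of rows read."""
        averages = {}
        rows_read = 0
        for hour, sub_partitions in sorted(self.partitions.items()):
            # Direct lookup of the matching sub-partition; other symbols are never scanned.
            rows = sub_partitions.get(product_id)
            if not rows:
                continue
            rows_read += len(rows)
            averages[hour] = sum(price for _, price in rows) / len(rows)
        return averages, rows_read


base = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
table = SubPartitionedTable()
table.insert(base + timedelta(minutes=5), "BTC-USD", 100.0)
table.insert(base + timedelta(minutes=20), "BTC-USD", 110.0)
table.insert(base + timedelta(minutes=30), "ETH-USD", 10.0)
table.insert(base + timedelta(minutes=65), "BTC-USD", 120.0)

averages, rows_read = table.sample_by_hour("BTC-USD")
```

In this toy model the ETH-USD row is never touched when sampling BTC-USD, which is the effect the proposal aims for at the storage level.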

In the next version of this feature, we're going to add support for sub-partitioning by geo-hash column(s) in order to solve this issue: https://github.com/questdb/questdb/issues/2967

nwoolmer commented 1 year ago

+1 for this feature when it comes.

It would be useful for my use case if the secondary partitions by symbol could efficiently have different 'current times' in the time series.

For example, suppose Symbol A and Symbol B exist. Symbol A receives data at today's current time, while Symbol B receives data on a 3-day delay. Each individual time series would then have its own 'current time' and wouldn't be subject to the current overhead of splitting and squashing partitions.

This behaviour can currently be emulated by making duplicate tables for each symbol to partition the data.
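A rough way to picture the request, as a Python sketch under simplifying assumptions: treat a row as "out of order" when its timestamp is older than the newest timestamp already seen. That high-water-mark rule and the `count_out_of_order` helper are illustrative inventions, not a description of QuestDB's actual split and squash logic. With one table-wide mark, every delayed Symbol B row arrives late; with one mark per symbol, none do.

```python
from datetime import datetime, timedelta, timezone


def count_out_of_order(rows, per_symbol):
    """Count rows older than the newest timestamp seen so far.

    per_symbol=False uses one table-wide high-water mark;
    per_symbol=True keeps a separate mark for each symbol.
    """
    high_water = {}
    late = 0
    for symbol, ts in rows:
        key = symbol if per_symbol else None
        latest = high_water.get(key)
        if latest is not None and ts < latest:
            late += 1  # row lands behind the current 'current time'
        else:
            high_water[key] = ts
    return late


now = datetime(2024, 1, 4, 12, 0, tzinfo=timezone.utc)
delay = timedelta(days=3)

# Interleaved ingestion: Symbol A is live, Symbol B runs 3 days behind.
rows = []
for i in range(3):
    step = timedelta(minutes=i)
    rows.append(("A", now + step))
    rows.append(("B", now - delay + step))

table_wide_late = count_out_of_order(rows, per_symbol=False)
per_symbol_late = count_out_of_order(rows, per_symbol=True)
```

Here `table_wide_late` counts all three Symbol B rows as late, while `per_symbol_late` is zero, which mirrors what the duplicate-tables workaround achieves today.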