questdb / questdb

QuestDB is an open source time-series database for fast ingest and SQL queries
https://questdb.io
Apache License 2.0
14.3k stars 1.15k forks source link

Significantly improve geospatial query performance by geospatial clustering #2967

Open foosmate opened 1 year ago

foosmate commented 1 year ago

Is your feature request related to a problem?

With worldwide traffic data, the latest by query performance of small areas (e.g. Northsea) takes up to 10s. Especially opening cold partitions is extremely slow because the index grows too much.

Our tests have shown that the query performance drops from 10s to 50ms when manually distributing the data to 32 tables tiling based on geohash(1c).

image

image

Describe the solution you'd like.

As soon as QuestDB supports sharded tables, it would be great to allow the geohash operation to be used as a special key for sub-partitioning the data.

CREATE TABLE positions (
  time TIMESTAMP,
  id SYMBOL,
  mmsi INT,
  lat FLOAT,
  lon FLOAT,
  geo6 geohash(6c),
  geo12 geohash(12c)
) TIMESTAMP(time) PARTITION BY HOUR SHARD BY GEOHASH(geo6, 1c);

Describe alternatives you've considered.

No response

Additional context.

No response

igor-suhorukov commented 1 year ago

H3 Hierarchical Spatial Index partitioning also works perfect for this task in PostgreSQL/Spark Partitioning technique for OSM world and approach overview

marregui commented 1 year ago

There will be a first round at this for type SYMBOL, soon, as per the public road map https://github.com/orgs/questdb/projects/1/views/5?pane=issue&itemId=20369114

addu390 commented 2 weeks ago

Is SHARD BY on the road-map? Only supporting partitioning by time is what is holding me back from using QuestDB!

nwoolmer commented 2 weeks ago

It is, see above.

addu390 commented 2 weeks ago

It is, see above.

Looks like it's in Q4 of 2024. Is the current recommendation by QuestDB to create multiple tables (of same schema) as a work-around or is there a better way?

nwoolmer commented 2 weeks ago

That's the usual route, and adding the symbol value to the table name to distinguish them.