nlesc-sigs / data-sig

Linked data, data & modeling SIG
Other
5 stars 3 forks source link

Optimise range queries #30

Closed arnikz closed 4 years ago

arnikz commented 5 years ago

Discuss indexing strategies (i.e., data structures and algorithms) for range queries implemented by relational or graph DBMS (e.g., R-Tree in SQLite)

LourensVeen commented 5 years ago

For PostgreSQL, B-Tree indexes and possibly a CLUSTER ON should do the trick. Things get tricky if you've got range constraints on multiple columns. If one is very selective, then adding an index to that one should suffice, if none of them are selective, then life gets difficult. Does PostgreSQL do multiple threads in a single query these days or is that still an EnterpriseDB-only feature?

arnikz commented 5 years ago

Yes, PostgreSQL supports parallel queries. However, the (standard) B-Tree indices are not so optimal for these type of queries AFAIK.

LourensVeen commented 5 years ago

Well, they sort the primary keys on the index expression, so if that's what you're querying then everything you need should be packed together in memory, except of course for the final lookup step which is a gather operation. Clustering on the index would make that better as well, but you can only cluster on one index. There's GIST for multidimensional range queries, but I'm not sure how generally usable it is. Works well for PostGIS though.

Do you have a specific problem in mind?

romulogoncalves commented 5 years ago

It will be scheduled for a future data-sig session. It would be nice to have an use-case to look at it. @arnikz will provide some examples.

c-martinez commented 4 years ago

@arnikz -- This issue is quite old by now. I will close it, but please open it again if there is something you would like to share with the SIG on this topic.