I implemented a tile-based indexing strategy for beddb that can speed up queries by up to 20x, at the expense of increasing the file size by a factor of ~2.5.
To spare end users from having to handle yet another format, I mark this indexing by appending a `t` to the version number: version `3` is the normal version, while `3t` is the tile-indexed version.
To create a tile-indexed beddb file, use `clodius aggregate bedfile` with `--tile-index`.
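As a rough illustration of how a consumer could tell the two variants apart, here is a minimal sketch that splits such a version marker into its numeric part and a tile-index flag. The function name `parse_beddb_version` is hypothetical and not part of clodius; the sketch only assumes the `3` / `3t` convention described above.

```python
def parse_beddb_version(version):
    """Split a version marker such as "3" or "3t" into (number, tile_indexed).

    Hypothetical helper for illustration; it assumes the only suffix in use
    is a trailing "t" marking the tile-indexed variant.
    """
    text = str(version).strip()
    tile_indexed = text.endswith("t")
    number = int(text[:-1] if tile_indexed else text)
    return number, tile_indexed


plain = parse_beddb_version("3")
tiled = parse_beddb_version("3t")
```

Keeping the numeric part separate means existing version comparisons can keep working on the integer, while the flag selects the query path.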
Why is it necessary?
The range-based R-tree index gets slow with more than 5 million intervals (0.5 s per query), while the tile-based index stays fast at ~0.025 s.
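The trade-off can be sketched in a few lines of Python. This is not the code from this pull request, only a simplified model under stated assumptions: intervals are half-open integer pairs, the coordinate space `total_length` is divisible by `2 ** max_zoom`, and the baseline is a plain linear overlap scan rather than an R-tree. The names `build_tile_index`, `query_tile`, and `query_range` are hypothetical.

```python
from collections import defaultdict


def build_tile_index(intervals, total_length, max_zoom):
    """Precompute (zoom, tile_pos) -> ids of intervals overlapping that tile.

    Illustrative only: intervals are half-open (start, end) integer pairs and
    total_length is assumed to be divisible by 2 ** max_zoom.
    """
    index = defaultdict(list)
    for uid, (start, end) in enumerate(intervals):
        for zoom in range(max_zoom + 1):
            tile_width = total_length // 2 ** zoom
            first = start // tile_width
            last = (end - 1) // tile_width  # end is exclusive
            for pos in range(first, last + 1):
                # The same interval is stored once per overlapping tile,
                # which is where the extra storage comes from.
                index[(zoom, pos)].append(uid)
    return index


def query_tile(index, zoom, pos):
    """Answer a tile request with a single key lookup."""
    return index.get((zoom, pos), [])


def query_range(intervals, start, end):
    """Baseline: scan all intervals for overlap with [start, end)."""
    return [uid for uid, (s, e) in enumerate(intervals) if s < end and e > start]


intervals = [(0, 100), (200, 600), (900, 1000)]
index = build_tile_index(intervals, total_length=1024, max_zoom=2)

# Tile 1 at zoom 1 covers [512, 1024); both approaches return the same ids.
tile_hits = query_tile(index, 1, 1)
scan_hits = query_range(intervals, 512, 1024)
```

In this model the lookup cost no longer depends on the total number of intervals, while every interval is duplicated across the tiles it touches, which mirrors the speed versus file size trade-off described above.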
Checklist
black .