higlass / clodius

Clodius is a tool for breaking up large data sets into smaller tiles that can subsequently be displayed using an appropriate viewer.
MIT License

Flekschas/faster beddb #135

Open flekschas opened 3 years ago


Description

What was changed in this pull request?

I implemented a tile-based indexing strategy for beddb that can speed up queries by up to 20x, at the cost of increasing the file size by roughly 2.5x.
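A minimal sketch of the idea follows. It is an illustration under stated assumptions, not the schema this PR actually adds. The table and column names (intervals, tile_index, zoom, start_pos, end_pos), the helper functions, and the rule that an interval first shown at zoom level z stays visible at every deeper zoom level are all hypothetical. The range query stands in for the R-tree lookup with a plain coordinate comparison. The tile query is a primary-key lookup on (zoom, x) into a precomputed table that lists every interval for every tile it overlaps. That extra table is the kind of storage that trades file size for query speed.

```python
import sqlite3

# Hypothetical, simplified schema for illustration; not the real beddb layout.


def tile_range(start, end, zoom, total_length):
    """Inclusive range of tile x-positions that [start, end) overlaps at `zoom`."""
    tile_width = total_length / 2 ** zoom
    return int(start // tile_width), int((end - 1) // tile_width)


def build_db(intervals, total_length, max_zoom):
    """Store (zoom, start, end) intervals plus a precomputed tile index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE intervals (id INTEGER PRIMARY KEY, zoom INTEGER, "
                 "start_pos INTEGER, end_pos INTEGER)")
    conn.execute("CREATE TABLE tile_index (zoom INTEGER, x INTEGER, interval_id INTEGER, "
                 "PRIMARY KEY (zoom, x, interval_id))")
    for iid, (zoom, start, end) in enumerate(intervals):
        conn.execute("INSERT INTO intervals VALUES (?, ?, ?, ?)", (iid, zoom, start, end))
        # Assumption: an interval first shown at `zoom` stays visible at all deeper
        # zoom levels, so list it in every tile it overlaps from `zoom` to `max_zoom`.
        for z in range(zoom, max_zoom + 1):
            first, last = tile_range(start, end, z, total_length)
            conn.executemany("INSERT INTO tile_index VALUES (?, ?, ?)",
                             [(z, x, iid) for x in range(first, last + 1)])
    conn.commit()
    return conn


def query_range(conn, zoom, tile_start, tile_end):
    """Range-style query: find intervals overlapping [tile_start, tile_end)."""
    rows = conn.execute("SELECT id FROM intervals WHERE zoom <= ? AND end_pos > ? "
                        "AND start_pos < ? ORDER BY id", (zoom, tile_start, tile_end))
    return [r[0] for r in rows]


def query_tile(conn, zoom, x):
    """Tile-style query: direct lookup of the precomputed (zoom, x) entries."""
    rows = conn.execute("SELECT interval_id FROM tile_index WHERE zoom = ? AND x = ? "
                        "ORDER BY interval_id", (zoom, x))
    return [r[0] for r in rows]


# Usage: three intervals on a 1000 bp genome, indexed down to zoom level 3.
conn = build_db([(0, 10, 20), (1, 300, 700), (2, 900, 950)], total_length=1000, max_zoom=3)
print(query_tile(conn, 1, 0))  # [0, 1]
```

Both queries return the same intervals for any tile. The difference is that the tile query never compares coordinates at read time, because that work was done once when the file was written.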

To avoid burdening end users with yet another file format, I mark tile-indexed files by appending a 't' to the version number: version 3 is the regular (range-indexed) version, while 3t is the tile-indexed version.
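A reader therefore needs to tell the two variants apart from the version string alone. The sketch below shows one way to do that. The helper name parse_beddb_version is hypothetical, and the PR text does not specify where in the file the version string is stored, so this only parses a value that has already been read.

```python
import re


def parse_beddb_version(version):
    """Split a version like '3' or '3t' into (number, is_tile_indexed).

    Hypothetical helper for illustration; not part of clodius.
    """
    match = re.fullmatch(r"(\d+)(t?)", str(version).strip())
    if match is None:
        raise ValueError(f"unrecognized beddb version: {version!r}")
    return int(match.group(1)), match.group(2) == "t"


print(parse_beddb_version("3"))   # (3, False)
print(parse_beddb_version("3t"))  # (3, True)
```

Keeping the base number intact means code that only checks the major version keeps working, while tile-aware code can branch on the second value.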

To create a tile-indexed beddb file, pass --tile-index to clodius aggregate bedfile.

Why is it necessary?

The range-based R-tree index becomes slow with more than 5 million intervals (about 0.5 s per query), while the tile-based index stays fast at about 0.025 s per query, which is where the 20x speedup comes from.

Checklist