scverse / genomic-features

Genomic Features in Python from BioConductor's AnnotationHub
https://genomic-features.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Feat: Use duckdb as backend instead of sqlite #59

Closed thomas-reimonn closed 6 months ago

thomas-reimonn commented 6 months ago

This PR adds a selectable backend (default: sqlite, optional: duckdb). It also forces results to be ordered by the requested columns, in the order they were requested.

Benchmarking:

Running tests using DuckDB backend
pytest 40.40s user 7.76s system 196% cpu 24.544 total

Running tests using Sqlite3
pytest 39.02s user 10.77s system 97% cpu 51.043 total

DuckDB is about twice as fast in wall-clock time (24.5s vs 51.0s) because it used two cores, but total CPU time (user + system) is about the same: roughly 48s vs 50s.
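To make the comparison explicit, here is a small sketch that derives CPU time, effective core usage, and wall-clock speedup from the timings quoted above. The dictionary layout and function names are only for illustration and are not part of the PR.

```python
# Timings copied from the `pytest` runs quoted above (seconds).
runs = {
    "duckdb": {"user": 40.40, "system": 7.76, "wall": 24.544},
    "sqlite": {"user": 39.02, "system": 10.77, "wall": 51.043},
}


def cpu_seconds(run):
    """Total CPU time consumed: user time plus system time."""
    return run["user"] + run["system"]


def cores_used(run):
    """Average number of cores kept busy: CPU time divided by wall time."""
    return cpu_seconds(run) / run["wall"]


summary = {
    name: {
        "cpu_seconds": round(cpu_seconds(run), 2),
        "cores_used": round(cores_used(run), 2),
    }
    for name, run in runs.items()
}

# How much sooner the DuckDB run finished, in wall-clock terms.
wall_speedup = round(runs["sqlite"]["wall"] / runs["duckdb"]["wall"], 2)

for name, stats in summary.items():
    print(name, stats)
print("wall-clock speedup:", wall_speedup)
```

With these numbers the two runs consume about 48.2s and 49.8s of CPU time respectively, so the roughly 2x wall-clock gain comes from parallelism rather than from doing less work.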

thomas-reimonn commented 6 months ago

Resolves issue #58

ivirshup commented 6 months ago

It looks like duckdb isn't consistent about row ordering, even when it's called the same way. I think that's going to cause bugs in people's code.

I'd suggest that we order the results before returning them.
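One way to do that, as a minimal sketch rather than what this PR actually implements: add an explicit ORDER BY over the requested columns, in the order they were requested. The example uses the standard-library sqlite3 module so that it runs on its own; the helper names (`quote_ident`, `fetch_ordered`) and the table and column names are hypothetical.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def fetch_ordered(conn, table, columns):
    """Select `columns` from `table`, sorted by those columns in the order given.

    Without an ORDER BY clause, SQL engines make no promise about row order,
    so the sort is spelled out explicitly.
    """
    cols = ", ".join(quote_ident(c) for c in columns)
    query = f"SELECT {cols} FROM {quote_ident(table)} ORDER BY {cols}"
    return conn.execute(query).fetchall()


# Hypothetical table, filled in a deliberately scrambled order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (gene_id TEXT, seq_name TEXT, gene_seq_start INTEGER)"
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [
        ("ENSG03", "2", 500),
        ("ENSG01", "1", 900),
        ("ENSG02", "1", 100),
    ],
)

rows = fetch_ordered(conn, "gene", ["seq_name", "gene_seq_start", "gene_id"])
for row in rows:
    print(row)
```

Because the sort key is the full list of requested columns, any two rows that tie on every key are identical in the output, so the result no longer depends on how the engine happens to scan the table.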

I also want to discuss the scope of this a bit, but will continue that in #58.

ivirshup commented 6 months ago

I'm going to merge this and open an issue to discuss ordering.