sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

storing signatures/sketches in a sqlite DB #1807

Closed: ctb closed this issue 2 years ago

ctb commented 2 years ago

For tangential reasons, I ended up writing code to store and retrieve actual sketches (including their MinHash hash values) in a SQLite database.

See https://github.com/ctb/2022-sourmash-sqlite.
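One detail that storing MinHash values in SQLite has to deal with: hash values are unsigned 64-bit integers, but SQLite's INTEGER storage class is signed 64-bit, and Python's `sqlite3` module raises `OverflowError` for ints above 2**63 - 1. Below is a minimal sketch of one way around that, assuming hashes are shifted into the signed range on write and shifted back on read. The helper names `to_sqlite_int` and `from_sqlite_int` are hypothetical; this is not necessarily what the scripts in the linked repo do.

```python
# Hypothetical helpers (not sourmash's API): SQLite stores INTEGER values as
# signed 64-bit, but MinHash hash values are unsigned 64-bit, so values at or
# above 2**63 are shifted into the negative range before insertion.
import sqlite3

MAX_SQLITE_INT = 2**63 - 1

def to_sqlite_int(hashval: int) -> int:
    """Map an unsigned 64-bit hash onto SQLite's signed 64-bit range."""
    if not 0 <= hashval < 2**64:
        raise ValueError(f"not an unsigned 64-bit value: {hashval}")
    return hashval - 2**64 if hashval > MAX_SQLITE_INT else hashval

def from_sqlite_int(stored: int) -> int:
    """Invert to_sqlite_int: recover the original unsigned hash."""
    return stored + 2**64 if stored < 0 else stored

# Show that a hash near the top of the unsigned range survives SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashes (hashval INTEGER NOT NULL)")
big = 2**64 - 5
conn.execute("INSERT INTO hashes VALUES (?)", (to_sqlite_int(big),))
(stored,) = conn.execute("SELECT hashval FROM hashes").fetchone()
print(stored, from_sqlite_int(stored))  # prints: -5 18446744073709551611
```

The shift is a bijection on [0, 2**64), so equality lookups still work on the stored values; ordering by `hashval` in SQL, however, no longer matches unsigned hash order.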

It seems to work for flat sketches; at any rate, the following round trip does the right thing:

./save-mh-to-sqlite.py test-db/all.zip -o all.db
./load-mh-from-sqlite.py all.db -o all.zip

This reproduces all the signatures from test-db/all.zip in all.zip, including the protein signatures etc.
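The round trip above can be illustrated with a small, self-contained sketch using only Python's `sqlite3`. It assumes a hypothetical two-table schema (one row per sketch with its name, ksize, moltype and scaled, plus one row per hash value) and a hypothetical `Sketch` record standing in for a sourmash signature; `save_sketches` and `load_sketches` are illustrative names, not the functions in `save-mh-to-sqlite.py` or `load-mh-from-sqlite.py`.

```python
# A minimal save/load round trip for "flat" sketches (hash values only, no
# abundances). Schema, Sketch, save_sketches and load_sketches are all
# hypothetical stand-ins, not sourmash's API.
import sqlite3
from dataclasses import dataclass, field

def to_sqlite_int(h: int) -> int:
    # SQLite INTEGER is signed 64-bit; shift large unsigned hashes down.
    return h - 2**64 if h >= 2**63 else h

def from_sqlite_int(v: int) -> int:
    return v + 2**64 if v < 0 else v

@dataclass
class Sketch:
    name: str
    ksize: int
    moltype: str   # e.g. "DNA" or "protein"
    scaled: int
    hashes: frozenset = field(default_factory=frozenset)

SCHEMA = """
CREATE TABLE sketches (
    id INTEGER PRIMARY KEY,
    name TEXT, ksize INTEGER, moltype TEXT, scaled INTEGER
);
CREATE TABLE hashes (
    sketch_id INTEGER REFERENCES sketches(id),
    hashval INTEGER NOT NULL
);
CREATE INDEX hashes_by_sketch ON hashes(sketch_id);
"""

def save_sketches(conn, sketches):
    conn.executescript(SCHEMA)
    for sk in sketches:
        cur = conn.execute(
            "INSERT INTO sketches (name, ksize, moltype, scaled) VALUES (?, ?, ?, ?)",
            (sk.name, sk.ksize, sk.moltype, sk.scaled))
        # One row per hash, keyed by the sketch row just inserted.
        conn.executemany(
            "INSERT INTO hashes (sketch_id, hashval) VALUES (?, ?)",
            ((cur.lastrowid, to_sqlite_int(h)) for h in sk.hashes))
    conn.commit()

def load_sketches(conn):
    out = []
    rows = conn.execute(
        "SELECT id, name, ksize, moltype, scaled FROM sketches ORDER BY id").fetchall()
    for sk_id, name, ksize, moltype, scaled in rows:
        hashes = frozenset(
            from_sqlite_int(v) for (v,) in conn.execute(
                "SELECT hashval FROM hashes WHERE sketch_id = ?", (sk_id,)))
        out.append(Sketch(name, ksize, moltype, scaled, hashes))
    return out

originals = [
    Sketch("genome-a", 31, "DNA", 1000, frozenset({1, 2**63 + 7, 2**64 - 1})),
    Sketch("genome-a-prot", 10, "protein", 100, frozenset({42, 99})),
]
conn = sqlite3.connect(":memory:")
save_sketches(conn, originals)
restored = load_sketches(conn)
```

Comparing `restored` with `originals` is the in-memory analogue of checking that all.zip reproduces test-db/all.zip, including the non-DNA molecule types.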

I still need to add abundance tracking, though. Oops.
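For abundance tracking, one possible approach (an assumption, not what the linked repo does) is to store a count next to each hash value. The sketch below uses a hypothetical `hashes` table with an `abundance` column and illustrative `save_abundances`/`load_abundances` helpers that map each unsigned hash to its count.

```python
# Hypothetical abundance-tracking schema: each hash is stored with its count.
# Table and function names are illustrative only.
import sqlite3

def to_sqlite_int(h: int) -> int:
    # SQLite INTEGER is signed 64-bit; shift large unsigned hashes down.
    return h - 2**64 if h >= 2**63 else h

def from_sqlite_int(v: int) -> int:
    return v + 2**64 if v < 0 else v

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hashes (
        sketch_id INTEGER NOT NULL,
        hashval INTEGER NOT NULL,
        abundance INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (sketch_id, hashval)
    )""")

def save_abundances(conn, sketch_id, abunds):
    """abunds maps an unsigned 64-bit hash to its abundance (count)."""
    conn.executemany(
        "INSERT INTO hashes (sketch_id, hashval, abundance) VALUES (?, ?, ?)",
        ((sketch_id, to_sqlite_int(h), n) for h, n in abunds.items()))
    conn.commit()

def load_abundances(conn, sketch_id):
    rows = conn.execute(
        "SELECT hashval, abundance FROM hashes WHERE sketch_id = ?", (sketch_id,))
    return {from_sqlite_int(v): n for v, n in rows}

abunds = {5: 3, 2**64 - 2: 1, 2**63: 12}
save_abundances(conn, 1, abunds)
restored = load_abundances(conn, 1)
```

With a default of 1, flat sketches could share the same table, but the loader would then need a per-sketch flag (for example, a hypothetical `track_abundance` column on the sketch row) to know whether to keep or discard the counts.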

Will link in related issues later.

ctb commented 2 years ago

Future TODO items: