sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

add an `lca convert` command #1961

Open ctb opened 2 years ago

ctb commented 2 years ago

In #1808 I'm adding an `LCA_SqliteDatabase` class that supports on-disk LCA databases. These can be built directly with `sourmash lca index ... -F sql`, but it is also very easy to convert an existing JSON database to SQL programmatically with something like:

from sourmash.lca.lca_db import load_single_database
db, ksize, scaled = load_single_database(filename)
db.save(newfilename, format='sql')

We could/should add a new subcommand, `sourmash lca convert`, that does this conversion; it would be nice if it had some diagnostic output :)
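A rough sketch of what such a command could look like, wrapping the three-line snippet above in an argparse front end with a couple of diagnostic messages. The function names (`build_parser`, `convert`, `main`), the program name, and the `-o` option are hypothetical; `-F` simply mirrors the `sourmash lca index ... -F sql` option mentioned above, and the actual command may well end up looking different.

```python
import argparse
import sys


def build_parser():
    # Hypothetical interface; not the actual sourmash CLI definition.
    p = argparse.ArgumentParser(prog="lca-convert")
    p.add_argument("input_db", help="existing LCA database, e.g. a JSON file")
    p.add_argument("-o", "--output", required=True, help="output database path")
    p.add_argument("-F", "--database-format", choices=["json", "sql"],
                   default="sql", help="format to write (default: sql)")
    return p


def convert(input_db, output, database_format="sql"):
    # Imported lazily so the parser can be built without sourmash installed.
    from sourmash.lca.lca_db import load_single_database

    db, ksize, scaled = load_single_database(input_db)
    print(f"loaded '{input_db}': ksize={ksize}, scaled={scaled}",
          file=sys.stderr)

    db.save(output, format=database_format)
    print(f"saved '{output}' in {database_format} format", file=sys.stderr)


def main(argv=None):
    args = build_parser().parse_args(argv)
    convert(args.input_db, args.output, args.database_format)
```

Calling `main(["db.lca.json", "-o", "db.lca.sql"])` would then load the JSON database, report its ksize and scaled values on stderr, and write the SQL version. The file names here are placeholders.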

We could also support export of lineage spreadsheets per https://github.com/sourmash-bio/sourmash/issues/1080 with the same command.
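For the spreadsheet export, something along these lines might be a starting point. This is only a sketch: it assumes the identifier-to-lineage mapping has already been pulled out of the database as plain `(rank, name)` pairs (how to do that is not shown and depends on the database class), and the column layout is an assumption modeled on the lineage spreadsheets that `sourmash lca index` reads. `write_lineage_csv`, `RANKS`, and the example data are made up for illustration.

```python
import csv
import io

# Assumed column order for the lineage spreadsheet.
RANKS = ["superkingdom", "phylum", "class", "order",
         "family", "genus", "species"]


def write_lineage_csv(ident_to_lineage, fp):
    """Write one row per identifier; ranks missing from a lineage stay blank."""
    w = csv.writer(fp, lineterminator="\n")
    w.writerow(["identifiers"] + RANKS)
    for ident, lineage in sorted(ident_to_lineage.items()):
        names = dict(lineage)  # (rank, name) pairs -> {rank: name}
        w.writerow([ident] + [names.get(rank, "") for rank in RANKS])


# Illustrative input only; not real database contents.
example = {
    "example_genome_1": [("superkingdom", "Bacteria"),
                         ("phylum", "Proteobacteria")],
}
out = io.StringIO()
write_lineage_csv(example, out)
```

Writing to a file-like object keeps the function easy to test and lets the command decide whether output goes to a file or stdout.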

ctb commented 2 years ago

as a side note, we should verify that sourmash LCA SQLite databases can be loaded as taxonomy dbs.

ctb commented 2 years ago

err, oops, AND that lca.json files can be loaded as taxonomy DBs.