ctb opened this issue 2 years ago
Found this comment from @bluegenes, buried in a different issue; it appears to be the original we-should-have-tax-in-zip idea:
Additional thought: It would be handy to include the taxonomy file inside each database file (possible with `zip`, `sbt.zip`, and `sqldb`, and not needed for `lca`, right?). That would reduce extra download code and the need to link the correct taxonomy file with each database. For taxonomy functions with official databases, users could provide the database on the command line (instead of needing to find/download the taxonomy file), and we could automatically find it. I would imagine `TAXONOMY.csv`, complementary to the manifest file. We would still allow alternate taxonomies, of course, but at least each db would come with the official set for that db?
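A minimal sketch of how the zip case might work, using only Python's standard `zipfile` and `csv` modules. It assumes the embedded member is literally named `TAXONOMY.csv` (as proposed above) and that it has an `ident` column plus lineage columns, loosely modeled on the lineage CSVs that `sourmash tax` already reads. The helper names `add_taxonomy` and `load_embedded_taxonomy` are hypothetical and are not part of the sourmash API; this sketch also does not touch the zip's manifest.

```python
import csv
import io
import zipfile

# Hypothetical member name, taken from the proposal above; not an existing sourmash convention.
TAXONOMY_NAME = "TAXONOMY.csv"


def add_taxonomy(zip_file, tax_rows, fieldnames):
    """Append a TAXONOMY.csv member to an existing zip (hypothetical helper).

    zip_file may be a path or a seekable binary file object.
    tax_rows is a list of dicts keyed by fieldnames (assumed to include "ident").
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(tax_rows)
    # Mode "a" appends a new member without rewriting the existing sketches.
    with zipfile.ZipFile(zip_file, "a") as zf:
        zf.writestr(TAXONOMY_NAME, buf.getvalue())


def load_embedded_taxonomy(zip_file):
    """Return {ident: lineage row} from an embedded TAXONOMY.csv, or None if absent.

    A caller could fall back to a user-supplied taxonomy file when this returns None.
    """
    with zipfile.ZipFile(zip_file) as zf:
        if TAXONOMY_NAME not in zf.namelist():
            return None
        with zf.open(TAXONOMY_NAME) as fp:
            # zf.open() yields bytes; wrap it so csv can read text.
            text = io.TextIOWrapper(fp, encoding="utf-8", newline="")
            return {row["ident"]: row for row in csv.DictReader(text)}
```

With something like this, a command that is given only the database could check for an embedded taxonomy first and use an explicitly provided taxonomy file when one is passed, which keeps alternate taxonomies possible.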
Also see https://github.com/nf-core/taxprofiler/pull/404, where it would clearly be nice to have a single file containing both the sketches and the taxonomy CSV.
In https://github.com/sourmash-bio/sourmash/issues/2154, we've been talking about how to include taxonomic information in zipfiles, and I've been trying to figure out how that would work at the command line.
But all the discussion happened in a now-closed issue and a now-merged PR ;). So here's a new issue!
Comments copied over from various other issues and PRs:
From https://github.com/sourmash-bio/sourmash/pull/2195#issuecomment-1213275883, @bluegenes:
From https://github.com/sourmash-bio/sourmash/issues/2012#issuecomment-1214419578, I wrote:
which received @bluegenes endorsement:
Also sorta connects with https://github.com/sourmash-bio/sourmash/issues/2186, searching/selecting on taxonomic lineages?