sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

can we change `sourmash sig describe` to use manifests? #1846

Open ctb opened 2 years ago

ctb commented 2 years ago

looking at the code, probably, yes! #1837 would help with this.

background and motivation: for large .sig.gz files, it is annoying to load all of the JSON when you only want signatures with specific moltypes/ksizes/etc. so in various places I have been building .zip files instead, so that we can load just the one signature we want based on the manifest.

but then it occurred to me that for things like describe, we have most (all?) of the information present in the manifest.

so this would be much faster for databases with manifests!
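To make the idea above concrete, here is a minimal, standard-library-only sketch of manifest-based selection from a .zip collection. It is not sourmash's own implementation. The manifest member name `SOURMASH-MANIFEST.csv`, the leading `#` version line, and the column names `internal_location`, `ksize`, and `moltype` are assumptions about the manifest layout, and `build_demo_zip`, `select_locations`, and `load_selected` are hypothetical helper names.

```python
import csv
import io
import zipfile

# Assumed name of the manifest member inside the zip collection.
MANIFEST_NAME = "SOURMASH-MANIFEST.csv"


def build_demo_zip():
    """Build a tiny in-memory zip with a manifest and three fake signatures."""
    manifest = (
        "# SOURMASH-MANIFEST-VERSION: 1.0\n"
        "internal_location,ksize,moltype\n"
        "signatures/a.sig,21,DNA\n"
        "signatures/b.sig,31,DNA\n"
        "signatures/c.sig,31,protein\n"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(MANIFEST_NAME, manifest)
        zf.writestr("signatures/a.sig", '[{"demo": "k21"}]')
        zf.writestr("signatures/b.sig", '[{"demo": "k31"}]')
        zf.writestr("signatures/c.sig", '[{"demo": "k31-protein"}]')
    return buf


def select_locations(zip_file, ksize, moltype):
    """Read only the manifest and return locations of matching signatures."""
    with zipfile.ZipFile(zip_file) as zf:
        with zf.open(MANIFEST_NAME) as fp:
            text = io.TextIOWrapper(fp, encoding="utf-8", newline="")
            # Skip the assumed '#'-prefixed version line before the CSV header.
            rows = csv.DictReader(line for line in text if not line.startswith("#"))
            return [
                row["internal_location"]
                for row in rows
                if int(row["ksize"]) == ksize and row["moltype"] == moltype
            ]


def load_selected(zip_file, ksize, moltype):
    """Read the raw bytes of only those members the manifest says match."""
    locations = select_locations(zip_file, ksize, moltype)
    with zipfile.ZipFile(zip_file) as zf:
        return {loc: zf.read(loc) for loc in locations}
```

With the demo archive, `load_selected(build_demo_zip(), 31, "DNA")` reads the manifest plus a single member and never touches the other two, which is where the speedup over parsing one large .sig.gz would come from.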

ctb commented 2 years ago

alas, no, we cannot - manifests do not contain seed or license information!

If and when we upgrade or refactor manifest content, these will be something to add, I think.
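As a rough illustration of the gap, the check below compares the fields a describe-style report needs against the columns a manifest provides. Both sets are assumptions written out by hand for this sketch (they are not read from sourmash), and `missing_from_manifest` is a hypothetical helper.

```python
# Assumed manifest columns (hand-written for illustration).
MANIFEST_COLUMNS = {
    "internal_location", "md5", "md5short", "ksize", "moltype",
    "num", "scaled", "n_hashes", "with_abundance", "name", "filename",
}

# Assumed fields a describe-style report would need per signature.
DESCRIBE_FIELDS = {
    "filename", "name", "md5", "ksize", "moltype", "num", "scaled",
    "seed", "with_abundance", "n_hashes", "license",
}


def missing_from_manifest(needed, available):
    """Return the needed fields that the manifest cannot supply, sorted."""
    return sorted(needed - available)


print(missing_from_manifest(DESCRIBE_FIELDS, MANIFEST_COLUMNS))
# → ['license', 'seed']
```

Any field in that result would still force a full load of the signature, so a manifest-only describe would need those columns added first.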

ctb commented 2 years ago

adding sum_hashes in https://github.com/sourmash-bio/sourmash/pull/1882, which is also not present in manifests.

on the flip side, I also added `--include-db-pattern` and `--exclude-db-pattern` to `sig describe` in #1882, which I think makes it less important to use manifests for things - `sig describe` is mostly intended to be for humans, who wouldn't want to look at all the output for gazillions of signatures anyway.