Open olgabot opened 3 months ago
you are exactly right... they are not yet supported but rather desperately needed (see https://github.com/sourmash-bio/sourmash_plugin_branchwater/issues/266 and https://github.com/sourmash-bio/sourmash_plugin_branchwater/issues/235).
there are a few issues that are likely to take priority over upgrading this behavior - in particular, https://github.com/sourmash-bio/sourmash_plugin_branchwater/issues/322 and https://github.com/sourmash-bio/sourmash_plugin_branchwater/issues/331 are top of my mind right now - but your use case is really important functionality that we hope to implement soon.
(and yes, I think the documentation is also broken around this behavior. To quote Napoleon, “You can ask me for anything you like, except time” 😭 )
Hello, hope you are well!
I am very excited to try out the low-memory and fast searches created by RocksDB :) (Also, I will definitely be making use of `pairwise`!)
On my way there, I encountered some unexpected behavior. I had an enormous sequence file (e.g. UniRef50, 65M protein sequences) and cut it up into chunks of 100k sequences to do `sourmash scripts manysketch -p protein,scaled=1,k=10,abund` without running out of resources.
Then, I wanted to index these many files before searching them, but `sourmash scripts index` didn't work on a list of manifest files.
Here's a minimal reproduction, using the data in `src/python/tests/test-data`:
Then, `sourmash scripts index` fails
I'm realizing now that `short.zip` are manifests and not sigs, but I was confused that `sourmash scripts index` wasn't able to work with them, because all the parameters matched when doing `sourmash sig describe`:
The workaround is using `sourmash sig cat` to combine the signatures into one file, but I was hoping not to do this until index creation since the input files are so big.
Let me know if I'm not thinking about this problem correctly and there's a better way to do it.
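
For context, the chunking step described above (cutting one large sequence file into pieces of 100k sequences) can be sketched roughly as follows. This is only an illustration, assuming a plain, uncompressed FASTA input; `read_fasta`, `split_fasta`, and the `chunk_*.fasta` file naming are hypothetical helpers made up for this example and are not part of sourmash or the branchwater plugin.

```python
from itertools import islice
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file, one record at a time."""
    header, seq_lines = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(seq_lines)
                header, seq_lines = line[1:], []
            elif line:
                seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def split_fasta(path: Path, out_dir: Path, chunk_size: int = 100_000) -> List[Path]:
    """Write consecutive chunks of at most `chunk_size` records; return the chunk paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    records = read_fasta(path)
    chunk_paths: List[Path] = []
    while True:
        # Only one chunk of records is held in memory at a time.
        chunk = list(islice(records, chunk_size))
        if not chunk:
            break
        # Hypothetical naming scheme: chunk_00000.fasta, chunk_00001.fasta, ...
        chunk_path = out_dir / f"chunk_{len(chunk_paths):05d}.fasta"
        with open(chunk_path, "w") as out:
            for header, seq in chunk:
                out.write(f">{header}\n{seq}\n")
        chunk_paths.append(chunk_path)
    return chunk_paths
```

Each resulting chunk file would then be sketched separately, which is how I ended up with many per-chunk outputs to index.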
Hope this was informative! Thank you!