sourmash-bio / sourmash_plugin_branchwater

fast, multithreaded sourmash operations: search, compare, and gather.
GNU Affero General Public License v3.0

error message: "not implemented: only one Signature currently allowed when using 'load_sig'" #501

Open ctb opened 2 weeks ago

ctb commented 2 weeks ago

As of sourmash_plugin_branchwater v0.9.9, this error message is returned in certain circumstances. The easiest way to trigger it is to use a pathlist that points at a .sig/.sig.gz file containing more than one Signature.

This was never supported, but previous plugin releases did the wrong thing silently and simply loaded the first compatible sketch encountered. The new error message in v0.9.9 results from a change in sourmash (https://github.com/sourmash-bio/sourmash/pull/3333) that flags the situation and displays an error.

The solution, for now, is to convert any and all .sig/.sig.gz files into .sig.zip files (by using, for example, sourmash sig cat sig_file -o file.sig.zip).
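For a directory holding many such files, the conversion can be scripted. The following is a minimal sketch that builds one `sourmash sig cat <input> -o <output>.sig.zip` command per file, using only the invocation quoted above. The helper names (`zip_target`, `build_convert_command`, `convert_all`) are hypothetical, and the sketch assumes `sourmash` is on the PATH and that each output should sit next to its input.

```python
import subprocess
from pathlib import Path


def zip_target(sig_path: Path) -> Path:
    """Map foo.sig or foo.sig.gz to foo.sig.zip in the same directory."""
    name = sig_path.name
    if name.endswith(".sig.gz"):
        stem = name[: -len(".sig.gz")]
    elif name.endswith(".sig"):
        stem = name[: -len(".sig")]
    else:
        raise ValueError(f"not a .sig/.sig.gz file: {sig_path}")
    return sig_path.with_name(stem + ".sig.zip")


def build_convert_command(sig_path: Path) -> list[str]:
    """Build the `sourmash sig cat` command line for one input file."""
    return ["sourmash", "sig", "cat", str(sig_path), "-o", str(zip_target(sig_path))]


def convert_all(directory: Path, dry_run: bool = True) -> list[list[str]]:
    """Collect (and, unless dry_run is set, execute) one command per .sig/.sig.gz file."""
    paths = sorted(
        p for p in directory.iterdir() if p.name.endswith((".sig", ".sig.gz"))
    )
    commands = [build_convert_command(p) for p in paths]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first file that sourmash fails to convert.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_all(Path("sigs"))` with the default `dry_run=True` only returns the commands so they can be inspected first; the pathlist then needs to be updated to point at the new .sig.zip files.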

Here is a more detailed description of the problem from https://github.com/sourmash-bio/sourmash/pull/3333:

The problem at hand: when loading a SigStore/Signature from a Storage, sourmash only loads the first one and ignores any others.

https://github.com/sourmash-bio/sourmash/blob/26b50f3e3566006fd6356a4f8b4d47c5e381aeec/src/core/src/storage/mod.rs#L34-L38

This results from the concept of a Signature as containing one or more sketches; the history of this is described here, and it leads to some interesting silliness in the Python layer.

The flip side is that, in Rust, a single Signature can include multiple sketches, e.g. with different ksizes. So this works fine for the wort case where we have a single .sig file with k=21, k=31, and k=51.

Note that the Python layer (and hence the entire sourmash CLI) fully supports multiple Signatures in JSON: this is well tested and well covered behavior. The branchwater plugin runs into it because it is using the Rust layer and the API is not fully fleshed out there.
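To check which files in a pathlist are affected, the distinction between "one Signature with several sketches" and "several Signatures in one file" can be inspected directly in the JSON. The following is a rough sketch using only the Python standard library, not the sourmash API. It assumes the .sig layout is a top-level JSON array of Signature records, each carrying its sketches in a "signatures" list; the function names are hypothetical.

```python
import gzip
import json
from pathlib import Path


def load_signature_json(path) -> list:
    """Read a .sig or .sig.gz file and return its top-level Signature records."""
    raw = Path(path).read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes
        raw = gzip.decompress(raw)
    data = json.loads(raw)
    # Assumption: the top level is a JSON array; tolerate a bare object too.
    return data if isinstance(data, list) else [data]


def sketches_per_signature(path) -> list[int]:
    """Return the number of sketches in each Signature record of the file."""
    return [len(rec.get("signatures", [])) for rec in load_signature_json(path)]


def needs_conversion(path) -> bool:
    """True if the file holds more than one Signature, the case that triggers the error."""
    return len(load_signature_json(path)) > 1
```

Under these assumptions, a wort-style file with one Signature and three ksizes yields `[3]` from `sketches_per_signature` and does not need conversion, while a file with two Signatures of one sketch each yields `[1, 1]` and does.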

ctb commented 2 weeks ago

https://github.com/sourmash-bio/sourmash_plugin_branchwater/pull/445 explores it in more technical detail, along with potential fixes, but does not (yet) fix it.