sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/
Other
476 stars 79 forks source link

MRG: panic when `FSStorage::load_sig` encounters more than one `Signature` in a JSON record #3333

Closed ctb closed 3 weeks ago

ctb commented 2 months ago

This PR was originally about debugging https://github.com/sourmash-bio/sourmash_plugin_branchwater/pull/445, but that's going to require more work to fix properly. For now, I would like to nominate it for merge because sourmash fails silently in this situation, and that's Bad.

In brief, the main thing this PR does is panic with an unimplemented! when FSStorage::load_sig encounters more than one Signature in a JSON record.

This PR also adds a bit of documentation to InnerStorage, per the bottom of this comment.


The problem at hand: when loading a SigStore/Signature from a Storage, sourmash only loads the first one and ignores any others.

https://github.com/sourmash-bio/sourmash/blob/26b50f3e3566006fd6356a4f8b4d47c5e381aeec/src/core/src/storage/mod.rs#L34-L38

This results from the concept of a Signature as containing one or more sketches; the history of this is described here, and it leads to some interesting silliness in the Python layer.

The contrapositive is that, in Rust, a single Signature can include multiple sketches, e.g. with different ksizes. So this works fine for the wort case where we have a single .sig file with k=21, k=31, k51.

Note that the Python layer (and hence the entire sourmash CLI) fully supports multiple Signatures in JSON: this is well tested and well covered behavior. The branchwater plugin runs into it because it is using the Rust layer and the API is not fully fleshed out there.


codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Project coverage is 86.46%. Comparing base (f707db4) to head (e0726c8). Report is 1 commits behind head on latest.

Files with missing lines Patch % Lines
src/core/src/storage/mod.rs 85.71% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## latest #3333 +/- ## ========================================== - Coverage 86.48% 86.46% -0.03% ========================================== Files 137 137 Lines 16085 16089 +4 Branches 2219 2219 ========================================== Hits 13911 13911 - Misses 1867 1871 +4 Partials 307 307 ``` | [Flag](https://app.codecov.io/gh/sourmash-bio/sourmash/pull/3333/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio) | Coverage Δ | | |---|---|---| | [hypothesis-py](https://app.codecov.io/gh/sourmash-bio/sourmash/pull/3333/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio) | `25.43% <ø> (ø)` | | | [python](https://app.codecov.io/gh/sourmash-bio/sourmash/pull/3333/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio) | `92.40% <ø> (ø)` | | | [rust](https://app.codecov.io/gh/sourmash-bio/sourmash/pull/3333/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio) | `62.19% <85.71%> (-0.08%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

ctb commented 4 weeks ago

@luizirber @bluegenes thoughts welcome!

ctb commented 3 weeks ago

@luizirber bump!

ctb commented 3 weeks ago

🙏