sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/
Other
476 stars 79 forks source link

`Selection::from_record` does not respect `scaled` #3384

Closed ctb closed 1 week ago

ctb commented 1 week ago

In selection.rs, Selection::from_record does not pull the scaled value in from the Record:

https://github.com/sourmash-bio/sourmash/blob/e86c8a87b7984229f98037a9b0124baaebf608d8/src/core/src/selection.rs#L122-L131

Among perhaps other side effects, this means that Collection::sig_from_record does not return an appropriately downsampled signature.

This confusion was the source of a bug in branchwater - https://github.com/sourmash-bio/sourmash_plugin_branchwater/issues/505.

ctb commented 1 week ago

Interestingly we apparently figured this out before, because @bluegenes noted it in a test 😆

https://github.com/sourmash-bio/sourmash/blob/e86c8a87b7984229f98037a9b0124baaebf608d8/src/core/src/collection.rs#L479-L483

ctb commented 1 week ago

Tracking it down a bit more, it looks like maybe this is intentional - Collection::select delegates to Manifest::select which merely selects rows with compatible scaled values, but does not actually set the scaled value in the Record.

https://github.com/sourmash-bio/sourmash/blob/e86c8a87b7984229f98037a9b0124baaebf608d8/src/core/src/manifest.rs#L280-L283

ctb commented 1 week ago

The contortions I'm going through over in https://github.com/sourmash-bio/sourmash_plugin_branchwater/pull/504/ make me think that we definitely want to provide a way to request downsampling on sig_from_record 😆

It felt related to https://github.com/sourmash-bio/sourmash_plugin_branchwater/issues/501 to me, too, but after rereading everything I'm not sure. Sigh. Complexities.

ctb commented 1 week ago

per slack, luiz speaketh:

I think this is a straight bug, should be setting scaled and num too!

(later conversation: we're not sure about num ;))

ctb commented 1 week ago

Fixed by #3387.