sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/
Other
476 stars 79 forks source link

MRG: standardize on u32 for scaled, and introduce `ScaledType` #3364

Closed ctb closed 2 weeks ago

ctb commented 3 weeks ago

Makes scaled a u32, and change a few others as well.

This PR started because we were mixing u32 and u64 in places, but I think a switch away from usize (architecture specific & mostly returned from collection lengths, and so on) to explicit u32/u64 seems good.

Fixes https://github.com/sourmash-bio/sourmash/issues/3363

codecov[bot] commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 86.45%. Comparing base (122c4b7) to head (5f49d1c). Report is 1 commits behind head on latest.

Files with missing lines Patch % Lines
src/core/src/ffi/index/revindex.rs 0.00% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## latest #3364 +/- ## ========================================== - Coverage 86.45% 86.45% -0.01% ========================================== Files 137 137 Lines 16090 16089 -1 Branches 2219 2219 ========================================== - Hits 13911 13910 -1 Misses 1872 1872 Partials 307 307 ``` | [Flag](https://app.codecov.io/gh/sourmash-bio/sourmash/pull/3364/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio) | Coverage Δ | | |---|---|---| | [hypothesis-py](https://app.codecov.io/gh/sourmash-bio/sourmash/pull/3364/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio) | `25.43% <ø> (ø)` | | | [python](https://app.codecov.io/gh/sourmash-bio/sourmash/pull/3364/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio) | `92.40% <ø> (ø)` | | | [rust](https://app.codecov.io/gh/sourmash-bio/sourmash/pull/3364/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio) | `62.16% <93.33%> (-0.02%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=sourmash-bio#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

ctb commented 3 weeks ago

Ready for review @luizirber

luizirber commented 3 weeks ago

This PR started because we were mixing u32 and u64 in places, but I think a switch away from usize (architecture specific & mostly returned from collection lengths, and so on) to explicit u32/u64 seems good.

Initially u32 started to get into the API because wasm is 32-bits nowadays, so some function calls were easier to set up across JS/Rust. But also based on the scaled values we actually use (1 -> ~100k) u64 is way too big, u16 is too small, so... u32 it is?

(ksize should probably be u16, has anyone ever used k > 65535? :joy_cat: )

I would have no problem standardizing scaled on u32, too. Just want it to be one type of number :).

Maybe we can do a type alias like HashIntoType for scaled: https://github.com/sourmash-bio/sourmash/blob/6ae9cd32d9ac2841c98415bc1c043597536a5470/src/core/src/lib.rs#L55 would likely still need some conversions but can be done with as HashIntoType. (And easier to play out with changing the type too and evaluating consequences, instead of having to fix everywhere in the codebase)

ctb commented 3 weeks ago

sounds good!

ctb commented 2 weeks ago

Ready for review @luizirber !