sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

differentiate between mutable and immutable `MinHash` objects? #1494

Closed ctb closed 3 years ago

ctb commented 3 years ago

As I write more/deeper code around search and prefetch and so on, I am starting to worry about accidentally modifying MinHash objects.
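To make the concern concrete, here is a minimal sketch of one way to split mutable and read-only sketches. `MutableSketch`, `FrozenSketch`, `add_hash`, `to_frozen`, and `to_mutable` are hypothetical names invented for illustration; this is not sourmash's actual API, and it says nothing about how the real `MinHash` class is implemented.

```python
class MutableSketch:
    """Hypothetical stand-in for a mutable MinHash-like sketch."""

    def __init__(self, hashes=()):
        # Copy the input so the sketch owns its own storage.
        self._hashes = set(hashes)

    def add_hash(self, h):
        self._hashes.add(h)

    @property
    def hashes(self):
        # Hand out a read-only snapshot, never the internal set.
        return frozenset(self._hashes)

    def to_frozen(self):
        # Return a read-only copy of this sketch.
        return FrozenSketch(self._hashes)

    def to_mutable(self):
        # Return an independent, modifiable copy.
        return MutableSketch(self._hashes)


class FrozenSketch(MutableSketch):
    """Read-only variant: mutating methods raise instead of modifying."""

    def add_hash(self, h):
        raise TypeError("FrozenSketch does not support modification")

    def to_frozen(self):
        # Already frozen, so no copy is needed.
        return self


query = MutableSketch([1, 2, 3]).to_frozen()
try:
    query.add_hash(4)          # accidental modification is caught here
except TypeError as exc:
    print(exc)

scratch = query.to_mutable()   # explicit opt-in to a modifiable copy
scratch.add_hash(4)
```

With a split like this, search or prefetch code that receives a frozen sketch fails loudly if it tries to change it, and any code that really needs to modify hashes has to ask for a mutable copy first.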

In brief -

related to the idea of diversifying MinHash objects to differentiate between num and scaled objects https://github.com/dib-lab/sourmash/issues/1354

there's another issue out there about the two different rust implementations of hash storage that I can't find at the moment, that could factor into this.

luizirber commented 3 years ago

> there's another issue out there about the two different rust implementations of hash storage that I can't find at the moment, that could factor into this.

Probably https://github.com/dib-lab/sourmash/pull/1045?

But overall +1000 on this. We can unlock a lot of cool optimizations if we don't need to mutate the MinHash sketches, from reducing the memory needed to store the data, to faster intersection/union calculations, and as #1045 showed, we can also optimize for change-heavy code without undoing the optimizations for read-heavy code.
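As a rough illustration of the kind of optimization immutability allows, the sketch below stores hashes once as a sorted tuple and computes intersections with a linear merge. `FrozenHashes` and its methods are hypothetical names; this is an assumption-laden toy in Python, not the Rust storage from #1045 or anything sourmash actually ships.

```python
class FrozenHashes:
    """Hypothetical immutable hash container (illustration only)."""

    def __init__(self, hashes):
        # Deduplicate and sort once; since nothing can change afterwards,
        # the sorted order never has to be re-established.
        self._sorted = tuple(sorted(set(hashes)))

    def __len__(self):
        return len(self._sorted)

    def intersection_size(self, other):
        # Two-pointer merge over two sorted sequences: O(len(a) + len(b)),
        # with no temporary set allocated.
        a, b = self._sorted, other._sorted
        i = j = common = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                common += 1
                i += 1
                j += 1
            elif a[i] < b[j]:
                i += 1
            else:
                j += 1
        return common

    def jaccard(self, other):
        common = self.intersection_size(other)
        union = len(self) + len(other) - common
        return common / union if union else 0.0


a = FrozenHashes([1, 2, 3, 4])
b = FrozenHashes([3, 4, 5, 6])
print(a.intersection_size(b), a.jaccard(b))
```

A mutable container would have to either re-sort after every insert or fall back to a different structure; a frozen one pays the sorting cost once and can use a compact, read-optimized layout.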

ctb commented 3 years ago

actually this one https://github.com/dib-lab/sourmash/issues/1055, linked from #1045.

This is another one (like #1354) that can benefit from refactoring done at the Python layer to then guide the way to improved Rust design.

ctb commented 3 years ago

this was done in #1508, which continues to be a mostly innocuous change so far! I'm going to leave this open for a bit tho, to see if there's more to be done.

ctb commented 3 years ago

closing - #1508 hasn't really caused any problems.