Closed ctb closed 2 months ago
Attention: Patch coverage is 70.83333%, with 7 lines in your changes missing coverage. Please review.
Project coverage is 86.49%. Comparing base (26b50f3) to head (aa109f8). Report is 1 commit behind head on latest.
Files with missing lines | Patch % | Lines |
---|---|---|
src/core/src/manifest.rs | 72.72% | 6 Missing :warning: |
src/core/src/collection.rs | 50.00% | 1 Missing :warning: |
@luizirber @bluegenes I've got this down to something that's pleasingly simple and I'm curious what you think.
The underlying requirement is that when loading a standalone manifest, we want to intersect the new manifest with the old so as to allow subsetting of large collections - this is one of the key behaviors needed for branchwater.
In this PR, I've implemented an `intersect_manifest` method on `Collection` and `Manifest` that provides this behavior, and it seems to be working as intended over in https://github.com/sourmash-bio/sourmash_plugin_branchwater/pull/430 🎉.
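For readers following along, here is a minimal sketch of the intersection behavior described above. It is not the code in this PR: the `Record` and `Manifest` types are hypothetical, pared-down stand-ins, and returning a new `Manifest` (rather than modifying one in place) is an assumption made for illustration. Rows are matched on a `(name, md5)` key.

```rust
use std::collections::HashSet;

// Hypothetical, pared-down stand-in for a manifest row; the real `Record`
// carries more fields than these three.
#[derive(Debug, Clone)]
pub struct Record {
    pub name: String,
    pub md5: String,
    pub internal_location: String,
}

// Hypothetical, simplified manifest: just a list of rows.
#[derive(Debug, Clone, Default)]
pub struct Manifest {
    pub records: Vec<Record>,
}

impl Manifest {
    /// Keep only the rows of `self` whose (name, md5) pair also appears in `other`.
    /// Assumption of this sketch: a new manifest is returned and `self` is untouched.
    pub fn intersect_manifest(&self, other: &Manifest) -> Manifest {
        // Build the lookup set once so each membership check is O(1) on average.
        let wanted: HashSet<(&str, &str)> = other
            .records
            .iter()
            .map(|r| (r.name.as_str(), r.md5.as_str()))
            .collect();

        let records = self
            .records
            .iter()
            .filter(|r| wanted.contains(&(r.name.as_str(), r.md5.as_str())))
            .cloned()
            .collect();

        Manifest { records }
    }
}
```

With this shape, intersecting a large collection's manifest against a small standalone manifest keeps the rows from the large collection, including their original `internal_location`, and drops everything the standalone manifest does not mention.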
The main remaining question is, how do we decide if two rows are equal? For now, I am using a `(name, md5)` tuple to do this, which is how sourmash does it (but not for good reasons, so we could change it in the Rust code). We could switch to providing `Hash` and `Eq` traits, which would make `HashSet` work directly on `Record`s and make everything simpler.
However, right now the `Eq` derivation on `struct Record` compares all fields for equality (link). I would like to remove `internal_location` from this comparison to support standalone manifests better; I think this is all that's needed.
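To make the proposal concrete, here is a rough sketch of hand-written `PartialEq`, `Eq`, and `Hash` implementations that skip `internal_location`. The `Record` below is a hypothetical four-field stand-in, not the real struct, and the `intersect` helper is likewise made up for illustration. The one hard constraint is that `Hash` must use the same fields as `Eq`, so that equal records always hash identically.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for the real `Record`; field names are illustrative.
#[derive(Debug, Clone)]
pub struct Record {
    pub name: String,
    pub md5: String,
    pub ksize: u32,
    pub internal_location: String,
}

// Equality deliberately ignores `internal_location`, so the same sketch
// listed in two different manifests compares equal.
impl PartialEq for Record {
    fn eq(&self, other: &Self) -> bool {
        self.name == other.name && self.md5 == other.md5 && self.ksize == other.ksize
    }
}

impl Eq for Record {}

// Hash exactly the fields that `eq` compares, and nothing else.
impl Hash for Record {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.hash(state);
        self.md5.hash(state);
        self.ksize.hash(state);
    }
}

// Hypothetical helper: rows of `ours` that also occur in `theirs`.
pub fn intersect(ours: &[Record], theirs: &[Record]) -> Vec<Record> {
    let wanted: HashSet<&Record> = theirs.iter().collect();
    ours.iter().filter(|r| wanted.contains(*r)).cloned().collect()
}
```

Once `Record` itself is hashable, the `(name, md5)` tuple key is no longer needed and the set lookup works on records directly.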
Thoughts? Concerns?
@luizirber @bluegenes bumping previous comment - the branchwater code is coming along nicely, but it won't really be mergeable until we figure out the sourmash side of things. Thanks!
> The main remaining question is, how do we decide if two rows are equal? For now, I am using a `(name, md5)` tuple to do this, which is how sourmash does it (but not for good reasons, so we could change it in the Rust code). We could switch to providing `Hash` and `Eq` traits, which would make `HashSet` work directly on `Record`s and make everything simpler.
> However, right now the `Eq` derivation on `struct Record` compares all fields for equality (link). I would like to remove `internal_location` from this comparison to support standalone manifests better; I think this is all that's needed.
I agree, `Hash` + `Eq`/`PartialEq` would be great here - it is better and more standardized to check all the relevant info automatically rather than just checking name and md5sum.
@luizirber @bluegenes ready for review!
@luizirber any strong objections to the code here?
@luizirber @bluegenes can we merge this? :)
(not sure why codecov is acting up... tests do cover that code?!)
I'm ok with merging 🤷🏼‍♀️
> (not sure why codecov is acting up... tests do cover that code?!)
Sometimes cargo-tarpaulin doesn't get everything. I tried #2993 some time ago; maybe it's time to revisit.
🎉
This PR implements `Manifest::intersect_manifest` and `Collection::intersect_manifest` for the Rust layer, which is needed to support standalone manifests over in https://github.com/sourmash-bio/sourmash_plugin_branchwater/pull/430.
As part of this, the PR implements `Eq` and `Hash` traits for `Record` so that `HashSet` can be used for efficient intersections.
Related PRs: