nipype / pydra

Pydra Dataflow Engine
https://nipype.github.io/pydra/

ENH: Add generic object hasher #644

effigies closed 1 year ago

effigies commented 1 year ago

Summary

This is an initial proposal for how to generically hash an object. The basic scheme is to create a single hash object that accepts byte-strings, then iterate over byte representations of the components of the object to be hashed and feed each one to that hash object.

The job of each object is then not to tell how to hash itself but to provide a stable, unique representation of itself in bytes. By yielding that representation from generators, it is never necessary to keep the full byte representation of the object in memory, which will be critical if file contents are to be hashed.
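A minimal sketch of this scheme, not the code in this PR: the names `bytes_repr` and `hash_object`, the use of `functools.singledispatch`, the type tags, and the `repr()` fallback are all assumptions made for illustration. Each handler is a generator that yields chunks, and a single hash object consumes them one at a time.

```python
import hashlib
from functools import singledispatch
from typing import Iterator


@singledispatch
def bytes_repr(obj) -> Iterator[bytes]:
    # Hypothetical fallback: type name plus repr(). Only stable for objects
    # whose repr() is deterministic (ints, floats, None, ...).
    yield f"{type(obj).__module__}.{type(obj).__qualname__}:".encode()
    yield repr(obj).encode()


@bytes_repr.register
def _(obj: bytes) -> Iterator[bytes]:
    # A length prefix keeps adjacent values from running into each other.
    yield b"bytes:" + str(len(obj)).encode() + b":"
    yield obj


@bytes_repr.register
def _(obj: str) -> Iterator[bytes]:
    data = obj.encode()
    yield b"str:" + str(len(data)).encode() + b":"
    yield data


@bytes_repr.register(list)
@bytes_repr.register(tuple)
def _(obj) -> Iterator[bytes]:
    # The type name distinguishes [1, 2] from (1, 2).
    yield type(obj).__name__.encode() + b":("
    for item in obj:
        yield from bytes_repr(item)
        yield b","
    yield b")"


@bytes_repr.register
def _(obj: dict) -> Iterator[bytes]:
    yield b"dict:{"
    # Sort by each key's byte representation so insertion order is irrelevant.
    for key in sorted(obj, key=lambda k: b"".join(bytes_repr(k))):
        yield from bytes_repr(key)
        yield b"="
        yield from bytes_repr(obj[key])
        yield b","
    yield b"}"


def hash_object(obj, digest_size: int = 16) -> bytes:
    # One hash object; chunks are consumed as they are generated, so the
    # full byte representation never has to exist in memory at once.
    h = hashlib.blake2b(digest_size=digest_size)
    for chunk in bytes_repr(obj):
        h.update(chunk)
    return h.digest()
```

With this sketch, `hash_object({"a": 1, "b": [1, 2]})` and `hash_object({"b": [1, 2], "a": 1})` give the same digest, while `hash_object([1, 2])` and `hash_object((1, 2))` differ. A handler for file-like objects could yield fixed-size blocks read from disk in the same way.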

I propose to use blake2b because it achieves MD5-like speed and we can use custom digest sizes to keep digests readable. Blake2 also has a tree mode, where multiple hashes can be combined. This may be useful for computing an interface hash that is created from the hashes of all inputs, but I have not attempted to think through this yet.
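For reference, a short sketch of the custom digest size using the standard-library `hashlib.blake2b`. The 16-byte size and the sample payload are assumptions chosen for illustration. Note that the digest size is a parameter of the hash itself, so a 16-byte digest is not a prefix of the default 64-byte one.

```python
import hashlib

payload = b"example payload"  # arbitrary sample input

# 16-byte digest: 32 hex characters, short enough to read in logs or paths.
short = hashlib.blake2b(payload, digest_size=16)

# Default digest size for blake2b is 64 bytes (128 hex characters).
full = hashlib.blake2b(payload)

print(len(short.hexdigest()), len(full.hexdigest()))

# The digest size is mixed into the hash parameters, so the short digest
# is a different value, not a truncation of the full one.
print(short.digest() == full.digest()[:16])
```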

There is a pathological case for repeated objects. The two dicts below differ in the value of 'c', yet produce the same digest:

In [101]: s = {1,2,3}

In [103]: t = {4, 5, 6}

In [104]: hash_object({'a': s, 'b': t, 'c': s})
Out[104]: b'G\x8a\xa0:\x0e\xd4\xc4>2\xc9F\xedz\x9ax\x1b'

In [105]: hash_object({'a': s, 'b': t, 'c': t})
Out[105]: b'G\x8a\xa0:\x0e\xd4\xc4>2\xc9F\xedz\x9ax\x1b'

This would be mitigated by mapping objects directly to hashes.
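One possible reading of that mitigation, sketched under stated assumptions: every object is reduced to its own fixed-size digest, and containers hash the digests of their members rather than a shared byte stream. The `cache` keyed by `id()` is a hypothetical detail that is only valid within a single call, and the sketch does not handle reference cycles.

```python
import hashlib


def _digest(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=16).digest()


def hash_object(obj, cache=None) -> bytes:
    # cache maps id(obj) -> digest so a repeated object is hashed once,
    # but its digest is still contributed every time it appears.
    if cache is None:
        cache = {}
    key = id(obj)
    if key in cache:
        return cache[key]

    tag = type(obj).__name__.encode() + b":"
    if isinstance(obj, dict):
        # Unordered: sort the per-entry digests for a stable result.
        parts = sorted(
            hash_object(k, cache) + hash_object(v, cache) for k, v in obj.items()
        )
        digest = _digest(tag + b"".join(parts))
    elif isinstance(obj, (set, frozenset)):
        parts = sorted(hash_object(item, cache) for item in obj)
        digest = _digest(tag + b"".join(parts))
    elif isinstance(obj, (list, tuple)):
        # Ordered: keep member digests in sequence order.
        parts = [hash_object(item, cache) for item in obj]
        digest = _digest(tag + b"".join(parts))
    else:
        # Hypothetical fallback: relies on a deterministic repr().
        digest = _digest(tag + repr(obj).encode())

    cache[key] = digest
    return digest


s = {1, 2, 3}
t = {4, 5, 6}
first = hash_object({"a": s, "b": t, "c": s})
second = hash_object({"a": s, "b": t, "c": t})
print(first != second)
```

Because the entry for 'c' now carries the digest of whichever set it points to, the two dicts from the example above no longer collide.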

Closes #626.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 87.41% and project coverage change: -0.63% :warning:

Comparison is base (426564e) 81.77% compared to head (c9926ef) 81.14%.

:exclamation: Current head c9926ef differs from pull request most recent head cb01648. Consider uploading reports for the commit cb01648 to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #644      +/-   ##
==========================================
- Coverage   81.77%   81.14%   -0.63%
==========================================
  Files          20       21       +1
  Lines        4400     4535     +135
  Branches     1264        0    -1264
==========================================
+ Hits         3598     3680      +82
- Misses        798      855      +57
+ Partials        4        0       -4
```

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `81.14% <87.41%> (-0.63%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nipype#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://app.codecov.io/gh/nipype/pydra/pull/644?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nipype) | Coverage Δ | |
|---|---|---|
| [pydra/utils/hash.py](https://app.codecov.io/gh/nipype/pydra/pull/644?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nipype#diff-cHlkcmEvdXRpbHMvaGFzaC5weQ==) | `85.60% <85.60%> (ø)` | |
| [pydra/engine/specs.py](https://app.codecov.io/gh/nipype/pydra/pull/644?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nipype#diff-cHlkcmEvZW5naW5lL3NwZWNzLnB5) | `94.72% <92.30%> (-0.08%)` | :arrow_down: |
| [pydra/engine/helpers.py](https://app.codecov.io/gh/nipype/pydra/pull/644?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nipype#diff-cHlkcmEvZW5naW5lL2hlbHBlcnMucHk=) | `84.39% <100.00%> (-1.66%)` | :arrow_down: |
| [pydra/engine/helpers\_file.py](https://app.codecov.io/gh/nipype/pydra/pull/644?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nipype#diff-cHlkcmEvZW5naW5lL2hlbHBlcnNfZmlsZS5weQ==) | `85.97% <100.00%> (-0.13%)` | :arrow_down: |

... and [4 files with indirect coverage changes](https://app.codecov.io/gh/nipype/pydra/pull/644/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nipype)

effigies commented 1 year ago

Subsumed by #662.