metomi / fab

Flexible build system for scientific software
https://metomi.github.io/fab/

Consider use of `hash()` function instead of `crc32` #317

Open · MatthewHambley opened this issue 1 week ago

MatthewHambley commented 1 week ago

It has been suggested that hash() is semi-cryptographic and is salted on interpreter initialisation. If so, this would kill us: identical objects would return different hashes on different runs.

We should check this out to see if it's true or not.

hiker commented 1 week ago

See https://docs.python.org/3/reference/datamodel.html#object.__hash__

Note: By default, the hash() values of str and bytes objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python. This is intended to provide protection against a denial-of-service caused by carefully chosen inputs that exploit the worst case performance of a dict insertion, O(n²) complexity. See http://ocert.org/advisories/ocert-2011-003.html for details.

Changing hash values affects the iteration order of sets. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).

See also PYTHONHASHSEED.
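A minimal sketch of what PYTHONHASHSEED changes, assuming Python 3.7+ (for `capture_output` and `text` in `subprocess.run`). It runs the same one-liner in fresh interpreters; the helper name `hashes_in_fresh_interpreter` is hypothetical. Setting the seed to a fixed integer (0 disables randomisation) should make the str/bytes hashes repeat across processes, while `random` (the default behaviour) salts each process differently.

```python
import os
import subprocess
import sys

# One-liner executed in each fresh interpreter: hash() of a str and of bytes.
SNIPPET = "print(hash('123'), hash(b'123'))"


def hashes_in_fresh_interpreter(seed: str) -> str:
    """Hypothetical helper: run SNIPPET in a new process with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # A fixed seed: every new process prints the same values.
    print("PYTHONHASHSEED=0:     ", [hashes_in_fresh_interpreter("0") for _ in range(3)])
    # "random": each new process gets its own salt, so the values differ.
    print("PYTHONHASHSEED=random:", [hashes_in_fresh_interpreter("random") for _ in range(3)])
```

Pinning PYTHONHASHSEED would make hash() reproducible, but only if every process that writes or reads the stored values is launched with the same setting, and (per the note above) values can still differ between 32-bit and 64-bit builds. A checksum that depends only on the input bytes avoids both problems.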

And to see it at work: hash() returns a different value for the same string on every run, while crc32 does not:


$ python -c "import zlib; s='123'; print(hash(s), zlib.crc32(s.encode()))"
1233532885931708144 2286445522
$ python -c "import zlib; s='123'; print(hash(s), zlib.crc32(s.encode()))"
5988075377134479978 2286445522
$ python -c "import zlib; s='123'; print(hash(s), zlib.crc32(s.encode()))"
-6104734770311173850 2286445522
$ python -c "import zlib; s='123'; print(hash(s), zlib.crc32(s.encode()))"
2144856913462380203 2286445522