Open nuest opened 6 years ago
👍 for sha3
ps. https://en.wikipedia.org/wiki/Category:Broken_hash_functions
pps. Reasoning: Bagit validation is meant to ensure the integrity of the files in a Bagit bag. Validity is established by computing checksums over the bitstreams of those files, and these checksums are cryptographic hashes. Once the cryptographic hash function used for this is broken, it becomes feasible to manufacture different files with the same checksum (hash value). Such "collisions" destroy the practical unambiguity of the hashes, so a checksum can no longer definitively identify the file in question. Cryptographic hash functions tend to be broken over time as computational resources increase and attacks improve. Therefore, designs that aim for validation should prefer the strongest hash function available at the time of their creation.
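To make the checksum idea concrete, here is a minimal sketch using only Python's standard `hashlib` (SHA-3 is available there from Python 3.6). It is not how bagit-python computes its manifests; the helper names `file_checksum` and `matches` and the default algorithm are assumptions made for illustration.

```python
import hashlib


def file_checksum(path, algorithm="sha3_256", chunk_size=65536):
    """Return the hex digest of the file at `path`.

    Hypothetical helper: the file is read in chunks so that large
    payload files do not have to fit into memory.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex, algorithm="sha3_256"):
    """True if the file's current checksum equals the recorded one."""
    return file_checksum(path, algorithm) == expected_hex.lower()
```

Recording `file_checksum(path)` when the bag is created and calling `matches(path, recorded)` later detects any change to the bitstream, as long as nobody can craft a colliding file for the chosen hash function.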
Could this fit in the developer guide? We have an empty section "Why Bagit" there. And Bagit is for checksums. This could make a small addendum on validation/security.
Yes, I don't really like putting this in the spec anymore (it's not our place to police this), but a pointer in the developer guide is a good idea.
Prefer checksums from cryptographic hash functions that have not yet been broken by collisions.
As soon as it is supported by the Bagit standard and its implementations, we should go for sha3. Bagit is likely to support multiple hash functions rather than require this high-quality one itself; see also https://github.com/LibraryOfCongress/bagit-python/issues/86
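One way to read this recommendation in code is a rough sketch that picks the strongest unbroken algorithm among those a given tool supports. The preference order, the `BROKEN` set (MD5 and SHA-1, both with published collisions) and the function name `pick_algorithm` are assumptions for illustration, not anything defined by the Bagit spec or bagit-python.

```python
import hashlib

# Assumption: hash functions with published collisions are excluded.
BROKEN = {"md5", "sha1"}

# Assumption: an illustrative strongest-first preference order,
# using hashlib's algorithm names.
PREFERENCE = ("sha3_512", "sha3_256", "sha512", "sha256")


def pick_algorithm(supported):
    """Return the most preferred unbroken algorithm from `supported`.

    `supported` is an iterable of hashlib-style names that the
    surrounding tooling can handle (hypothetical input).
    """
    supported = set(supported)
    for name in PREFERENCE:
        if (
            name in supported
            and name not in BROKEN
            and name in hashlib.algorithms_available
        ):
            return name
    raise ValueError("no unbroken hash function is supported")
```

For example, `pick_algorithm({"md5", "sha256", "sha3_256"})` selects `sha3_256` under this ordering, and a tool that only offers `md5` and `sha1` is rejected with an error.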