o2r-project / erc-spec

Executable Research Compendium specification and guides
https://o2r.info/erc-spec/
Creative Commons Zero v1.0 Universal

Require unbroken hash function in bag #44

Open nuest opened 6 years ago

nuest commented 6 years ago

Prefer checksums from cryptographic hash functions that have not yet been broken by collisions.

As soon as the BagIt standard and its implementations support it, we should go for SHA-3. BagIt is likely to support multiple hash functions without requiring this high-quality one itself; see also https://github.com/LibraryOfCongress/bagit-python/issues/86

ghost commented 6 years ago

👍 for sha3

ps. https://en.wikipedia.org/wiki/Category:Broken_hash_functions

pps. Reasoning: BagIt validity is meant to ensure the integrity of the files in a bag. Proof of validity is generated by computing checksums over the bitstreams of those files, using cryptographic hash functions. Once the hash function used for this is considered broken, it becomes feasible to manufacture a different file with the same checksum (hash value). Such "collisions" destroy the practical uniqueness of the hashes, so a checksum no longer identifies the file in question with certainty. Cryptographic hash functions tend to get broken over time as computational resources increase and attacks improve. Therefore, designs that aim for validation should prefer the strongest hash function believed to be available at the time of their creation.
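
A minimal sketch of the validation idea described above, assuming Python's standard `hashlib` and SHA3-256. This is not the BagIt reference implementation and does not read or write BagIt manifest files; the function names (`file_digest`, `build_manifest`, `verify_manifest`) and the `BROKEN` set are hypothetical and only illustrate the principle of checksumming file bitstreams with an unbroken hash function.

```python
import hashlib
from pathlib import Path

# Assumption: algorithm names follow Python's hashlib naming.
# Practical collisions have been published for both of these.
BROKEN = {"md5", "sha1"}


def file_digest(path, algorithm="sha3_256", chunk_size=65536):
    """Hash a file's bitstream in chunks so large files need not fit in memory."""
    if algorithm in BROKEN:
        raise ValueError(f"{algorithm} is broken by collisions")
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(payload_dir, algorithm="sha3_256"):
    """Map each payload file (relative POSIX path) to its checksum."""
    root = Path(payload_dir)
    return {
        p.relative_to(root).as_posix(): file_digest(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(payload_dir, manifest, algorithm="sha3_256"):
    """Return the listed paths that are missing or whose checksum changed."""
    root = Path(payload_dir)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or file_digest(target, algorithm) != expected:
            bad.append(rel)
    return bad
```

Calling `build_manifest("data")` once and later `verify_manifest("data", manifest)` returns an empty list as long as no listed file was changed or removed. Rejecting broken algorithms up front is one possible design choice; a real tool might only warn instead.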

ghost commented 6 years ago

Could this fit in the developer guide? We have an empty section "Why Bagit" there. And Bagit is for checksums. This could make a small addendum on validation/security.

nuest commented 6 years ago

Yes, I don't really like putting this in the spec anymore (it's not our place to police this), but a pointer in the developer guide is a good idea.