Open chris-ha458 opened 1 year ago
Some more info that I've found
sha256sum -c
or sha384sum -c
as such, adding it would necessitate further consideration into downstream uses.From these findings, I will prepare a miminum viable PR that would drop-in change sha256 with sha384 and all relevant documentation. Previous datasets would not be harmed. End users can continue to use sha256 to verify their datasets. Future users can use sha384 to verify their datasets.
An option to select hashes and a robust method to support it would be optimal, but beyond the scope of a "simple" drop in patch.
So a prepared a PR to change SHA256 into SHA384.
I have been wrong on some points though. Unless SHA256 is used as part of an HMAC construction, which it doesn't seem to be, the context of length extension attack vulnerability is invalid.
There is still value in changing into SHA384 since it could be the baseline for when HMAC might be implemented. Also, SHA384 is much faster than SHA256 due to the underlying SHA512 hash is more amenable to 64bit implementations.
1MiB input
Timing sha256sum... 10_000 iterations
real 0m23.134s
user 0m22.315s
sys 0m1.060s
Timing sha384sum...10_000 iterations
real 0m16.110s
user 0m15.372s
sys 0m1.031s
Thanks a lot for the PR!
We are (slowly) removing code that is not immediately linked to the pipeline from ungoliant, putting it in oscar-tools
, but we still have duplicates here and there. The PR should be made on oscar-tools
rather than on the main project, but IIRC the code is very similar so it shouldn't be hard to do :)
Code is here: https://github.com/oscar-project/oscar-tools/blob/main/src/ops/checksum.rs
It's very simple code and the main discussion point is whether to integrate it at all, rather than how.
Since it seems like there is interest in integrating such code, I will prepare a new PR under oscar-tools
Is your feature request related to a problem? Please describe. Per folder SHA256 hashes can be potentially vulnerable to length extension attacks.
Describe the solution you'd like
SHA-384
andSHA-512/256
exist. However the latter is difficult to canonicalize due to slash in the name (it is part of its official name). If speed or size is a concern, BLAKE3 is extremely fast and secure as well.Describe alternatives you've considered DO NOT
Additional context I consider OSCAR as an important part of the Data pipelines supply chain so I this the bar i hope OSCAR can clear. I will be willing to further investigate potential hash functions, implementations and provide PRs if necessary.
Reference : https://github.com/oscar-project/documentation/issues/13