Closed eleitl closed 3 years ago
I'd like to see some people try to break it first. Much of its speed comes from the assumption that current symmetric algorithms are too paranoid and have too many rounds.
btw there's a fast optimized Go-implementation available here https://github.com/zeebo/blake3
Is there any way to get a sense of priority on this? It'd be very helpful for my use-case.
Blake3 digests don't need to finish hashing a data stream before it can tell if the data is incorrect. If one single byte is off anywhere in the stream, it'll immediately throw an exception. This also allows IPLD MerkleDAG blocks to be safely made much larger.
@cryptoquick the codec is 0x1e. Feel free to add support (we can disable it by default).
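For readers unfamiliar with the format, here is a minimal sketch of what "the codec is 0x1e" means on the wire: a multihash is the digest prefixed by two varints, the hash function code and the digest length. The helper `encodeMultihash` is hypothetical (it is not the go-multihash API), and the all-zero digest is a placeholder rather than a real BLAKE3 output.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// blake3Code is the multicodec code for BLAKE3 quoted in the comment above.
const blake3Code = 0x1e

// encodeMultihash is a hypothetical helper, not part of go-multihash.
// It lays out <varint code><varint digest length><digest bytes>.
func encodeMultihash(code uint64, digest []byte) []byte {
	var tmp [binary.MaxVarintLen64]byte
	out := make([]byte, 0, 2*binary.MaxVarintLen64+len(digest))

	n := binary.PutUvarint(tmp[:], code)
	out = append(out, tmp[:n]...)

	n = binary.PutUvarint(tmp[:], uint64(len(digest)))
	out = append(out, tmp[:n]...)

	return append(out, digest...)
}

func main() {
	// Placeholder: 32 zero bytes stand in for a real 256-bit BLAKE3 digest.
	digest := make([]byte, 32)
	mh := encodeMultihash(blake3Code, digest)

	// The first two bytes are the code (0x1e) and the length (0x20 = 32).
	fmt.Println(hex.EncodeToString(mh[:2])) // prints "1e20"
}
```

Because both values are below 0x80, each varint fits in a single byte, so a default-length BLAKE3 multihash is 34 bytes in total.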
> Blake3 digests don't need to finish hashing a data stream before it can tell if the data is incorrect. If one single byte is off anywhere in the stream, it'll immediately throw an exception
Source? That sounds impossible of the "the hash function would be broken if that were the case" kind.
> @cryptoquick the codec is 0x1e. Feel free to add support (we can disable it by default).
I'm not a Go developer, but I'm glad there's a Multicodec for it.
> Source? That sounds impossible of the "the hash function would be broken if that were the case" kind.
Yeah, it's really cool, check it out! https://github.com/oconnor663/bao
To get verified streaming, this would probably require changes to the IPLD implementation, also.
> Blake3 digests don't need to finish hashing a data stream before it can tell if the data is incorrect. If one single byte is off anywhere in the stream, it'll immediately throw an exception
This seems like a bit of an exaggeration, but since Blake3 is a tree hash that means that it can have parallelism in computing the hash which generally means more flexibility (e.g. throwing more cores at the hash means faster detection of certain problems, verified streaming, etc.).
> This also allows IPLD MerkleDAG blocks to be safely made much larger.
Absolutely, although the "safety" is related to the assumptions of the data transfer layer and the transfer layer will have to be made smart enough to take advantage of verified streaming.
As an FYI https://github.com/protocol/beyond-bitswap/pull/29 is a description of how a protocol like Bitswap might be augmented to support verified streaming along with an example of how to create a (worse than Blake3, but backwards compatible with most of the internet) verified streaming solution out of Merkle-Damgard hashes like SHA2-256.
> To get verified streaming, this would probably require changes to the IPLD implementation, also.
Yeah, you can't just verify raw data while hashing. You still need the rest of the hash tree.
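To illustrate why the rest of the hash tree is needed, here is a minimal sketch of chunk-by-chunk verification against a known root. It is not BLAKE3's or bao's actual tree mode: it assumes a plain binary Merkle tree built with SHA-256 from the Go standard library, a power-of-two chunk count, and hypothetical helper names throughout. The point is only that a receiver can check one chunk as soon as it arrives, provided the sender also supplies the sibling hashes on the path to the root.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type hash = [32]byte

// Leaves and parents get different prefixes so one cannot be passed off
// as the other.
func leafHash(chunk []byte) hash {
	return sha256.Sum256(append([]byte{0x00}, chunk...))
}

func parentHash(l, r hash) hash {
	buf := append([]byte{0x01}, l[:]...)
	return sha256.Sum256(append(buf, r[:]...))
}

// buildLevels returns every level of the tree, leaves first, root last.
// Assumption: len(chunks) is a power of two, to keep the sketch short.
func buildLevels(chunks [][]byte) [][]hash {
	level := make([]hash, len(chunks))
	for i, c := range chunks {
		level[i] = leafHash(c)
	}
	levels := [][]hash{level}
	for len(level) > 1 {
		next := make([]hash, len(level)/2)
		for i := range next {
			next[i] = parentHash(level[2*i], level[2*i+1])
		}
		levels = append(levels, next)
		level = next
	}
	return levels
}

// proofFor collects the sibling hashes on the path from leaf i to the root.
func proofFor(levels [][]hash, i int) []hash {
	var proof []hash
	for _, level := range levels[:len(levels)-1] {
		proof = append(proof, level[i^1])
		i /= 2
	}
	return proof
}

// verifyChunk recomputes the root from one chunk plus its sibling hashes.
func verifyChunk(root hash, chunk []byte, i int, proof []hash) bool {
	h := leafHash(chunk)
	for _, sib := range proof {
		if i%2 == 0 {
			h = parentHash(h, sib)
		} else {
			h = parentHash(sib, h)
		}
		i /= 2
	}
	return h == root
}

func main() {
	chunks := [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("cccc"), []byte("dddd")}
	levels := buildLevels(chunks)
	root := levels[len(levels)-1][0]
	proof := proofFor(levels, 2)

	fmt.Println(verifyChunk(root, chunks[2], 2, proof))      // true: chunk is intact
	fmt.Println(verifyChunk(root, []byte("cccX"), 2, proof)) // false: one byte differs
}
```

The raw data alone is not enough: without `proof`, the receiver has nothing to compare a single chunk against until the whole stream has been hashed.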
@aschmahmann Thanks for providing a link to that PR! I'm glad someone already mentioned verified streaming. Handling arbitrarily large block sizes poses a significant technical challenge, so simply adding Blake3 multihashes alone wouldn't solve it for free, but I think it'd be helpful for efforts like that to work towards supporting Blake3, which I guess is why I bring it up, is all.
Do you think supporting Blake3 at some point will be a helpful follow-on to a SHA2-256-only implementation?
IMO yes having support for some tree hash (e.g. Blake3 or KangarooTwelve) seems like a good idea. I think both of those functions (which AFAIK are the two most popular candidates for a tree based hash) use reduced hash rounds leading to @Stebalien's comment https://github.com/multiformats/go-multihash/issues/121#issuecomment-573771816. I'm not sure when we reach the "confidence" point of accepting Blake3 (or KangarooTwelve) by default though (@Stebalien any thoughts?).
SHA2-256 is just nice because it's already widely used and allows me to grab a SHA2-256 checksum of some random file on the internet (e.g. a Linux ISO) and have verified streaming of it over a p2p network. However, as noted in that proposal, it has some unfortunate downsides such as only being able to stream backwards which make it not ideal compared to a tree based hash.
A very good point! Another feather in SHA-2's cap is that there's relatively widespread hardware acceleration available: https://en.wikipedia.org/wiki/Intel_SHA_extensions
That said, these things take time, and Protocol Labs is no stranger to futureproofing. I think Blake3 support should be prioritized, even if the priority is quite low. :)
To be fair here, it's great to have CPU extensions. But if the newcomer (BLAKE3 in this case) is simply downright faster than SHA2-256 (even with the use of those CPU extensions), then there is real potential here for just a superior hasher. Definitely faster (better is still up for debate).
For some numbers: I tried BLAKE3 (b3sum) and SHA2-256 (sha256sum) on an ODROID-XU4 (it's still 32-bit) and a Ryzen 4800U. The results surprised me a lot!
1.6GB file on the ODROID-XU4: BLAKE3: 21 seconds. SHA2-256: 45 seconds.
5.8GB file on the Ryzen 4800U: BLAKE3: 2.2 seconds. SHA2-256: 3.7 seconds.
Note that subsequent BLAKE3 (b3sum) runs somehow were much faster with a mere ~0.5 seconds. I'm suspecting some caching going on there. So i took the first and slowest timing.
That's a full on win for BLAKE3 in two wildly different test scenarios, even though SHA2-256 has the benefit of dedicated CPU extensions.
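To put the timings above on a common scale, here is a small sketch that converts them to throughput and a speedup ratio. The figures are the ones reported in the comment; the only assumption added here is that 1 GB is treated as 1000 MB, and the type and function names are purely illustrative.

```go
package main

import "fmt"

// run holds one reported measurement; sizes are in MB (assuming 1 GB = 1000 MB).
type run struct {
	machine       string
	sizeMB        float64
	blake3Seconds float64
	sha256Seconds float64
}

// throughput returns MB processed per second.
func throughput(sizeMB, seconds float64) float64 {
	return sizeMB / seconds
}

// speedup returns how many times faster BLAKE3 was than SHA2-256.
func speedup(r run) float64 {
	return r.sha256Seconds / r.blake3Seconds
}

func main() {
	runs := []run{
		{"ODROID-XU4", 1600, 21, 45},
		{"Ryzen 4800U", 5800, 2.2, 3.7},
	}
	for _, r := range runs {
		fmt.Printf("%s: BLAKE3 %.0f MB/s, SHA2-256 %.0f MB/s, speedup %.2fx\n",
			r.machine,
			throughput(r.sizeMB, r.blake3Seconds),
			throughput(r.sizeMB, r.sha256Seconds),
			speedup(r))
	}
}
```

On these numbers BLAKE3 comes out roughly 2.1x faster on the ODROID-XU4 and roughly 1.7x faster on the Ryzen, using the first (slowest) BLAKE3 timing in each case.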
Is the only thing missing to close this issue a verdict on 'include this for real'? If that's the case, I think the world has generally agreed that blake3 is a good thing, and I've seen no indication from any cryptographer that there's anything wrong with it.
cc Chief Blake3 FanBoy @zookozcash
In my experience a cryptographer will never tell you "There's nothing wrong with it." or "It's safe." or anything like that. The best you can do is say something like "Can you point out a problem with it." and they won't. This is basically the exact same process you have to use with good lawyers. "Never ask a lawyer if doing a thing is safe, for the answer is always no. Instead ask, what's the safest way to do the thing, and what risks would remain."
Anyway, yeah, as far as I know there has been no cryptographic result indicating any weakness in BLAKE3. And since BLAKE3 is based on BLAKE2 (2012), which is based on BLAKE (2008) (the most thoroughly studied algorithm from the SHA3 competition and one of two algorithms with the biggest security margin), which is based on ChaCha20 (2008), which is based on Salsa20 (2005) (the winner of the eSTREAM competition), then it definitely has a pedigree.
Of course that doesn't mean that the changes didn't accidentally insert a weakness! I would never say it is safe. ;-) But I can't point to any flaws in it…
From twitter:
@markg85
> Note that subsequent BLAKE3 (b3sum) runs somehow were much faster with a mere ~0.5 seconds. I'm suspecting some caching going on there. So i took the first and slowest timing.
Sha256 was CPU-bound, so the IO speed had little-to-no effect on sha256 hashing, while blake3 was IO-bound, so blake3 ran much faster when the file was already in the IO-cache
Ahh, so that's what's happening! I was assuming it to be something like that but didn't bother figuring it out. Thank you for confirming!
The mere fact that this happens does imho mean that BLAKE3 is just superior for today's hardware.
Reviving this to keep all info in one place. I have go-ipfs 0.11, which has this supported.
ipfs add --nocopy --hash blake3 <somefile>
Fails with:
Error: potentially insecure hash functions not allowed
Is that intentional?
no that must be a bug, but probably not a bug in go-multihash..
After some searching it looks like this is causing it: https://github.com/ipfs/go-verifcid/blob/master/validate.go BLAKE3 isn't in the list.
It does have if code >= mh.BLAKE2B_MIN+19 && code <= mh.BLAKE2B_MAX
which I don't fully get the logic of. It "seems" like an attempt to allow blake hashes too? Not entirely sure.
The repository (go-verifcid) states that it's a temporary one, but searching for its use in go-ipfs shows that it's widely used: https://github.com/search?q=org%3Aipfs+go-verifcid&type=code
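One plausible reading of the range check quoted above, sketched under stated assumptions: in the multicodec table the BLAKE2b codes run consecutively from blake2b-8 (0xb201) to blake2b-512 (0xb240), one code per digest length in bytes, so `BLAKE2B_MIN+19` is the 20-byte (160-bit) variant. The check would then admit BLAKE2b digests of 20 bytes or more and nothing else. The function `isAllowed` below is a hypothetical stand-in, not the go-verifcid source, and its allowlist is trimmed to a single entry for brevity.

```go
package main

import "fmt"

// Multicodec table values, copied here so the sketch is self-contained.
const (
	codeSHA2256    = 0x12
	codeBLAKE3     = 0x1e
	codeBLAKE2bMin = 0xb201 // blake2b-8 (1-byte digest)
	codeBLAKE2bMax = 0xb240 // blake2b-512 (64-byte digest)
)

// isAllowed is a hypothetical stand-in for an allowlist check: explicitly
// listed codes pass, plus BLAKE2b variants with digests of at least 20 bytes.
func isAllowed(code uint64) bool {
	if code == codeSHA2256 { // trimmed allowlist, for illustration only
		return true
	}
	// MIN is the 1-byte variant, so MIN+19 is the 20-byte variant.
	return code >= codeBLAKE2bMin+19 && code <= codeBLAKE2bMax
}

func main() {
	fmt.Println(isAllowed(codeSHA2256)) // true: explicitly listed
	fmt.Println(isAllowed(0xb220))      // true: blake2b-256 falls inside the range
	fmt.Println(isAllowed(0xb208))      // false: blake2b-64 is below the 20-byte floor
	fmt.Println(isAllowed(codeBLAKE3))  // false: 0x1e is neither listed nor in the range
}
```

Under this reading the range only covers BLAKE2b, so a BLAKE3 code would be rejected until it is added to the list explicitly, which matches the "potentially insecure hash functions not allowed" error reported above.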
@markg85 yeah I'd recommend filing a bug report in the go-verifcid repo
BLAKE3 https://github.com/BLAKE3-team/BLAKE3 is a major improvement (https://medium.com/asecuritysite-when-bob-met-alice/blake3-3716708235ac https://news.ycombinator.com/item?id=22021769 https://news.ycombinator.com/item?id=22003315) on BLAKE2 in regards to speed.
There's a pure Go implementation https://github.com/lukechampine/blake3 though it's not optimized yet. An optimized version will become available in x/crypto at some point.
As an enhancement proposal, it would be nice to have at least preliminary BLAKE3 support in multihash so that downstream projects like IPFS can start using it.