multiformats / go-multihash

Multihash implementation in Go
MIT License
239 stars 57 forks source link

What's going on with murmur3? #135

Open warpfork opened 3 years ago

warpfork commented 3 years ago

This code looks an awful lot like it's doing a murmur3-32, and returning four bytes:

https://github.com/multiformats/go-multihash/blob/6f1ea18f1da5f7735ea31b5e2011da61c409e37f/sum.go#L146-L154

And this code looks an awful lot like it's registering the above code as murmur3-128, which I suspect ought to return more than four bytes:

https://github.com/multiformats/go-multihash/blob/6f1ea18f1da5f7735ea31b5e2011da61c409e37f/sum.go#L202

Should I be worried?

And if this is a bug, how comfortable are we about fixing it vs potentially breaking things that rely on the current behavior?

warpfork commented 3 years ago

I guess https://github.com/multiformats/go-multihash/commit/1b82edf96afa4 is relevant, but I'm still sort of confused. Something with "128" in the name returning 4 bytes is... unexpected.

Stebalien commented 3 years ago

Yeah, this seems wrong. I'm not sure what's going on here.

Stebalien commented 3 years ago

I believe we should be using Sum64 (which is also not 128 but it's what go-unixfs uses). Honestly, we should never use this hash function for anything.

aschmahmann commented 3 years ago

I did a bit of poking around and here's what I've got.

murmur3

UnixFS Sharding

Plan I think we should go with:

Note: I'll put up some code on the endianness + test vectors when I get a chance. Need another pair of eyes to make sure I'm doing it correctly, and also need some context on UnixFS usage to make sure I'm understanding how the murmur3 hashes are being used.