mikeagn / smhasher

Automatically exported from code.google.com/p/smhasher

Please provide basic test vectors #6

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Would it be possible to have some basic test vectors for testing alternate 
implementations?

Even a single simple one, like the hash of "The quick brown fox jumps over the 
lazy dog" for each flavor of the algorithm, would be enough.

I found a lot of wrong implementations of Murmur3, and such a simple test vector 
from the official site would help developers a lot.

Original issue reported on code.google.com by amadva...@gmail.com on 30 Apr 2011 at 2:06

GoogleCodeExporter commented 9 years ago
With my implementation, when I hash "The quick brown fox jumps over the lazy 
dog" (using seed 0x9747b28c), I get:

0x2FA826CD

You post what *you* get, and if they match then we'll say it's ready to ship.

Original comment by josejime...@gmail.com on 28 Apr 2012 at 12:20
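
The comment above does not say which variant produced this value. Since the result is 32 bits wide, the sketch below assumes MurmurHash3_x86_32 applied to the ASCII bytes of the sentence. It is a minimal Python port for cross-checking, not the reference implementation; the function name `murmur3_x86_32` is chosen here for illustration.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data, seed=0):
    """Sketch of MurmurHash3_x86_32 over a bytes object; returns an int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4

    # Body: consume the input as little-endian 32-bit blocks.
    for i in range(nblocks):
        (k,) = struct.unpack_from("<I", data, 4 * i)
        k = (k * c1) & MASK32
        k = rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 0 to 3 bytes, read as a little-endian integer.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: mix in the length, then force avalanche.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


fox = b"The quick brown fox jumps over the lazy dog"
print("0x%08X" % murmur3_x86_32(fox, 0x9747B28C))
```

If the assumptions hold, the printed value can be compared directly against the one quoted above.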

GoogleCodeExporter commented 9 years ago
SMHasher uses the VerificationTest function to check if a hash function is 
correctly implemented - it doesn't rely on correct capitalization or 
punctuation of "The quick brown fox..." and catches more implementation errors 
than a single test vector can. I'll make a note on the front page to point it 
out.

Original comment by tanj...@gmail.com on 11 May 2012 at 5:44
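
For ports to other languages, the VerificationTest procedure can be reproduced outside SMHasher. The sketch below follows the procedure as it appears in SMHasher's source (an assumption here, since the thread does not spell it out): hash keys of length 0 to 255 whose bytes are 0, 1, 2, ..., using 256 minus the key length as the seed; concatenate the digests; hash that buffer with seed 0; and read the first four bytes as a little-endian integer. The names `verification_value` and `murmur3_x86_32_digest` are hypothetical, and the expected value 0xB0F57EE3 for MurmurHash3_x86_32 is quoted from SMHasher's table of hashes rather than from this thread.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k(k):
    # Per-block key scrambling used by MurmurHash3_x86_32.
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_x86_32_digest(data, seed):
    """Compact MurmurHash3_x86_32 sketch; returns 4 little-endian bytes."""
    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):
        (k,) = struct.unpack_from("<I", data, 4 * i)
        h ^= _mix_k(k)
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # An empty tail yields k = 0, which mixes to 0 and leaves h unchanged.
    h ^= _mix_k(int.from_bytes(data[4 * nblocks:], "little"))
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return struct.pack("<I", h)


def verification_value(hash_fn):
    """hash_fn(data: bytes, seed: int) -> bytes; returns a 32-bit check value."""
    # Keys are b"", b"\x00", b"\x00\x01", ... with seeds 256, 255, 254, ...
    digests = b"".join(hash_fn(bytes(range(n)), 256 - n) for n in range(256))
    final = hash_fn(digests, 0)
    return int.from_bytes(final[:4], "little")


print("0x%08X" % verification_value(murmur3_x86_32_digest))
```

Because `verification_value` only needs a callable that returns the digest bytes, the same check should apply to the 128-bit variants, each of which has its own expected constant in SMHasher.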

GoogleCodeExporter commented 9 years ago
While SMHasher certainly provides a more complete test suite, it's not very 
helpful for testing an implementation in another language. I have recently 
ported the hash to C# and generated a few test vectors for 3 different seeds 
(0x9747b28c, 0x0, 0xc58f1a7b), including a small collision test against English 
words.

x86_32:  http://pastebin.com/kkggV9Vx
x64_128: http://pastebin.com/k2VDbWkF

Original comment by Darcara...@googlemail.com on 26 May 2012 at 8:43

GoogleCodeExporter commented 9 years ago
So what is the correct x64_128 hash for "The quick brown fox jumps over the 
lazy dog" and zero seed?
Previous post = 6C1B07BC7BBC4BE347939AC4A93C437A

Python smhasher:
>>> hex(smhasher.murmur3_x64_128(u"The quick brown fox jumps over the lazy dog",0))
'0x6c1b07bc7bbc4be347939ac4a93c437aL'

Java Guava:
>>> System.out.println(Hashing.murmur3_128(0).hashString("The quick brown fox jumps over the lazy dog"));
4cae51b5316602c01c7c5642843e5fe7

Which version is correct?

Original comment by kamil....@gmail.com on 13 Sep 2012 at 11:20

GoogleCodeExporter commented 9 years ago
As I mentioned on the Guava bug 
(http://code.google.com/p/guava-libraries/issues/detail?id=1147):

In Guava, if you just call HashFunction#hashString(String), it'll hash each char 
of the string in order, without applying any character encoding.

What you want is HashFunction#hashString(String, Charset):

HashCode foxHash = Hashing.murmur3_128(0).hashString(
    "The quick brown fox jumps over the lazy dog", Charsets.UTF_8);
assertEquals("6c1b07bc7bbc4be347939ac4a93c437a", foxHash.toString());
// This is the same as your Python output.

Original comment by kurt.kluever on 13 Sep 2012 at 4:51
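
The mismatch discussed above comes down to which bytes are fed to the hash. The sketch below is a Python port of MurmurHash3_x64_128 (the name `murmur3_x64_128` is chosen here for illustration, not taken from any library) that hashes the same sentence twice: once as UTF-8 bytes, and once as UTF-16LE bytes. The second case is an assumption about how a charset-less call lays out each 16-bit char (low byte first); the thread does not confirm that detail, so only the UTF-8 digest is compared against a value quoted here.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def rotl64(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK64


def fmix64(k):
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def mix_k1(k1):
    return (rotl64((k1 * C1) & MASK64, 31) * C2) & MASK64


def mix_k2(k2):
    return (rotl64((k2 * C2) & MASK64, 33) * C1) & MASK64


def murmur3_x64_128(data, seed=0):
    """Sketch of MurmurHash3_x64_128; returns the 16 digest bytes in order."""
    h1 = h2 = seed & 0xFFFFFFFF
    nblocks = len(data) // 16
    for i in range(nblocks):
        k1, k2 = struct.unpack_from("<QQ", data, 16 * i)
        h1 ^= mix_k1(k1)
        h1 = rotl64(h1, 27)
        h1 = (h1 + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        h2 ^= mix_k2(k2)
        h2 = rotl64(h2, 31)
        h2 = (h2 + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64

    # Tail: up to 15 bytes; missing bytes behave as zeros, which mix to zero.
    tail = data[16 * nblocks:]
    h2 ^= mix_k2(int.from_bytes(tail[8:], "little"))
    h1 ^= mix_k1(int.from_bytes(tail[:8], "little"))

    h1 ^= len(data)
    h2 ^= len(data)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1, h2 = fmix64(h1), fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return struct.pack("<QQ", h1, h2)


fox = "The quick brown fox jumps over the lazy dog"
print("UTF-8:   ", murmur3_x64_128(fox.encode("utf-8"), 0).hex())
print("UTF-16LE:", murmur3_x64_128(fox.encode("utf-16-le"), 0).hex())
```

The two lines differ because the inputs differ: 43 bytes in the first case, 86 in the second. When comparing ports, it helps to fix the byte encoding first and only then compare digests.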