maneeshsahu closed this issue 4 years ago.
$ g++ ./mmh3_reference_repl.cpp; and ./a.out
>
00000000000000000000000000000000
> I will not buy this tobacconist's, it is scratched.
d30654abbd8227e367d73523f0079673
There is a mismatch because the mmh3 library for Python is incorrect: specifically, it swaps the order of the two final `uint64_t`s relative to the reference implementation.
I couldn't tell you why it does this, but if you can’t find a Python implementation or binding that matches the reference implementation, you can simply swap the two 64-bit halves:
>>> o = mmh3.hash128("I will not buy this tobacconist's, it is scratched.")
>>> hex(((o & 0xffffffffffffffff) << 64) + (o >> 64))
'0xd30654abbd8227e367d73523f0079673'
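The same swap can be wrapped in a small helper so the comparison is less error-prone. This is a minimal sketch: the name `mmh3_to_reference_hex` is hypothetical, and it assumes `o` is the non-negative 128-bit integer returned by `mmh3.hash128`, with the reference's first 64-bit word in the low half. Formatting with `"032x"` instead of `hex()` keeps leading zeros, so the result always has the reference output's 32 hex digits.

```python
MASK64 = (1 << 64) - 1


def mmh3_to_reference_hex(o: int) -> str:
    """Reorder a 128-bit hash integer so its hex digits match the reference output.

    Hypothetical helper: assumes the low 64 bits of `o` hold the reference's
    first word and the high 64 bits hold the second word.
    """
    if not 0 <= o < (1 << 128):
        raise ValueError("expected an unsigned 128-bit integer")
    low, high = o & MASK64, o >> 64
    # Low half first; "032x" zero-pads to 32 hex digits, unlike hex().
    return format((low << 64) | high, "032x")


# The value printed by hex(mmh3.hash128(...)) in this thread:
print(mmh3_to_reference_hex(0x67d73523f0079673d30654abbd8227e3))
# prints d30654abbd8227e367d73523f0079673
```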
Taking the (unwitting) test vectors from https://github.com/aappleby/smhasher/issues/73#issuecomment-527887962, the reference implementation and this library agree:
N | Bytes | MM3-128 (x64) Reference | murmurHash3.x64.hash128
---|----------------------------------|----------------------------------|---------------------------------
1 | 00 | 4610abe56eff5cb551622daa78f83583 | 4610abe56eff5cb551622daa78f83583
2 | 0000 | 3044b81a706c5de818f96bcc37e8a35b | 3044b81a706c5de818f96bcc37e8a35b
3 | 000000 | 79d54dd1bf7137480af5e7f1b766291d | 79d54dd1bf7137480af5e7f1b766291d
4 | 00000000 | cfa0f7ddd84c76bc589623161cf526f1 | cfa0f7ddd84c76bc589623161cf526f1
5 | 0000000000 | 3df460ff3e17b53a17874fba56e69767 | 3df460ff3e17b53a17874fba56e69767
6 | 000000000000 | 7d480f9fa80ec469719af4070b74d89d | 7d480f9fa80ec469719af4070b74d89d
7 | 00000000000000 | f402c55ac5dec98f2de586f681711c02 | f402c55ac5dec98f2de586f681711c02
8 | 0000000000000000 | 28df63b7cc57c3cbf2557dfcc4e8fe52 | 28df63b7cc57c3cbf2557dfcc4e8fe52
9 | 000000000000000000 | 73269217e5476f20f1fa3fc86728ca0c | 73269217e5476f20f1fa3fc86728ca0c
10 | 00000000000000000000 | 5b3d684f8c57ce161ba63bef94931146 | 5b3d684f8c57ce161ba63bef94931146
11 | 0000000000000000000000 | 056e0d6c8921404673c2da0104c39955 | 056e0d6c8921404673c2da0104c39955
12 | 000000000000000000000000 | a4d8ece9d7c0dfe3803bbf8eb6f0853f | a4d8ece9d7c0dfe3803bbf8eb6f0853f
13 | 00000000000000000000000000 | a10ea8b22762995abb1575409cfb7dc6 | a10ea8b22762995abb1575409cfb7dc6
14 | 0000000000000000000000000000 | 028b7708fcbbed1e8393f0698afe46ea | 028b7708fcbbed1e8393f0698afe46ea
15 | 000000000000000000000000000000 | 6ce113b115a56871195953c2230f8db2 | 6ce113b115a56871195953c2230f8db2
16 | 00000000000000000000000000000000 | 4bbd1bf27da918d6b465a9eccd791cb6 | 4bbd1bf27da918d6b465a9eccd791cb6
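To check these vectors without either library, a pure-Python port of the x64 variant can help. This is a sketch written against the reference `MurmurHash3_x64_128` in SMHasher, not code taken from mmh3 or from this library, and the name `murmur3_x64_128_hex` is hypothetical. It returns h1 followed by h2, 16 hex digits each, which is the order the reference REPL above prints.

```python
MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def _rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k: int) -> int:
    """Final avalanche mix applied to each 64-bit half."""
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def _mix_k1(k1: int) -> int:
    k1 = (k1 * C1) & MASK64
    return (_rotl64(k1, 31) * C2) & MASK64


def _mix_k2(k2: int) -> int:
    k2 = (k2 * C2) & MASK64
    return (_rotl64(k2, 33) * C1) & MASK64


def murmur3_x64_128_hex(data: bytes, seed: int = 0) -> str:
    """Hex digest as h1 followed by h2, 16 hex digits each."""
    length = len(data)
    h1 = h2 = seed & 0xFFFFFFFF  # the reference takes a uint32_t seed
    nblocks = length // 16

    # Body: each 16-byte block is two little-endian 64-bit words.
    for i in range(nblocks):
        block = data[16 * i : 16 * i + 16]
        h1 ^= _mix_k1(int.from_bytes(block[:8], "little"))
        h1 = (_rotl64(h1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        h2 ^= _mix_k2(int.from_bytes(block[8:], "little"))
        h2 = (_rotl64(h2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64

    # Tail: the last 0..15 bytes, zero-extended little-endian.
    tail = data[16 * nblocks :]
    if len(tail) > 8:
        h2 ^= _mix_k2(int.from_bytes(tail[8:], "little"))
    if tail:
        h1 ^= _mix_k1(int.from_bytes(tail[:8], "little"))

    # Finalization: fold in the length, cross-add, avalanche, cross-add.
    h1 ^= length
    h2 ^= length
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1, h2 = _fmix64(h1), _fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return f"{h1:016x}{h2:016x}"


print(murmur3_x64_128_hex("I will not buy this tobacconist's, it is scratched.".encode()))
for n in range(1, 17):
    print(n, murmur3_x64_128_hex(bytes(n)))
```

If the port is faithful, the loop should reproduce the x64 table row by row, and the empty input gives 32 zeros, matching the REPL transcript.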

And for the x86 variant:

N | Bytes | MM3-128 (x86) Reference | murmurHash3.x86.hash128
---|----------------------------------|----------------------------------|---------------------------------
1 | 00 | 88c4adec54d201b954d201b954d201b9 | 88c4adec54d201b954d201b954d201b9
2 | 0000 | 04a872bbedcd774bedcd774bedcd774b | 04a872bbedcd774bedcd774bedcd774b
3 | 000000 | e0d93642acf40e87acf40e87acf40e87 | e0d93642acf40e87acf40e87acf40e87
4 | 00000000 | cc066f1f9e5178409e5178409e517840 | cc066f1f9e5178409e5178409e517840
5 | 0000000000 | 50a68ecfd01a6609d01a6609d01a6609 | 50a68ecfd01a6609d01a6609d01a6609
6 | 000000000000 | 777fa95660bde92360bde92360bde923 | 777fa95660bde92360bde92360bde923
7 | 00000000000000 | 0d45d85efb848988fb848988fb848988 | 0d45d85efb848988fb848988fb848988
8 | 0000000000000000 | e028ae414772b0844772b0844772b084 | e028ae414772b0844772b0844772b084
9 | 000000000000000000 | 5ad58a7e543371085433710854337108 | 5ad58a7e543371085433710854337108
10 | 00000000000000000000 | 64010da262e8bc1762e8bc1762e8bc17 | 64010da262e8bc1762e8bc1762e8bc17
11 | 0000000000000000000000 | 2f35ebd169f8166569f8166569f81665 | 2f35ebd169f8166569f8166569f81665
12 | 000000000000000000000000 | 332d18d156b5986456b5986456b59864 | 332d18d156b5986456b5986456b59864
13 | 00000000000000000000000000 | 583cbe60ca53c80fca53c80fca53c80f | 583cbe60ca53c80fca53c80fca53c80f
14 | 0000000000000000000000000000 | a8e046b5855ca909855ca909855ca909 | a8e046b5855ca909855ca909855ca909
15 | 000000000000000000000000000000 | 3553d0af909796639097966390979663 | 3553d0af909796639097966390979663
16 | 00000000000000000000000000000000 | 5a4075d66b2d3d27d3926c2feb228a07 | 5a4075d66b2d3d27d3926c2feb228a07
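The x86 vectors can be checked the same way. Again, this is only a sketch assuming the reference `MurmurHash3_x86_128` from SMHasher; `murmur3_x86_128_hex` and its helpers are hypothetical names. The x86 variant keeps four 32-bit lanes rather than two 64-bit ones, so the digest is h1 through h4 at 8 hex digits each. That layout also explains the repeated words in the table: for short all-zero inputs, h2, h3 and h4 end up equal.

```python
MASK32 = 0xFFFFFFFF

# Per-lane key mixing (multiplier, rotation, multiplier) for k1..k4.
_K_MIX = (
    (0x239B961B, 15, 0xAB0E9789),
    (0xAB0E9789, 16, 0x38B34AE5),
    (0x38B34AE5, 17, 0xA1E38B93),
    (0xA1E38B93, 18, 0x239B961B),
)
# Per-lane state update (rotation, additive constant) for h1..h4.
_H_MIX = ((19, 0x561CCD1B), (17, 0x0BCAA747), (15, 0x96CD1C35), (13, 0x32AC3B17))


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _fmix32(h: int) -> int:
    """Final avalanche mix applied to each 32-bit lane."""
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def _mix_k(lane: int, k: int) -> int:
    m1, r, m2 = _K_MIX[lane]
    return (_rotl32((k * m1) & MASK32, r) * m2) & MASK32


def _cross_add(h: list) -> None:
    # h1 += h2 + h3 + h4, then h2, h3, h4 each add the new h1.
    h[0] = sum(h) & MASK32
    for lane in (1, 2, 3):
        h[lane] = (h[lane] + h[0]) & MASK32


def murmur3_x86_128_hex(data: bytes, seed: int = 0) -> str:
    """Hex digest as h1, h2, h3, h4, 8 hex digits each."""
    length = len(data)
    h = [seed & MASK32] * 4
    nblocks = length // 16

    # Body: four little-endian 32-bit words per block. Lanes update in order,
    # so h4's step already sees the new h1, as in the C code.
    for i in range(nblocks):
        block = data[16 * i : 16 * i + 16]
        for lane in range(4):
            k = int.from_bytes(block[4 * lane : 4 * lane + 4], "little")
            h[lane] ^= _mix_k(lane, k)
            r, add = _H_MIX[lane]
            h[lane] = (_rotl32(h[lane], r) + h[(lane + 1) % 4]) & MASK32
            h[lane] = (h[lane] * 5 + add) & MASK32

    # Tail: lane L is mixed only if at least 4*L + 1 bytes remain.
    tail = data[16 * nblocks :]
    for lane in range(4):
        chunk = tail[4 * lane : 4 * lane + 4]
        if chunk:
            h[lane] ^= _mix_k(lane, int.from_bytes(chunk, "little"))

    # Finalization: fold in the length, cross-add, avalanche, cross-add.
    h = [x ^ length for x in h]
    _cross_add(h)
    h = [_fmix32(x) for x in h]
    _cross_add(h)
    return "".join(f"{x:08x}" for x in h)


for n in range(1, 17):
    print(n, murmur3_x86_128_hex(bytes(n)))
```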

Hi @karanlyons, I can't get this library's hash128 to match Python's mmh3.
Python mmh3:
hex(mmh3.hash128("I will not buy this tobacconist's, it is scratched."))
Yields: 0x67d73523f0079673d30654abbd8227e3
But in your README:
murmurHash3.x64.hash128("I will not buy this tobacconist's, it is scratched.");
Yields: d30654abbd8227e367d73523f0079673
Why is there a mismatch?