tzvetkoff / hashids.c

The C port of Hashids
MIT License
52 stars 24 forks source link

hashids_encode does not return the expected hash #1

Closed thedrow closed 8 years ago

thedrow commented 8 years ago

I have written python bindings to this library and found a bug in encoding.

My first indication was that hashids-python tests were failing:

============================ test session starts =============================
platform darwin -- Python 2.7.10[pypy-5.3.0-final], pytest-3.0.1, py-1.4.31, pluggy-0.3.1
rootdir: /Users/omer.katz/Documents/hashids, inifile:
collected 31 items

tests/test_hashids.py .............FF.............FF.

================================== FAILURES ==================================
________________________ TestEncoding.test_encode_hex ________________________

self = <test_hashids.TestEncoding object at 0x0000000106703e50>

    def test_encode_hex(self):
>       assert Hashids().encode_hex('507f1f77bcf86cd799439011') == 'y42LW46J9luq3Xq9XMly'
E       assert 'AOo9Ql5nQR1VO' == 'y42LW46J9luq3Xq9XMly'
E         - AOo9Ql5nQR1VO
E         + y42LW46J9luq3Xq9XMly

tests/test_hashids.py:81: AssertionError
_______________________ TestEncoding.test_illegal_hex ________________________

self = <test_hashids.TestEncoding object at 0x0000000106919c90>

    def test_illegal_hex(self):
        assert Hashids().encode_hex('') == ''
>       assert Hashids().encode_hex('1234SGT8') == ''
E       assert 'qyMp' == ''
E         - qyMp

tests/test_hashids.py:89: AssertionError
______________________ TestDecoding.test_only_one_valid ______________________

self = <test_hashids.TestDecoding object at 0x000000010655afe0>

    def test_only_one_valid(self):
        h = Hashids(min_length=6)
>       assert h.decode(h.encode(1)[:-1] + '0') == ()
E       assert (1L,) == ()
E         Left contains more items, first extra item: 1L
E         Use -v to get the full diff

tests/test_hashids.py:167: AssertionError
________________________ TestDecoding.test_decode_hex ________________________

self = <test_hashids.TestDecoding object at 0x00000001059f1da8>

    def test_decode_hex(self):
        hex_str = '507f1f77bcf86cd799439011'
>       assert Hashids().decode_hex('y42LW46J9luq3Xq9XMly') == hex_str
E       assert '' == '507f1f77bcf86cd799439011'
E         + 507f1f77bcf86cd799439011

tests/test_hashids.py:171: AssertionError
==================== 4 failed, 27 passed in 0.79 seconds =====================

I added a check to verify that the expected hash matches the encoded hash and there is one test failure.

........................F...

#25: hashids_encode() buffer D4h3F7i5Al does not match expected hash 6nhmFDikA0

28 samples, 1 failures

@tzvetkoff Do you mind helping me fix it?

thedrow commented 8 years ago

I just checked and it seems that the same bug occurs in hashids-python.

thedrow commented 8 years ago

The ruby implementation behaves correctly differently, producing another hash. 6mh9F9i8A9 which is similar until 6mh9F.

thedrow commented 8 years ago

Seems like you decode the hash correctly but you cannot encode it so hashids-python considers it invalid.

thedrow commented 8 years ago

The problem seems to be that number /= hashids->alphabet_length; truncates to 0 instead of 1 because in C the division is rounded down with integers. See http://stackoverflow.com/questions/3602827/what-is-the-behavior-of-integer-division-in-c

thedrow commented 8 years ago

The solution is to change the hashing loop to:

number = (unsigned long long) floor((float)number / (float)hashids->alphabet_length);

in order to match the Python behaviour. I don't know if that is the correct behaviour though.

thedrow commented 8 years ago

Now other hashes fail:

#3: hashids_encode() buffer AOo9QGAVVNYGO does not match expected hash AOo9Ql5nQR1VO
#3: hashids_decode() decoding error
#11: hashids_encode() buffer p2xkL3CK33JjcrrZ8vsw4YRZueZXnk does not match expected hash p2xkL3CK33JjcrrZ8vsw4YRZueZX9k
#11: hashids_decode() decoding error

But when I use double I get less failures:

/Users/omer.katz/Library/Caches/CLion2016.2/cmake/generated/hashids_c-a03af482/a03af482/Debug/unittest
..FF..............

#3: hashids_encode() buffer AOo9Ql5nQRmqO does not match expected hash AOo9Ql5nQR1VO
#3: hashids_decode() decoding error

17 samples, 2 failures

Process finished with exit code 1

According to this I might have stumbled on a compiler bug? http://stackoverflow.com/questions/16716957/convert-unsigned-long-long-to-double-in-c

thedrow commented 8 years ago

Nope. We need to use long double with floorl() since that's the function for flooring long doubles. I'm pushing the fix right now.

tzvetkoff commented 8 years ago

I've just merged your commits with a few more changes I did. I don't think we need to use floorl() in the divisions, since integer division suffices the needed results - and I have actually missed to add a strcmp() in the test cases back in the day :| I did a few other tweaks as well.

Keep in mind, that encoding huge values (like hex 507f1f77bcf86cd799439011) is not possible - the limit is 64 bits :-(

thedrow commented 8 years ago

I checked and we do have to use floorl(). Otherwise the algorithm fails. @tzvetkoff Can you explain why the tests now pass?