tuupola / base62

Base62 encoder and decoder for arbitrary data
MIT License

Leading 0x00 stripped from binary data #4

ElfSundae closed 6 years ago

ElfSundae commented 6 years ago

I ran into a string with a leading \0 when running the tests. Is this normal?

There was 1 failure:

1) Tuupola\Base62\Base62Test::testShouldEncodeAndDecodeRandomBytes
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-Binary String: 0x00486eea2de87439fc081b892616a3b0f1f098df86e2cdd23e7d21f5f046a30a1a6662fff6c3c017b1d4853a1fdd7dc00975016d9c2801b9df659fadc6abe1109b1e1f3960367603e75bb9ddf9d8097af5948f74df585d05bbee61aff992f3d35577e31aafce7d4342d3a68da0d5ca8d46bde2f7e7f555cf6a1938c4f52bdd43
+Binary String: 0x486eea2de87439fc081b892616a3b0f1f098df86e2cdd23e7d21f5f046a30a1a6662fff6c3c017b1d4853a1fdd7dc00975016d9c2801b9df659fadc6abe1109b1e1f3960367603e75bb9ddf9d8097af5948f74df585d05bbee61aff992f3d35577e31aafce7d4342d3a68da0d5ca8d46bde2f7e7f555cf6a1938c4f52bdd43

/private/tmp/base62/tests/Base62Test.php:41

FAILURES!

ElfSundae commented 6 years ago

base64_encode can handle this:

>>> $data = hex2bin("00313233");
=> "\0123"
>>> base64_encode($data)
=> "ADEyMw=="
>>> base64_decode("ADEyMw==")
=> "\0123"
>>> Tuupola\Base62Proxy::encode($data)
=> "DWjr"
>>> Tuupola\Base62Proxy::decode("DWjr")
=> "123"

tuupola commented 6 years ago

This seems to be the default GMP behaviour. Since it is a numerical conversion, leading zeroes carry no value and are dropped.

print $encoded = gmp_strval(gmp_init("0x00deadbeef", 16), 62);
print $decoded = gmp_strval(gmp_init($encoded, 62), 16);
/*
44pZgF
deadbeef
*/

ElfSundae commented 6 years ago

PhpEncoder cannot handle it either.

tuupola commented 6 years ago

All encoders have the same output since they are interchangeable.

ElfSundae commented 6 years ago

Then just forget it 😆

tuupola commented 6 years ago

I will keep this open for a while to think about it. For numerical conversions, losing leading zeros is OK. For binary data, losing a leading 0x00 is not really OK, because it is not something one would expect.

ElfSundae commented 6 years ago

Yes, that is why I opened this issue. I have thought about it, but I have no idea how to fix it. If we know the length of the original data in advance, as with a UUID, we can pad the decoded result with leading zero bytes. So maybe we could store the length in the encoded data. But we would have to reserve enough room for the length, which would make the encoded data too long.
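A minimal sketch of the padding idea, assuming the original length is known in advance (16 bytes for a UUID, 4 bytes in this demo). `padToLength` is a hypothetical helper name used only for illustration; it is not part of the library.

```php
<?php
// Hypothetical helper (not part of tuupola/base62): restore leading 0x00
// bytes that a numerical decode dropped, given the known original length.
function padToLength(string $decoded, int $length): string
{
    // Left-pad with NUL bytes. str_pad() returns the input unchanged
    // when it is already at least $length bytes long.
    return str_pad($decoded, $length, "\x00", STR_PAD_LEFT);
}

// "123" is what decode returned earlier in this thread for the
// 4-byte input 0x00313233.
$restored = padToLength("123", 4);
echo bin2hex($restored), PHP_EOL; // 00313233
```

This only works for fixed-length payloads. For variable-length data the length would have to travel with the encoded string, which is the size cost described above.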

ElfSundae commented 6 years ago

@tuupola I'd like to let you know that I released a new package yesterday that ships with your Base62 package. Thanks for your great work! 👍

https://github.com/ElfSundae/laravel-hashid

tuupola commented 6 years ago

Looks good!

mmeyer2k commented 6 years ago

Hey everyone, I found a hack-ish way to solve this problem within your project, in case any of you still want a solution. By always prepending a non-null byte to the data and then discarding that byte after decoding, you can preserve arbitrary data intact.

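use Tuupola\Base62Proxy as Base62;

// Prepend a non-null sentinel byte so the numerical conversion keeps any leading 0x00 bytes.
$encoded = Base62::encode("\x01" . $somedata);
// Decode first, then drop the sentinel byte from the decoded data.
$decoded = substr(Base62::decode($encoded), 1);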

Perhaps some kind of special arbitrary data mode functions like encodeBinary() and decodeBinary() could be added to the project for this scenario.
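To make the sentinel-byte idea concrete without depending on the library, here is a self-contained sketch. `convertBase`, `numericEncode` and `numericDecode` are illustrative stand-ins for a plain numerical Base62 encoder, and the 0-9A-Za-z alphabet is an assumption chosen because it reproduces the "DWjr" output shown earlier in this thread. `encodeBinary` and `decodeBinary` are the hypothetical names proposed above, not existing library functions.

```php
<?php
// Assumed alphabet; it reproduces the "DWjr" output seen in this thread.
const ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Schoolbook base conversion on a list of digits. Like any numerical
// conversion, it drops leading zero digits.
function convertBase(array $digits, int $from, int $to): array
{
    $result = [];
    while (count($digits) > 0) {
        $quotient = [];
        $remainder = 0;
        foreach ($digits as $digit) {
            $value = $remainder * $from + $digit;
            $remainder = $value % $to;
            $q = intdiv($value, $to);
            if ($q > 0 || count($quotient) > 0) {
                $quotient[] = $q;
            }
        }
        array_unshift($result, $remainder);
        $digits = $quotient;
    }
    return $result;
}

// Plain numerical encode: loses leading 0x00 bytes. Expects non-empty input.
function numericEncode(string $data): string
{
    $out = "";
    foreach (convertBase(array_values(unpack("C*", $data)), 256, 62) as $d) {
        $out .= substr(ALPHABET, $d, 1);
    }
    return $out;
}

// Plain numerical decode. No validation of the input characters.
function numericDecode(string $text): string
{
    $digits = [];
    foreach (str_split($text) as $char) {
        $digits[] = strpos(ALPHABET, $char);
    }
    $out = "";
    foreach (convertBase($digits, 62, 256) as $byte) {
        $out .= chr($byte);
    }
    return $out;
}

// Hypothetical wrappers: add a 0x01 sentinel before encoding and
// remove it again after decoding.
function encodeBinary(string $data): string
{
    return numericEncode("\x01" . $data);
}

function decodeBinary(string $encoded): string
{
    return (string) substr(numericDecode($encoded), 1);
}

$data = hex2bin("00313233");
var_dump(numericDecode(numericEncode($data)) === $data); // bool(false)
var_dump(decodeBinary(encodeBinary($data)) === $data);   // bool(true)
```

The cost is one extra input byte on every payload, and strings produced this way are not compatible with a plain numerical decode of the same data.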

1ma commented 6 years ago

> I will keep this open for a while to think about it. For numerical conversions, losing leading zeros is OK. For binary data, losing a leading 0x00 is not really OK, because it is not something one would expect.

FWIW I see it the same way, because decoding an encoded payload should always yield the exact original data. Hence losing leading zeroes on integers is fine, but on strings it is not.
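One way to get an exact round trip without a sentinel byte is the convention used by Bitcoin's Base58: count the leading zero bytes and emit one copy of the alphabet's first character for each. The sketch below illustrates that technique under the same assumed 0-9A-Za-z alphabet. It is not necessarily what the library does, and `convertDigits`, `encodeKeepingZeros` and `decodeKeepingZeros` are hypothetical names.

```php
<?php
// Assumed alphabet; its first character "0" stands for a leading 0x00 byte.
const CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Schoolbook base conversion on a list of digits (drops leading zero digits).
function convertDigits(array $digits, int $from, int $to): array
{
    $result = [];
    while (count($digits) > 0) {
        $quotient = [];
        $remainder = 0;
        foreach ($digits as $digit) {
            $value = $remainder * $from + $digit;
            $remainder = $value % $to;
            $q = intdiv($value, $to);
            if ($q > 0 || count($quotient) > 0) {
                $quotient[] = $q;
            }
        }
        array_unshift($result, $remainder);
        $digits = $quotient;
    }
    return $result;
}

function encodeKeepingZeros(string $data): string
{
    // Split off the leading 0x00 bytes and remember how many there were.
    $body = ltrim($data, "\x00");
    $zeros = strlen($data) - strlen($body);
    $encoded = "";
    if ($body !== "") {
        foreach (convertDigits(array_values(unpack("C*", $body)), 256, 62) as $d) {
            $encoded .= substr(CHARS, $d, 1);
        }
    }
    // One "0" character per stripped zero byte, then the numeric part.
    return str_repeat("0", $zeros) . $encoded;
}

function decodeKeepingZeros(string $text): string
{
    // Leading "0" characters map back to leading 0x00 bytes.
    $body = ltrim($text, "0");
    $zeros = strlen($text) - strlen($body);
    $decoded = "";
    if ($body !== "") {
        $digits = [];
        foreach (str_split($body) as $char) {
            $digits[] = strpos(CHARS, $char);
        }
        foreach (convertDigits($digits, 62, 256) as $byte) {
            $decoded .= chr($byte);
        }
    }
    return str_repeat("\x00", $zeros) . $decoded;
}

$data = hex2bin("00313233");
$encoded = encodeKeepingZeros($data);
echo $encoded, PHP_EOL;                               // 0DWjr
var_dump(decodeKeepingZeros($encoded) === $data);     // bool(true)
```

Because the numeric part of a non-zero value never starts with "0", the count of leading "0" characters is unambiguous on decode.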

tuupola commented 6 years ago

Fixed in 0.11.0.