tuupola / base62

Base62 encoder and decoder for arbitrary data
MIT License

Can't successfully convert low-value UUIDs without blowing up. #17

Closed hopeseekr closed 4 years ago

hopeseekr commented 5 years ago

The Problem

When I tried to use this with my phpexpertsinc/ConciseUUID project, every test involving a low-value UUID failed with:

InvalidArgumentException: $bytes string should contain 16 characters.

Here is a test case:

use Ramsey\Uuid\Uuid;
use Tuupola\Base62Proxy as Base62;

$uuid = Uuid::fromBytes(Base62::decode($conciseUuid));

Here's the test data:

    $badUuids = [
        '0023a441-a3a3-4d9e-bd65-de3381c3a226' => '00GHs6XflJ51yCvZ4TwH4g',
        '1ee9a026-48ef-4592-9d87-88ceea7bc35e' => '0wKXIE87UgfjIvSPLkAHao',
        '0e0aa2a8-1a10-45e4-a67a-c97b9c5a7d19' => '0QUkgNC86JAY1A8JhVZ7iT',
        '1ad0d525-97c9-4c08-ad56-59acd47e3f7c' => '0obEi3noEliUnbTQbhMrLo',
    ];

The Solution

I solved this using ext-gmp via:

    // 3. We pad zeros to the beginning, as the result returned by gmp_strval after base conversion
    // is not always 22 characters long.
    $uuid = str_pad($uuid, 22, '0', STR_PAD_LEFT);
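
The padding approach described above can be sketched end to end as follows. This is only a minimal sketch that assumes ext-gmp is loaded; the function names uuidToConcise and conciseToUuid are hypothetical and are not part of tuupola/base62 or ConciseUUID.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: UUID string -> fixed-width, 22-character Base62 string.
function uuidToConcise(string $uuid): string
{
    // Drop the dashes so GMP sees a single 128-bit hexadecimal number.
    $hex = str_replace('-', '', $uuid);
    // GMP treats the value as an integer, so leading zeros are lost here.
    $base62 = gmp_strval(gmp_init($hex, 16), 62);
    // Restore a fixed width: 22 Base62 digits are enough for any 128-bit value.
    return str_pad($base62, 22, '0', STR_PAD_LEFT);
}

// Hypothetical helper: 22-character Base62 string -> canonical UUID string.
function conciseToUuid(string $concise): string
{
    // Converting back through an integer drops the padding again,
    // so the hex string is padded to 32 digits (16 bytes).
    $hex = gmp_strval(gmp_init($concise, 62), 16);
    $hex = str_pad($hex, 32, '0', STR_PAD_LEFT);

    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

$concise = uuidToConcise('0023a441-a3a3-4d9e-bd65-de3381c3a226');
echo $concise, PHP_EOL;
echo conciseToUuid($concise), PHP_EOL;
```

Because both directions go through an integer, the padding is purely cosmetic in this scheme. A decoder that treats every leading "0" as a real null byte would read the same string differently.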

tuupola commented 5 years ago

I took the first example, 00GHs6XflJ51yCvZ4TwH4g, and ran it through the code:

use Tuupola\Base62\GmpEncoder;

$gmp = new GmpEncoder();
$original = "00GHs6XflJ51yCvZ4TwH4g";
var_dump($original);
// Decode the padded string; the result is 17 bytes, not 16.
$bytes = $gmp->decode($original);
var_dump(bin2hex($bytes));
// Encode those 17 bytes again to confirm the round trip is lossless.
$encoded = $gmp->encode(hex2bin("000023a441a3a34d9ebd65de3381c3a226"));
var_dump($encoded);
$bytes2 = $gmp->decode($encoded);
var_dump(bin2hex($bytes2));

Which yields:

string(22) "00GHs6XflJ51yCvZ4TwH4g"
string(34) "000023a441a3a34d9ebd65de3381c3a226"
string(22) "00GHs6XflJ51yCvZ4TwH4g"
string(34) "000023a441a3a34d9ebd65de3381c3a226"

So to me it seems the null byte problem is somewhere else. Which class throws the following exception?

InvalidArgumentException: $bytes string should contain 16 characters.

tuupola commented 5 years ago

Taking the first example again, this time with the Ramsey\Uuid\Uuid package:

$uuid = Uuid::fromString("0023a441-a3a3-4d9e-bd65-de3381c3a226");
$bytes = $uuid->getBytes();
var_dump(bin2hex($bytes));
$encoded = $gmp->encode($bytes);
var_dump($encoded);
$decoded = $gmp->decode($encoded);
var_dump(bin2hex($decoded));
$uuid2 = Uuid::fromBytes($decoded);
var_dump((string) $uuid2);

yields

string(32) "0023a441a3a34d9ebd65de3381c3a226"
string(21) "0GHs6XflJ51yCvZ4TwH4g"
string(32) "0023a441a3a34d9ebd65de3381c3a226"
string(36) "0023a441-a3a3-4d9e-bd65-de3381c3a226"

So if I am reading this correctly, the problem is that 00GHs6XflJ51yCvZ4TwH4g decodes to 17 bytes while a UUID expects 16. I do not know where the extra leading 0 comes from, because the encoder does not produce it when given the original UUID bytes:

$uuid = Uuid::fromString("0023a441-a3a3-4d9e-bd65-de3381c3a226");
$gmp = new GmpEncoder();
$bytes = $uuid->getBytes();
$encoded = $gmp->encode($bytes);
var_dump($encoded);

yields

string(21) "0GHs6XflJ51yCvZ4TwH4g"

tuupola commented 5 years ago

I would say the problem is this line. Your old code appears to rely on GMP's behaviour of dropping leading null bytes, whereas this class preserves leading nulls when decoding. Compare:

$hex = gmp_strval(gmp_init("00GHs6XflJ51yCvZ4TwH4g", 62), 16);
var_dump($hex);
$hex = gmp_strval(gmp_init("0GHs6XflJ51yCvZ4TwH4g", 62), 16);
var_dump($hex);
$hex = gmp_strval(gmp_init("GHs6XflJ51yCvZ4TwH4g", 62), 16);
var_dump($hex);

$gmp = new GmpEncoder();
$bin = $gmp->decode("00GHs6XflJ51yCvZ4TwH4g");
var_dump(bin2hex($bin));
$bin = $gmp->decode("0GHs6XflJ51yCvZ4TwH4g");
var_dump(bin2hex($bin));
$bin = $gmp->decode("GHs6XflJ51yCvZ4TwH4g");
var_dump(bin2hex($bin));

yields

string(30) "23a441a3a34d9ebd65de3381c3a226"
string(30) "23a441a3a34d9ebd65de3381c3a226"
string(30) "23a441a3a34d9ebd65de3381c3a226"

string(34) "000023a441a3a34d9ebd65de3381c3a226"
string(32) "0023a441a3a34d9ebd65de3381c3a226"
string(30) "23a441a3a34d9ebd65de3381c3a226"
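
Given that difference, one possible way to keep consuming strings that were zero-padded to 22 characters is to normalize the decoded bytes to exactly 16 before handing them to Uuid::fromBytes. The sketch below is an assumption about how a caller might do that, not something this library provides; normalizeUuidBytes is a hypothetical name.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: coerce decoded bytes to exactly 16 bytes (128 bits).
function normalizeUuidBytes(string $bytes): string
{
    // Strip every leading null byte, including those created by "0" padding.
    $significant = ltrim($bytes, "\0");

    // More than 16 significant bytes cannot be a UUID.
    if (strlen($significant) > 16) {
        throw new InvalidArgumentException('Value does not fit into 128 bits.');
    }

    // Left-pad with null bytes so low-value UUIDs keep their leading zeros.
    return str_pad($significant, 16, "\0", STR_PAD_LEFT);
}

// The 17 bytes that decoding "00GHs6XflJ51yCvZ4TwH4g" produced above.
$decoded = hex2bin('000023a441a3a34d9ebd65de3381c3a226');
echo bin2hex(normalizeUuidBytes($decoded)), PHP_EOL;
// prints 0023a441a3a34d9ebd65de3381c3a226
```

The same call also accepts the 15-byte result of decoding the unpadded string, so all three inputs shown above end up as the same 16 bytes.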

tuupola commented 4 years ago

No feedback. Assuming problem was solved. Reopen if needed.