iCyberon / pg_hashids

Short unique id generator for PostgreSQL, using hashids
MIT License
280 stars 24 forks source link

Inconsistent results when calling id_encode multiple times in same query #19

Closed emrul closed 5 years ago

emrul commented 5 years ago

Hi,

The following SQL does not work as expected

SELECT id_encode(table_a.contractid, 'my salt')   AS a_encoded,
       table_a.contractid as a_unencoded,
       id_encode(table_b.id, 'my salt') AS b_encoded,
       table_b.id AS b_unencoded
FROM table_a, table_b

and returns: a_encoded LJqaROLN8GPb a_unencoded 1989354300773498885 b_encoded LJqaROLN8GPbe b_unencoded 1989354300773498885

both input values (big ints) are the same but notice that one result has an additional character at the end. And yes, I think the first result (a_encoded) is perhaps a null character?

The problem seems worse if I don't use a salt.

The problem does not arise if I only have one id_encode call in a query.

I'm currently using Postgres 11.1 and have just upgraded to the latest GH version of pg_hashids to confirm the problem.

rotten commented 5 years ago

I see this as well, at least using the psql command line:

select
    id_encode(12345, 'mysalt', 7),
    id_encode(12345, 'mysalt', 7)
;

 id_encode | id_encode 
-----------+-----------
 n4aL\x08  | n4aLVMg
(1 row)

I'm using PostgreSQL 11.2. It always seems to be when I have the id_encode in the first column of my result set. I have observed it happening even when it there is only one id_encode() in the query, although I haven't come up with a good example for that yet.

rotten commented 5 years ago

Just to confirm it wasn't a psql issue specifically, I connected with Omnidb and ran the above query again.

rotten commented 5 years ago

And when I repeat the column 5 times, instead of just twice, all of them are corrupted except the last one.

iCyberon commented 5 years ago

Just FYI, I started debugging the issue, will post updates here.

NelsonFrancisco commented 5 years ago

Any updates? I'm having the same issue.

Psql 10.4 Hashid version 1.2.1

NelsonFrancisco commented 5 years ago

Will this issue be addressed? I think it is a critical bug. @iCyberon

JaredReisinger commented 5 years ago

I'm seeing this as well (in PostgreSQL 12-beta3 on alpine), with an encoded values of "W6Kp\u0010", "23oZ\c0010", "aM9n\u0010", and "1vdz\u0010". Notably, it's always the last character, which is always \u0010. (But @rotten, above, was seeing \u0008, so maybe the other arguments to id_encode matter.) Also, these are not only the first columns, but also the first rows in their respective tables.

I think the title of the issue is misleading: it's not "multiple times in same query" as much as "an id_encode() value in the first column will sometimes have the wrong final character", which to me feels like something is overwriting the end of the string in some cases. (Although again, @rotten was getting the bogus values in "all but the last one".) I'm taking a look at the code, but I don't know anything about PostgreSQL internals, so I'm unlikely to find anything quickly.

Update: I've seen the same thing as @rotten, where calling id_encode() multiple times (consecutively, in my case) results in all-but-the-last being corrupted. My gut says this means that subsequent calls are stomping on the tail-end of previous ones, but it'll take more investigation.

JaredReisinger commented 5 years ago

Some more diagnostics... running manual select commands via pgAdmin that look like this:

select
    id_encode(0, 'salt', 5),
    id_encode(1, 'salt', 5),
    id_encode(2, 'salt', 5),
    id_encode(3, 'salt', 5),
    id_encode(4, 'salt', 5),
    id_encode(5, 'salt', 5);

I've found that the value and salt don't seem to matter at all, but the length (third argument) definitely does. The corrupted strings repro very reliably in all-but-the-last columns with lengths 5-8, 13-16, and 29-32. Lengths in the gaps (9-12, 17-28) seem to work without fail. That the problem appears in the 4 lengths leading up to a power of 2 seems significant. Sure enough, lengths of 60 and 65 are fine, but 61-64 all show the same problem.

I don't know whether this is a flaw in the hashids algorithm itself, or in the PostgreSQL wrapper. (Odds are the latter, as the core hashids implementation is used elsewhere.) Fingers crossed, I'm going to try digging into the code.

SafeChain-Ops commented 5 years ago

Just as a note - I can confirm that the fix resolved the issues I was having. Thank You!!