Decoding can return negative values due to overflow

sqids / sqids-java

Official Java port of Sqids. Generate short unique IDs from numbers.

https://sqids.org/java

MIT License

Decoding can return negative values due to overflow #12

Closed robhanlon22 closed 11 months ago

robhanlon22 commented 11 months ago

Minimum repro uses default alphabet, min length, and block list:

Sqids.builder().build().decode("001011100A010") // result is [-7900952963519449329]

0x3333 commented 11 months ago

I believe that this is expected behavior.

How to check if IDs are valid?

Decoding IDs will usually produce some kind of numeric output, but that doesn't necessarily mean that the ID is canonical. To check that the ID is valid, you can re-encode decoded numbers and check that the ID matches.

The reason this is not done automatically is that if the default blocklist changes in the future, we don't want to automatically invalidate the ID that has been generated in the past and might now be matching a new blocklist word.

Source: https://sqids.org/faq#valid-ids
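The re-encode check described in the FAQ excerpt could be sketched roughly as follows. This is only an illustration: RoundTripCheck and isCanonical are hypothetical names, not part of sqids-java, and the decoder and encoder are passed in as functions so the snippet compiles without the library. With sqids-java one would presumably pass sqids::decode and sqids::encode. Treating an encoder exception as "not canonical" is an assumption made here so that decoded values the encoder rejects are reported as invalid rather than crashing the check.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of sqids-java.
final class RoundTripCheck {

    // Returns true only if decoding the ID and re-encoding the result
    // reproduces the exact same ID.
    static boolean isCanonical(String id,
                               Function<String, List<Long>> decode,
                               Function<List<Long>, String> encode) {
        List<Long> numbers = decode.apply(id);
        if (numbers.isEmpty()) {
            // Nothing was decoded, so there is nothing to compare against.
            return false;
        }
        try {
            return id.equals(encode.apply(numbers));
        } catch (RuntimeException e) {
            // Assumption: an encoder that rejects the decoded numbers
            // (for example, negative values) means the ID is not valid.
            return false;
        }
    }
}
```

A caller would run this check on any ID received from outside before trusting the decoded numbers.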

lukechen116 commented 11 months ago

The following operation likely resulted in an overflow, exceeding the range of the long data type:

number = number * charLength + alphabet.indexOf(c);

@0x3333 The current code does not use BigInteger, and an overflow should result in an ArithmeticException. Do you agree with modifying it as follows?

number = Math.addExact(Math.multiplyExact(number, (long)charLength), alphabet.indexOf(c));
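To make the difference concrete, here is a minimal standalone sketch of the two accumulation styles discussed in this comment. It is not the sqids-java decoder: OverflowDemo, toNumberWrapping, and toNumberExact are hypothetical names, and a plain decimal alphabet is assumed so the arithmetic is easy to follow.

```java
// Hypothetical demo class; it only isolates the accumulation step.
final class OverflowDemo {

    // Plain long arithmetic: silently wraps around on overflow.
    static long toNumberWrapping(String id, String alphabet) {
        long number = 0;
        int charLength = alphabet.length();
        for (char c : id.toCharArray()) {
            number = number * charLength + alphabet.indexOf(c);
        }
        return number;
    }

    // Checked arithmetic: throws ArithmeticException on overflow.
    static long toNumberExact(String id, String alphabet) {
        long number = 0;
        long charLength = alphabet.length();
        for (char c : id.toCharArray()) {
            number = Math.addExact(
                    Math.multiplyExact(number, charLength),
                    (long) alphabet.indexOf(c));
        }
        return number;
    }
}
```

With the alphabet "0123456789", the input "9223372036854775808" is one past Long.MAX_VALUE, so toNumberWrapping returns Long.MIN_VALUE while toNumberExact throws ArithmeticException.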

0x3333 commented 11 months ago

I don't believe we should change this. The problem is that the input is malformed, so this is expected.

The reference implementation in JavaScript behaves the same way. See: https://codesandbox.io/s/gallant-sky-g7qnrc?file=/src/index.ts

@4kimov what is your opinion on that?

4kimov commented 11 months ago

To me, it sounds like a spec issue, not a Java implementation issue ☹️

As of right now, decode() spec is designed to stop & return an empty array only when input is an empty string or there's a non-alphabet char in input.

All other scenarios are undefined -- which makes me lean towards leaving as-is, until spec is ironed out.
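As a rough illustration of the two cases the spec does define, the guard could look something like the sketch below. DecodeGuardSketch and passesSpecGuards are hypothetical names, not the library's code; the point is only that an input made entirely of alphabet characters passes both guards, and whatever happens afterwards (including overflow) is what the spec currently leaves undefined.

```java
// Hypothetical sketch of the two defined "return empty" conditions.
final class DecodeGuardSketch {

    // Returns false for the two inputs the spec says must decode to an
    // empty array: an empty string, or any character outside the alphabet.
    static boolean passesSpecGuards(String id, String alphabet) {
        if (id.isEmpty()) {
            return false;
        }
        for (char c : id.toCharArray()) {
            if (alphabet.indexOf(c) == -1) {
                return false;
            }
        }
        return true;
    }
}
```

Assuming an alphanumeric alphabet, the ID from the original report, "001011100A010", passes both guards, which is why it reaches the arithmetic at all.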

For now, the good news is that re-encoding would catch this.

0x3333 commented 11 months ago

@4kimov yep, sounds fair to me. Maybe add a more explicit entry in the FAQ?

@robhanlon22 we will leave it as is until it is addressed in the reference implementation.