String form has 2-bits of redundancy

larsyencken commented 7 years ago

We discovered by accident in my team that the string version (26 characters of Crockford's base32) encodes 130 bits (26 * 5) rather than 128 bits. These extra two bits are correctly stripped by the implementations we've played with (Python, Julia, Golang).

See for example: https://github.com/oklog/ulid/issues/9

This means that there are four string encodings mapping to the same binary ULID, for every ULID.

This is not something you encounter generating them normally, since you will always generate them with a timestamp of right now. But in our random fuzz testing of an internal API, we encountered them.

As an implementation side note, I would recommend rejecting string ULIDs with a prefix above 7 (a ULID beginning with 8 has the same binary as one beginning with 0) with an error rather than parsing them.

alizain commented 7 years ago

@larsyencken, sorry for the delayed response.

You are correct that the string form of ULID can technically contain 130 bits of information. As in the python, julia, and golang implementations, this javascript implementation also complains when you try to encode a timestamp that would take up more than the prescribed 128 bits, and overflow.

However, it will happily decode a string that causes the timestamp to overflow. I will make the change to fix this.

I will add this explicitly in the spec.

alizain commented 7 years ago

Feel free to re-open this issue or create a PR if you'd like to improve the spec further!

larsyencken commented 7 years ago

Thanks so much for updating the spec!

ulid / javascript

String form has 2-bits of redundancy #43