mdomke / python-ulid

ULID implementation for Python
https://python-ulid.rtfd.io
MIT License
412 stars 19 forks source link

question: Is there a way to validate whether a 26 char string conforms to a valid ULID? #26

Closed somnam closed 5 months ago

somnam commented 6 months ago

To verify whether a received string conforms to a valid ULID, I use the ULID.from_str method:

    try:
        return ULID.from_str(value)
    except ValueError:
        return None

Occasionally, a string of 26 characters may not be a valid ULID but may match the ULID length. I initially assumed such values would trigger a ValueError. However, it seems that any string consisting of 26 ASCII chars is accepted by from_str, resulting in ULID instances created from random data, i.e.:

>>> ULID.from_str('notavalidulidnotavalidulid')
ULID(7ZTAVFZZFZZZDQZTAVFZZFZZZD)

>>> ULID.from_str('DHCKWI*$HF$%MCNSKDW(#@_SCK')
ULID(5HCKZZZZHFZZMCNSKDZZZZZSCK)

Is this the intended behavior?

Additionally in some cases, the year part of the timestamp extracted from such values surpasses the maximum allowed by the datetime class, leading to a year is out of range error when inspecting the datetime:

>>> ULID.from_str('notavalidulidnotavalidulid').datetime
ValueError: year 10883 is out of range

>>> ULID.from_str("".join(random.choices(string.ascii_letters, k=26))).datetime
ValueError: year 10889 is out of range

>>> ULID.from_str('Z' * 26).datetime
ValueError: year 10889 is out of range

Perhaps the decoded timestamp part could be validated whether it exceeds the maximum value accepted by datetime.datetime?

mdomke commented 6 months ago

Hi @somnam, I will look into this and see how we can improve the validation here. Thank you for raising this issue.

mdomke commented 5 months ago

I added stricter validation when creating ULIDs from user input in release 2.6.0.

somnam commented 4 months ago

@mdomke Thank you for addressing the issue.

After updating the library to 2.6.0, I noticed a problem:

ul = ULID.from_uuid(UUID('f3c9261d-a7b7-4df5-86cd-0bb65f57c252')); ULID.from_str(str(ul)) ValueError: year 10464 is out of range ValueError: ULID timestamp is out of range.

ul = ULID.from_uuid(UUID('948ddc46-31a4-4733-9ef0-fec8e1b4ef69')); ULID.from_str(str(ul)) ULID(4MHQE4CCD48WSSXW7YS3GV9VV9)

ul = ULID.from_uuid(UUID('ee6fcae3-baa8-4f34-96e7-6e486b0d7ddc')); ULID.from_str(str(ul)) ValueError: year 10277 is out of range ValueError: ULID timestamp is out of range.

ULID.from_hex("eddf81fec79667690f22de49efacb34f") ValueError: year 10258 is out of range ValueError: ULID timestamp is out of range.

ULID.from_hex("446896f419c5fbc9d492a60fcf25d744") ULID(24D2BF86E5ZF4X94N61Z7JBNT4)

ULID.from_hex("f09152e4bc21063f71d7eaee5e7f24de") ValueError: year 10351 is out of range ValueError: ULID timestamp is out of range.

It seems that timestamp validation based on the range of Python's datetime object does not always conform to the ULID spec. According to the ULID spec:

the largest valid ULID encoded in Base32 is 7ZZZZZZZZZZZZZZZZZZZZZZZZZ, which corresponds to an epoch time of 281474976710655 or 2 ^ 48 - 1

The datetime validation appears to reduce the accepted timestamp portion range. Both UUIDs and 32-bit hex strings can exceed this reduced limit, leading to problems in generating or deserialising ULID instances.

Tweaking the timestamp validation to check for the upper limit mentioned in the spec file could solve these problems and ensure better compliance. I've created a PR with a proposed update to the validation. Could you please have a look?