oittaa / uuid6-python

New time-based UUID formats which are suited for use as a database key
MIT License

[Feature request] Ability to define the timestamp part for generation functions #150

Open attevaltojarvi opened 6 months ago

attevaltojarvi commented 6 months ago

Hi, and thanks for this package!

I'm proposing an update to the uuid6, uuid7 and uuid8 functions that would let you optionally specify the timestamp used when generating the UUID value. For example, for the uuid7 function:

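import time
from typing import Optional
from uuid import UUID


def uuid7(timestamp_ms: Optional[int] = None) -> UUID:
    global _last_v7_timestamp

    # Fall back to the current time only when no timestamp is supplied
    if timestamp_ms is None:
        nanoseconds = time.time_ns()
        timestamp_ms = nanoseconds // 10**6
    # (rest of function)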
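For reference, here is a minimal self-contained sketch of what a timestamp-accepting v7 generator could look like, following the RFC 9562 version 7 layout (48-bit Unix milliseconds, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). The name uuid7_at is hypothetical and not part of the package; it also skips whatever monotonic state the package tracks in _last_v7_timestamp, so UUIDs sharing a millisecond are ordered randomly.

```python
import os
import time
from typing import Optional
from uuid import UUID


def uuid7_at(timestamp_ms: Optional[int] = None) -> UUID:
    """Build a UUID with the RFC 9562 version 7 layout.

    Hypothetical helper, not part of the uuid6 package. It keeps no
    monotonic counter, so two UUIDs with the same timestamp_ms are
    ordered randomly relative to each other.
    """
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 10**6  # current Unix time in ms
    if not 0 <= timestamp_ms < 1 << 48:
        raise ValueError("timestamp_ms must fit in the 48-bit unix_ts_ms field")
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68               # top 12 bits -> rand_a
    rand_b = rand & ((1 << 62) - 1)   # low 62 bits -> rand_b
    value = (
        (timestamp_ms << 80)          # bits 127..80: unix_ts_ms
        | (0x7 << 76)                 # bits 79..76: version 7
        | (rand_a << 64)              # bits 75..64: rand_a
        | (0b10 << 62)                # bits 63..62: variant 10
        | rand_b                      # bits 61..0: rand_b
    )
    return UUID(int=value)
```

Calling uuid7_at(1727920979122) yields a UUID whose top 48 bits are exactly that millisecond value, while uuid7_at() behaves like a plain current-time generator.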

I haven't checked whether the spec allows this, but I feel it would be really useful when you need to generate UUIDs for historical data and have the records' creation timestamps available:

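# Django model example (Model and its created_at/new_id fields are placeholders)
import calendar

from uuid6 import uuid7

for obj in Model.objects.iterator():
    # Whole seconds since the epoch, treating created_at as UTC
    timestamp = calendar.timegm(obj.created_at.utctimetuple())
    # Convert to milliseconds, keeping the sub-second part so records
    # created within the same second still sort correctly
    timestamp_ms = timestamp * 10**3 + obj.created_at.microsecond // 10**3
    obj.new_id = uuid7(timestamp_ms)  # uses the proposed timestamp argument
    obj.save()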

This would let a system start creating new records with the current timestamp while a data migration backfills IDs for historical data, preserving sort order by the UUID's timestamp part.

Thanks in advance!
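The migration idea relies on two things: converting created_at to integer milliseconds without losing the sub-second part, and the fact that a v7 UUID's top 48 bits hold that millisecond value, so sorting the UUIDs sorts by creation time. The sketch below illustrates both under stated assumptions: naive datetimes are treated as UTC, and to_unix_ms, v7_layout and timestamp_ms_of are hypothetical helpers, not package APIs.

```python
import os
from datetime import datetime, timedelta, timezone
from uuid import UUID

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ms(dt: datetime) -> int:
    """Integer Unix milliseconds, keeping sub-second precision.

    Assumption: naive datetimes are UTC, matching what
    calendar.timegm(dt.utctimetuple()) does for them.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # timedelta // timedelta is exact integer division (no float rounding)
    return (dt - UNIX_EPOCH) // timedelta(milliseconds=1)


def v7_layout(timestamp_ms: int) -> UUID:
    """Hypothetical minimal v7-style UUID with the ms timestamp in the top 48 bits."""
    rand = int.from_bytes(os.urandom(10), "big")
    return UUID(int=(timestamp_ms << 80) | (0x7 << 76) | ((rand >> 68) << 64)
                | (0b10 << 62) | (rand & ((1 << 62) - 1)))


def timestamp_ms_of(u: UUID) -> int:
    """Read the Unix-ms timestamp back out of the top 48 bits."""
    return u.int >> 80


# Two historical records created half a second apart, within the same second
created = [
    datetime(2021, 5, 1, 12, 0, 0, 750_000, tzinfo=timezone.utc),
    datetime(2021, 5, 1, 12, 0, 0, 250_000, tzinfo=timezone.utc),
]
ids = [v7_layout(to_unix_ms(dt)) for dt in created]
oldest_first = sorted(ids)  # UUID comparison orders by the timestamp bits first
```

With whole-second precision both records would share one timestamp and their relative order would come down to the random bits; keeping milliseconds makes the sorted UUIDs follow creation order.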

oittaa commented 3 months ago

Sorry, I hadn't checked GitHub in a while. While these options sound like a nice idea, I'm a bit worried that people would misuse these functions. v6 counts 100-nanosecond intervals from an unusual epoch (1582-10-15, inherited from v1), v7 uses milliseconds since the Unix epoch, v8 nanoseconds... Does anyone have suggestions how to reasonably avoid disasters like mixing nanoseconds and milliseconds?
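To make the concern concrete, this sketch expresses one instant in the unit each version's time field uses; the resulting numbers differ by several orders of magnitude, which is exactly what makes a mix-up easy. The v6 offset is the same 0x01B21DD213814000 constant CPython's uuid.uuid1() adds to convert Unix time to Gregorian 100-ns ticks. The v8 entry assumes this package's nanosecond-based uuid8, since RFC 9562 leaves the v8 layout to the implementer. timestamp_fields is a hypothetical helper.

```python
# 100-nanosecond ticks between the Gregorian epoch used by UUID v1/v6
# (1582-10-15) and the Unix epoch (1970-01-01); same constant as uuid.uuid1().
GREGORIAN_TO_UNIX_TICKS = 0x01B21DD213814000


def timestamp_fields(unix_ns: int) -> dict:
    """Express one instant in the units each UUID version's time field uses."""
    return {
        "v6": unix_ns // 100 + GREGORIAN_TO_UNIX_TICKS,  # 100 ns ticks since 1582
        "v7": unix_ns // 10**6,                          # ms since 1970
        "v8": unix_ns,  # ns since 1970 (assumed: this package's uuid8)
    }


fields = timestamp_fields(1727921429461971000)
```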

attevaltojarvi commented 2 months ago

I personally think the function signatures should simply make clear which unit they expect to receive:

from typing import Optional
from uuid import UUID


def uuid7(at_milliseconds: Optional[int] = None) -> UUID:
    ...


def uuid8(at_nanoseconds: Optional[int] = None) -> UUID:
    ...

Getting the order of magnitude wrong is simply a user error, the kind you can make with any third-party library.

mfresonke-work commented 1 week ago

Does anyone have suggestions how to reasonably avoid disasters like mixing nanoseconds and milliseconds?

@oittaa This is a fair argument. While I agree with @attevaltojarvi that there's only so much you can do, I have found named parameters (passing an options object) in JS a good counter to this, as they're self-documenting. It seems you can do something similar in Python with keyword-only arguments:

from typing import Optional


def uuid7(*, unix_ms: Optional[int] = None):
    ...  # generation code goes here


def uuid8(*, unix_nanos: Optional[int] = None):
    ...  # generation code goes here

That would at least force anyone passing the wrong unit to ignore the _ms or _nanos in the very argument name they are typing.

uuid7(unix_ms=1727920979122)
uuid8(unix_nanos=1727921429461971000)
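A quick way to see what keyword-only arguments buy: the stubs below are hypothetical stand-ins with the proposed signatures (they return their argument instead of a UUID), and both a bare positional number and a keyword naming the wrong unit are rejected with TypeError before any generation code runs.

```python
from typing import Callable, Optional


# Hypothetical stubs with the keyword-only signatures proposed above; they
# just return the argument so the calling convention can be checked.
def uuid7(*, unix_ms: Optional[int] = None) -> Optional[int]:
    return unix_ms


def uuid8(*, unix_nanos: Optional[int] = None) -> Optional[int]:
    return unix_nanos


def raises_type_error(fn: Callable, *args, **kwargs) -> bool:
    """Return True if calling fn(*args, **kwargs) raises TypeError."""
    try:
        fn(*args, **kwargs)
    except TypeError:
        return True
    return False


# A bare positional number is rejected, so the unit must be spelled out...
assert raises_type_error(uuid7, 1727920979122)
# ...and a nanosecond keyword is rejected by the millisecond function.
assert raises_type_error(uuid7, unix_nanos=1727921429461971000)
```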

You could also add a sanity check on the value, but I understand that is not a perfect solution.
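One possible shape for such a sanity check, as a sketch only: because seconds, milliseconds and nanoseconds differ by factors of 10**3 to 10**9, a value in the wrong unit usually falls far outside a plausible date window. The window used here (from 2000-01-01 to one day past the current time) and the names check_unix_time, DEFAULT_EARLIEST_S and MAX_AHEAD_S are assumptions for illustration, not anything from the spec or the package.

```python
import time

# Hypothetical plausibility window, not from any spec. Unit mix-ups shift a
# value by a factor of 10**3 or more, which usually lands far outside it.
DEFAULT_EARLIEST_S = 946_684_800  # 2000-01-01T00:00:00Z; lower it for older data
MAX_AHEAD_S = 86_400              # tolerate up to a day of clock skew
UNITS_PER_SECOND = {"ms": 10**3, "ns": 10**9}


def check_unix_time(value: int, unit: str, earliest_s: int = DEFAULT_EARLIEST_S) -> int:
    """Return value unchanged, or raise ValueError if it is implausible in unit."""
    per_second = UNITS_PER_SECOND[unit]
    lowest = earliest_s * per_second
    highest = (time.time_ns() // 10**9 + MAX_AHEAD_S) * per_second
    if not lowest <= value <= highest:
        raise ValueError(
            f"{value} is not a plausible Unix timestamp in {unit}; "
            "check for a seconds/ms/ns mix-up"
        )
    return value
```

This catches nanoseconds passed as milliseconds (far too large) and milliseconds or seconds passed where a finer unit is expected (far too small), but it cannot catch a plausible-looking wrong date, which is why it is only a partial safeguard.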

attevaltojarvi commented 1 week ago

Explicitly specifying keyword arguments is definitely a good idea. unix_ts_millis and unix_ts_nanos could be good names for them. :+1: