Adding Python Prototype for v7

fabiolimace commented 3 years ago

Added v7 Python prototype plus testing

kyzer-davis commented 3 years ago

Thanks for the pull. A quick glance with my morning coffee looks good. I will grab a copy this afternoon and run through some deeper reviews and some testing.

Appreciate the help!

fabiolimace commented 3 years ago

You can modify it or refuse it if you think it is not a compliant implementation.

There are some aspects of this implementation that can be problematic:

It does not have a clock sequence (sequence counter). I did it without clock seq to avoid more complexity.
The node id changes all the time. Should it be a static random node id for the entire session?
The "devDebugs" IF is incomplete. I didn't have time to finish.
The subsec bits are encoded by multiplying the fractional part by 2 ** subsec_bits. I assume that if the decoding is done by dividing the subsec by 2 ** subsec_bits, the encoding must be done by multiplying the fractional part by 2 ** subsec_bits. Is it required for encoding? Can the encoding be relaxed as long as the decoding derives a value "as close to the correct value as possible"?
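
For readers following along, the encode/decode symmetry described in the last point can be sketched roughly as below. The function names encode_subsec and decode_subsec are hypothetical and are not taken from new_uuid.py; the sketch only assumes that the fractional second is a float in [0, 1).

```python
def encode_subsec(fraction: float, subsec_bits: int) -> int:
    """Scale a fractional second in [0, 1) to an integer of subsec_bits bits."""
    # Hypothetical helper: multiply by 2 ** subsec_bits and truncate.
    return int(fraction * 2 ** subsec_bits)


def decode_subsec(subsec: int, subsec_bits: int) -> float:
    """Invert encode_subsec by dividing by 2 ** subsec_bits."""
    return subsec / 2 ** subsec_bits


subsec_bits = 30
fraction = 0.123456789  # arbitrary fractional second, for illustration only

encoded = encode_subsec(fraction, subsec_bits)
decoded = decode_subsec(encoded, subsec_bits)

# The round trip loses less than one unit of the chosen resolution.
print(encoded, decoded, abs(decoded - fraction) < 2 ** -subsec_bits)
```

Under this scheme the decoded value is never more than 2 ** -subsec_bits away from the original fraction, which is one way to read "as close to the correct value as possible".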

kyzer-davis commented 3 years ago

Ah, thanks for the heads up. I will work in a clock sequence. The node can change each time; that is fine. I was only doing UUID generation in these prototypes, so there is no need to handle decoding at the moment.

kyzer-davis commented 3 years ago

@fabiolimace I updated v7 in branch uuidv7-python

Added clock sequence
More comments for those that may follow along with this code
More tests and dev debug sections

Testing I did seems okay. Let me know what you think. (Note: I did replace your splices because I am terrible with bitwise operations). I also added f4b6a3/uuid-creator to the readme table for UUIDv6

@bradleypeabody In testing this implementation I found that we may need to update draft 01. Nanosecond precision only needs 30 bits for subsec, and our NS example in 4.4.4.1. UUIDv7 Encoding used too many bits. It is worth double-checking the Millisecond and Microsecond examples too.

All 12 bits of scenario subsec_a is fully dedicated to providing sub-second encoding for the Nanosecond precision (nsec).

All 12 bits of subsec_b have been dedicated to providing sub-second encoding for the Nanosecond precision (nsec).

The first 14 bit of the subsec_seq_node dedicated to providing sub-second encoding for the Nanosecond precision (nsec).

It is an easy fix, we just need to give 8 bits back to the Random part of subsec_seq_node and update like so:

The first 6 bit of the subsec_seq_node dedicated to providing sub-second encoding for the Nanosecond precision (nsec).

Finally the remaining 48 bits in the subsec_seq_node section are filled out with random data to pad the length and provide guaranteed uniqueness (rand).
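
The 30-bit figure can be checked with a few lines of Python. This is only a sketch, and it assumes the sub-second value is stored as a plain integer count (0 to 999,999,999 for nanoseconds) rather than as a scaled fraction; the variable names are illustrative.

```python
# Largest sub-second value for each precision, stored as an integer count.
max_values = {
    "millisecond": 999,
    "microsecond": 999_999,
    "nanosecond": 999_999_999,
}

# int.bit_length() gives the number of bits needed to represent each maximum.
bits_needed = {name: value.bit_length() for name, value in max_values.items()}
print(bits_needed)

# For nanoseconds: subsec_a and subsec_b take 12 bits each, so only the
# remainder has to come out of subsec_seq_node.
subsec_a_bits = 12
subsec_b_bits = 12
leftover_bits = bits_needed["nanosecond"] - subsec_a_bits - subsec_b_bits
print(leftover_bits)
```

With 12 + 12 bits already taken by subsec_a and subsec_b, 6 bits remain to be placed in subsec_seq_node, which matches the proposed change from 14 bits to 6.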

fabiolimace commented 3 years ago

@kyzer-davis Now it's much better. I also wanted to replace the bitwise operations. Thanks!

I think this patch can fix the problem of bad decoding that forced the use of padding:

--- OLD/new_uuid.py
+++ NEW/new_uuid.py
@@ -170,7 +170,7 @@
 def uuid7(devDebugs=False, returnType="hex"):
     """Generates a 128-bit version 7 UUID with nanoseconds precision timestamp and random node

-    example: 60c26bbe-0728-7f46-9602-bcf7423f3cb7
+    example: 060c4735-8bcb-7726-a200-1fd41eaa8a29

     format: unixts|subsec_a|version|subsec_b|variant|subsec_seq_node

@@ -217,8 +217,7 @@

     ### Binary Conversions
     ### Need subsec_a (12 bits), subsec_b (12-bits), and subsec_c (leftover bits starting subsec_seq_node)
-    unixts = f'{sec:032b}'
-    unixts = unixts + "0000" # Pad end with 4 zeros to get 36-bit
+    unixts = f'{sec:036b}'
     subsec_binary = f'{subsec:030b}'
     subsec_a =  subsec_binary[:12] # Upper 12
     subsec_b_c = subsec_binary[-18:] # Lower 18
@@ -263,7 +262,7 @@
     _last_uuid_int = UUIDv7_int

     # Convert Hex to Int then splice in dashes
-    UUIDv7_hex = hex(int(UUIDv7_bin, 2))[2:]
+    UUIDv7_hex = f'{UUIDv7_int:032x}'
     UUIDv7_formatted = '-'.join(
         [UUIDv7_hex[:8], UUIDv7_hex[8:12], UUIDv7_hex[12:16], UUIDv7_hex[16:20], UUIDv7_hex[20:32]])

If you want to test the UUID time, you can apply these changes to testing_v6.py and testing_v7.py:

testing_v6.py

--- OLD/testing_v6.py
+++ NEW/testing_v6.py
@@ -1,5 +1,6 @@
 import new_uuid
 import random
+import time

 """
 Testing:
@@ -17,16 +18,24 @@
 showUUIDs = False # True to view the generated UUID returnType and lists
 clock_seq = None # Set Clock Sequence

+def extractSeconds(uuid):
+   uuid_hex = uuid.replace('-', '')
+   timestamp = uuid_hex[:12] + uuid_hex[13:16]
+   return int((int(timestamp, 16) - 0x01b21dd213814000) / 10000000)
+
 def v6Tests(showUUIDs):
     counter = 0
     testList = []
     masterDict = {}
+    
+    start = int(time.time())
     while counter < 1000:
         # UUIDv6 = new_uuid.uuid1(devDebugs, returnType)
         UUIDv6 = new_uuid.uuid6(devDebugs, returnType)
         testList.append(UUIDv6)
         masterDict[UUIDv6] = counter
         counter += 1
+    end = int(time.time())

     if showUUIDs:
         print("\n")
@@ -54,6 +63,9 @@
         if masterDict[UUID] != counter:
             failCount+=1
             print('{0}: {1}'.format(str(counter), UUID))
+        elif not (extractSeconds(UUID) >= start and extractSeconds(UUID) <= end):
+            failCount+=1
+            print('{0}: {1} {2}'.format(str(counter), UUID, time.ctime(extractSeconds(UUID))))
         counter+= 1
     if failCount == 0:
         print("+ No Failures Observed")

testing_v7.py

--- OLD/testing_v7.py
+++ NEW/testing_v7.py
@@ -1,5 +1,6 @@
 import new_uuid
 import random
+import time

 """
 Testing:
@@ -17,15 +18,25 @@

 showUUIDs = False # True to view the generated UUID returnType and lists

+def extractSeconds(uuid):
+   uuid_hex = uuid.replace('-', '')
+   uuid_int = int(uuid_hex, 16)
+   uuid_bin = f'{uuid_int:0128b}'
+   time_bin = uuid_bin[:36]
+   return int(time_bin, 2)
+    
 def v7Tests(showUUIDs):
     counter = 0
     testList = []
     masterDict = {}
+    
+    start = int(time.time())
     while counter < 1000:
         UUIDv7 = new_uuid.uuid7(devDebugs, returnType)
         testList.append(UUIDv7)
         masterDict[UUIDv7] = counter
         counter += 1
+    end = int(time.time())

     if showUUIDs:
         print("\n")
@@ -53,6 +64,9 @@
         if masterDict[UUID] != counter:
             failCount+=1
             print('{0}: {1}'.format(str(counter), UUID))
+        elif not (extractSeconds(UUID) >= start and extractSeconds(UUID) <= end):
+            failCount+=1
+            print('{0}: {1} {2}'.format(str(counter), UUID, time.ctime(extractSeconds(UUID))))
         counter+= 1
     if failCount == 0:
         print("+ No Failures Observed")

The file testing_v8.py doesn't need to test the UUID time, since that depends on the implementation.

And thank you for the inclusion of the uuid-creator!

fabiolimace commented 3 years ago

@kyzer-davis

I think we can avoid the timestamp padding by making 2 changes in the file new_uuid.py.

Change 1:

     ### Binary Conversions
     ### Need subsec_a (12 bits), subsec_b (12-bits), and subsec_c (leftover bits starting subsec_seq_node)
(-)  unixts = f'{sec:032b}'
(-)  unixts = unixts + "0000" # Pad end with 4 zeros to get 36-bit
     subsec_binary = f'{subsec:030b}'

     ### Binary Conversions
     ### Need subsec_a (12 bits), subsec_b (12-bits), and subsec_c (leftover bits starting subsec_seq_node)
(+)  unixts = f'{sec:036b}'
     subsec_binary = f'{subsec:030b}'

Change 2:

     # Convert Hex to Int then splice in dashes
(-)  UUIDv7_hex = hex(int(UUIDv7_bin, 2))[2:]
     UUIDv7_formatted = '-'.join(

     # Convert Hex to Int then splice in dashes
(+)  UUIDv7_hex = f'{UUIDv7_int:032x}' # int to hex
     UUIDv7_formatted = '-'.join(

After these changes the UUID is generated with the right length (36) without padding:

before: 60c26bbe-7287-f469-602b-cf7423f3cb7
after:  060c4735-8bcb-7726-a200-1fd41eaa8a29

The padding can result in different time when one tries to call uuid.get_time().

kyzer-davis commented 3 years ago

@fabiolimace

After these changes the UUID is generated with the right length (36) without padding:

Both methods end up padding the 32-bit Unix timestamp to 36 bits. The difference is that my current implementation pads the least-significant bits (the end), while your proposed change pads the most-significant, starting bits. (Note the leading 0 in your final UUID.)
My preference has always been to pad in the least-significant position and avoid leading 0s. I actually just published #21 earlier today detailing this in the V02 draft.
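
A small sketch of the two padding positions, using an arbitrary timestamp chosen only for illustration (the variable names are not from new_uuid.py):

```python
sec = 1_600_000_000  # arbitrary Unix timestamp that fits in 32 bits

# Current implementation: 32-bit timestamp, then 4 zero bits at the end.
lsb_padded = f'{sec:032b}' + '0000'

# Proposed change: format the timestamp directly as 36 bits (zeros at the start).
msb_padded = f'{sec:036b}'

# Both are 36 bits long, but the timestamp sits in a different place.
print(len(lsb_padded), len(msb_padded))

# LSB padding: the first 32 bits are the timestamp; read as a 36-bit
# integer, the field is the timestamp shifted left by 4 (sec * 16).
print(int(lsb_padded[:32], 2) == sec, int(lsb_padded, 2) == sec << 4)

# MSB padding: the whole 36-bit field is the timestamp, with leading zeros.
print(int(msb_padded, 2) == sec, msb_padded.startswith('0000'))
```

This is why a decoder that reads all 36 bits as seconds only works with the most-significant padding, while a decoder that reads the first 32 bits only works with the least-significant padding.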

Change 2:

This is only required because change number 1 causes int(UUIDv7_bin, 2) to drop the leading 0s you padded earlier. It is somewhat counter-intuitive, since f'{UUIDv7_int:032x}' then re-pads them.

With the current padding, in the least-significant position, you can use either UUIDv7_hex = hex(int(UUIDv7_bin, 2))[2:] or UUIDv7_hex = f'{UUIDv7_int:032x}', since both yield the same 32 hex characters.
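
To illustrate the difference between the two conversions, here is a minimal sketch with made-up 128-bit strings (not real UUIDs):

```python
# A 128-bit string whose first 4 bits are zero, as with most-significant padding.
leading_zero_bin = '0000' + '1' * 124
leading_zero_int = int(leading_zero_bin, 2)

# hex() drops the leading zero nibble, so the result is one character short.
short_hex = hex(leading_zero_int)[2:]
# The format spec pads back to 32 hex characters.
full_hex = f'{leading_zero_int:032x}'
print(len(short_hex), len(full_hex))

# A 128-bit string whose first bit is set: both conversions agree.
top_bit_bin = '1' + '0' * 127
top_bit_int = int(top_bit_bin, 2)
print(hex(top_bit_int)[2:] == f'{top_bit_int:032x}')
```

The two conversions only diverge when the first hex digit is 0, which is exactly the case the most-significant padding produces.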

The padding can result in different time when one tries to call uuid.get_time()

The current implementation of uuid.get_time() will likely not be able to handle full UUIDv7 parsing until it is extended. Explicitly specifying the padding position makes that future extension easier. That is, if the spec is ratified as an official RFC.

With the current padding the decoder can always assume the first 32 bits of a UUIDv7 are a valid 32-bit Unix epoch timestamp. Decoding the remaining 4 bits, along with the sub-second precision found in the rest of the UUIDv7 layout, I would leave up to the implementer of the decoder.
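
A decoder built on that assumption could look roughly like the sketch below. The function name unix_seconds_from_uuid7 is hypothetical, and the sketch only holds for UUIDs generated with the least-significant padding; the sample value is the old docstring example from the patch earlier in this thread.

```python
def unix_seconds_from_uuid7(uuid_str: str) -> int:
    """Return the first 32 bits of a UUIDv7 string as Unix epoch seconds.

    Assumes unixts was built as a 32-bit timestamp padded with 4 zero bits
    in the least-significant position.
    """
    uuid_int = int(uuid_str.replace('-', ''), 16)
    # Drop the lower 96 bits, keeping only the leading 32.
    return uuid_int >> 96


seconds = unix_seconds_from_uuid7('60c26bbe-0728-7f46-9602-bcf7423f3cb7')
print(seconds == 0x60c26bbe)
```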

uuid6 / prototypes

Adding Python Prototype for v7 #2