Closed fabiolimace closed 3 years ago
Thanks for the pull. A quick glance with my morning coffee looks good. I will grab a copy this afternoon and run through some deeper reviews and some testing.
Appreciate the help!
You can modify it or refuse it if you think it is not a compliant implementation.
There are some aspects of this implementation that can be problematic:
subsec bits are encoded by multiplying the fractional part by 2 ** subsec_bits. I assume that if the decoding is done by dividing the subsec by 2 ** subsec_bits, the encoding must be done by multiplying the fractional part by 2 ** subsec_bits. Is it required for encoding? Can the encoding be relaxed as long as the decoding derives a value "as close to the correct value as possible"?
Ah, thanks for the heads up. I will work in a clock sequence. Node can change each time; that is fine. I was only doing UUID generation in these prototypes, so there is no need to handle decoding at the moment.
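For reference, the multiply-to-encode and divide-to-decode pairing asked about above can be sketched roughly as follows. The function names encode_subsec and decode_subsec are hypothetical and do not come from new_uuid.py; this only illustrates the arithmetic, not what the draft requires.

```python
def encode_subsec(fraction: float, subsec_bits: int) -> int:
    """Scale a fractional second in [0, 1) to an integer that fits in subsec_bits bits."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    # Multiplying by a power of two and truncating keeps the top subsec_bits bits.
    return int(fraction * 2 ** subsec_bits)


def decode_subsec(subsec: int, subsec_bits: int) -> float:
    """Invert encode_subsec by dividing by the same power of two."""
    return subsec / 2 ** subsec_bits


# Hypothetical usage: half a second stored in a 12-bit field.
encoded = encode_subsec(0.5, 12)
print(encoded, decode_subsec(encoded, 12))
```

Because the encoder truncates, the decoded value can fall short of the original fraction by less than 2 ** -subsec_bits, which is one reading of "as close to the correct value as possible".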
@fabiolimace I updated v7 in branch uuidv7-python
Testing I did seems okay. Let me know what you think. (Note: I did replace your splices because I am terrible with bitwise operations). I also added f4b6a3/uuid-creator to the readme table for UUIDv6
@bradleypeabody In testing this implementation I found that we may need to update draft 01. NS only needs 30 bits for subsec, and our NS example in 4.4.4.1. UUIDv7 Encoding used too many bits. Worth double-checking Millisecond and Microsecond too.
- All 12 bits of scenario subsec_a is fully dedicated to providing sub-second encoding for the Nanosecond precision (nsec).
- All 12 bits of subsec_b have been dedicated to providing sub-second encoding for the Nanosecond precision (nsec).
- The first 14 bit of the subsec_seq_node dedicated to providing sub-second encoding for the Nanosecond precision (nsec).
It is an easy fix; we just need to give 8 bits back to the Random part of subsec_seq_node and update like so:
- The first 6 bit of the subsec_seq_node dedicated to providing sub-second encoding for the Nanosecond precision (nsec).
- Finally the remaining 56 bits in the subsec_seq_node section are filled out with random data to pad the length and provide guaranteed uniqueness (rand).
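A rough sketch of the 12/12/6 split discussed above, written with shifts and masks. The helper names split_nanoseconds and join_nanoseconds are hypothetical, and subsec_c is used here only as a label for the leading 6 bits of subsec_seq_node.

```python
def split_nanoseconds(nsec: int) -> tuple[int, int, int]:
    """Split a nanosecond count (0..999_999_999) into 12 + 12 + 6 bit fields."""
    if not 0 <= nsec < 1_000_000_000:
        raise ValueError("nsec must be in [0, 1_000_000_000)")
    subsec_a = nsec >> 18            # upper 12 bits
    subsec_b = (nsec >> 6) & 0xFFF   # middle 12 bits
    subsec_c = nsec & 0x3F           # lower 6 bits, start of subsec_seq_node
    return subsec_a, subsec_b, subsec_c


def join_nanoseconds(subsec_a: int, subsec_b: int, subsec_c: int) -> int:
    """Reassemble the three fields into the original nanosecond count."""
    return (subsec_a << 18) | (subsec_b << 6) | subsec_c


# The largest nanosecond value needs exactly 30 bits.
print((999_999_999).bit_length())
print(join_nanoseconds(*split_nanoseconds(123_456_789)))
```

The bit_length check is the reason 30 bits are enough: 999,999,999 is below 2 ** 30.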
@kyzer-davis Now it's much better. I also wanted to replace the bitwise operations. Thanks!
I think this patch can fix the problem of bad decoding that forced the use of padding:
--- OLD/new_uuid.py
+++ NEW/new_uuid.py
@@ -170,7 +170,7 @@
def uuid7(devDebugs=False, returnType="hex"):
"""Generates a 128-bit version 7 UUID with nanoseconds precision timestamp and random node
- example: 60c26bbe-0728-7f46-9602-bcf7423f3cb7
+ example: 060c4735-8bcb-7726-a200-1fd41eaa8a29
format: unixts|subsec_a|version|subsec_b|variant|subsec_seq_node
@@ -217,8 +217,7 @@
### Binary Conversions
### Need subsec_a (12 bits), subsec_b (12-bits), and subsec_c (leftover bits starting subsec_seq_node)
- unixts = f'{sec:032b}'
- unixts = unixts + "0000" # Pad end with 4 zeros to get 36-bit
+ unixts = f'{sec:036b}'
subsec_binary = f'{subsec:030b}'
subsec_a = subsec_binary[:12] # Upper 12
subsec_b_c = subsec_binary[-18:] # Lower 18
@@ -263,7 +262,7 @@
_last_uuid_int = UUIDv7_int
# Convert Hex to Int then splice in dashes
- UUIDv7_hex = hex(int(UUIDv7_bin, 2))[2:]
+ UUIDv7_hex = f'{UUIDv7_int:032x}'
UUIDv7_formatted = '-'.join(
[UUIDv7_hex[:8], UUIDv7_hex[8:12], UUIDv7_hex[12:16], UUIDv7_hex[16:20], UUIDv7_hex[20:32]])
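To see why the two changes in this patch go together, here is a small standalone sketch. The timestamp value and the zero-filled remainder are assumptions chosen for illustration and are not taken from new_uuid.py.

```python
sec = 1_600_000_000  # assumed example Unix timestamp

# Old approach: 32 bits plus four trailing zeros. The field is 36 bits wide,
# but its integer value is the timestamp shifted left by 4 (multiplied by 16).
padded_low = f'{sec:032b}' + '0000'

# Patched approach: zero-pad on the left, so the 36-bit field equals the timestamp.
padded_high = f'{sec:036b}'

print(len(padded_low), len(padded_high))
print(int(padded_low, 2) == sec << 4, int(padded_high, 2) == sec)

# With left padding the 128-bit value starts with zero bits, so hex() drops
# the leading zero digit, while the format spec keeps all 32 digits.
value = int(padded_high + '0' * 92, 2)  # remaining 92 bits zero-filled for brevity
short_hex = hex(value)[2:]
full_hex = f'{value:032x}'
print(len(short_hex), len(full_hex))
```

A 31-character hex string is what produces a 35-character formatted UUID once the dashes are spliced in.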
If you want to test the UUID time you can apply these changes to testing_v6.py and testing_v7.py:
testing_v6.py
--- OLD/testing_v6.py
+++ NEW/testing_v6.py
@@ -1,5 +1,6 @@
import new_uuid
import random
+import time
"""
Testing:
@@ -17,16 +18,24 @@
showUUIDs = False # True to view the generated UUID returnType and lists
clock_seq = None # Set Clock Sequence
+def extractSeconds(uuid):
+ uuid_hex = uuid.replace('-', '')
+ timestamp = uuid_hex[:12] + uuid_hex[13:16]
+ return int((int(timestamp, 16) - 0x01b21dd213814000) / 10000000)
+
def v6Tests(showUUIDs):
counter = 0
testList = []
masterDict = {}
+
+ start = int(time.time())
while counter < 1000:
# UUIDv6 = new_uuid.uuid1(devDebugs, returnType)
UUIDv6 = new_uuid.uuid6(devDebugs, returnType)
testList.append(UUIDv6)
masterDict[UUIDv6] = counter
counter += 1
+ end = int(time.time())
if showUUIDs:
print("\n")
@@ -54,6 +63,9 @@
if masterDict[UUID] != counter:
failCount+=1
print('{0}: {1}'.format(str(counter), UUID))
+ elif not (extractSeconds(UUID) >= start and extractSeconds(UUID) <= end):
+ failCount+=1
+ print('{0}: {1} {2}'.format(str(counter), UUID, time.ctime(extractSeconds(UUID))))
counter+= 1
if failCount == 0:
print("+ No Failures Observed")
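The extractSeconds helper in this patch can be checked in isolation with a round trip like the one below. make_v6_hex is a hypothetical builder written only for this sketch (it is not part of new_uuid.py), and the random bits are left at zero.

```python
GREGORIAN_OFFSET = 0x01B21DD213814000  # 100 ns ticks from 1582-10-15 to 1970-01-01


def make_v6_hex(unix_seconds: int) -> str:
    """Build a UUIDv6-shaped string from whole seconds (clock_seq and node zeroed)."""
    ts = unix_seconds * 10_000_000 + GREGORIAN_OFFSET  # 60-bit timestamp
    value = (
        ((ts >> 12) << 80)      # upper 48 timestamp bits
        | (0x6 << 76)           # version nibble
        | ((ts & 0xFFF) << 64)  # lower 12 timestamp bits
        | (0b10 << 62)          # variant
    )
    h = f'{value:032x}'
    return '-'.join([h[:8], h[8:12], h[12:16], h[16:20], h[20:]])


def extract_seconds_v6(uuid_str: str) -> int:
    """Drop the version nibble, rebuild the 60-bit timestamp, convert to Unix seconds."""
    h = uuid_str.replace('-', '')
    ts = int(h[:12] + h[13:16], 16)
    return (ts - GREGORIAN_OFFSET) // 10_000_000


sample = make_v6_hex(1_600_000_000)
print(sample, extract_seconds_v6(sample))
```

This sketch uses integer floor division instead of int(x / 10000000) so that no float rounding is involved; for the range check in the test either form should behave the same in practice.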
testing_v7.py
--- OLD/testing_v7.py
+++ NEW/testing_v7.py
@@ -1,5 +1,6 @@
import new_uuid
import random
+import time
"""
Testing:
@@ -17,15 +18,25 @@
showUUIDs = False # True to view the generated UUID returnType and lists
+def extractSeconds(uuid):
+ uuid_hex = uuid.replace('-', '')
+ uuid_int = int(uuid_hex, 16)
+ uuid_bin = f'{uuid_int:0128b}'
+ time_bin = uuid_bin[:36]
+ return int(time_bin, 2)
+
def v7Tests(showUUIDs):
counter = 0
testList = []
masterDict = {}
+
+ start = int(time.time())
while counter < 1000:
UUIDv7 = new_uuid.uuid7(devDebugs, returnType)
testList.append(UUIDv7)
masterDict[UUIDv7] = counter
counter += 1
+ end = int(time.time())
if showUUIDs:
print("\n")
@@ -53,6 +64,9 @@
if masterDict[UUID] != counter:
failCount+=1
print('{0}: {1}'.format(str(counter), UUID))
+ elif not (extractSeconds(UUID) >= start and extractSeconds(UUID) <= end):
+ failCount+=1
+ print('{0}: {1} {2}'.format(str(counter), UUID, time.ctime(extractSeconds(UUID))))
counter+= 1
if failCount == 0:
print("+ No Failures Observed")
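For comparison, the 36-bit extraction used in this patch can also be written as a single right shift. The sketch below assumes the left-padded layout from the new_uuid.py patch and uses the example UUID from its docstring; the function names are hypothetical.

```python
def extract_seconds_bits(uuid_str: str) -> int:
    """Same approach as the patch: take the first 36 characters of the bit string."""
    uuid_int = int(uuid_str.replace('-', ''), 16)
    return int(f'{uuid_int:0128b}'[:36], 2)


def extract_seconds_shift(uuid_str: str) -> int:
    """Equivalent bitwise form: discard the low 128 - 36 = 92 bits."""
    return int(uuid_str.replace('-', ''), 16) >> 92


example = '060c4735-8bcb-7726-a200-1fd41eaa8a29'
print(extract_seconds_bits(example), extract_seconds_shift(example))
```

Both functions return the same integer; the shift version simply avoids building a 128-character string.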
The file testing_v8.py doesn't need to test the UUID time, since it depends on the implementation.
And thank you for the inclusion of the uuid-creator!
@kyzer-davis
I think we can avoid the timestamp padding by making 2 changes in the file new_uuid.py.
Change 1:
### Binary Conversions
### Need subsec_a (12 bits), subsec_b (12-bits), and subsec_c (leftover bits starting subsec_seq_node)
(-) unixts = f'{sec:032b}'
(-) unixts = unixts + "0000" # Pad end with 4 zeros to get 36-bit
subsec_binary = f'{subsec:030b}'
### Binary Conversions
### Need subsec_a (12 bits), subsec_b (12-bits), and subsec_c (leftover bits starting subsec_seq_node)
(+) unixts = f'{sec:036b}'
subsec_binary = f'{subsec:030b}'
Change 2:
# Convert Hex to Int then splice in dashes
(-) UUIDv7_hex = hex(int(UUIDv7_bin, 2))[2:]
UUIDv7_formatted = '-'.join(
# Convert Hex to Int then splice in dashes
(+) UUIDv7_hex = f'{UUIDv7_int:032x}' # int to hex
UUIDv7_formatted = '-'.join(
After these changes the UUID is generated with the right length (36) without padding:
before: 60c26bbe-7287-f469-602b-cf7423f3cb7
after: 060c4735-8bcb-7726-a200-1fd41eaa8a29
The padding can result in different time when one tries to call uuid.get_time().
@fabiolimace
After these changes the UUID is generated with the right length (36) without padding:
Change 2:
- This is only required due to change number 1 causing the operation of int(UUIDv7_bin, 2) to drop the leading 0s you padded earlier. Somewhat counter-intuitive since f'{UUIDv7_int:032x}' re-pads.
- With the current padding, least significant position, you can use either UUIDv7_hex = hex(int(UUIDv7_bin, 2))[2:] or UUIDv7_hex = f'{UUIDv7_int:032x}' since they yield the same result of 32 hex characters.
The padding can result in different time when one tries to call uuid.get_time()
- The current implementation of uuid.get_time() will likely not be able to handle full UUIDv7 parsing until it is extended. By explicitly detailing the padding position this makes future extension of that easier. That is, if the spec is ratified as an official RFC.
- With the current padding the decoder can always assume the first 32 bits of UUIDv7 are a valid 32-bit Unix epoch. Decoding the remaining 4 bits along with the subsequent sub-second precision found in the rest of the UUIDv7 layout I would leave up to the implementer of the decoder.
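The two positions in this exchange imply different decoders, which the following sketch tries to make concrete. The function names are hypothetical, and the sample is the older docstring example from new_uuid.py, which was generated with the trailing "0000" padding.

```python
def seconds_from_first_32_bits(uuid_str: str) -> int:
    """Decoder for the right-padded layout: the first 32 bits are the Unix epoch."""
    return int(uuid_str.replace('-', ''), 16) >> 96


def seconds_from_first_36_bits(uuid_str: str) -> int:
    """Decoder for the left-padded layout: the whole 36-bit unixts field is the epoch."""
    return int(uuid_str.replace('-', ''), 16) >> 92


padded_example = '60c26bbe-0728-7f46-9602-bcf7423f3cb7'
as_32 = seconds_from_first_32_bits(padded_example)
as_36 = seconds_from_first_36_bits(padded_example)

# Reading a right-padded UUID with the 36-bit decoder gives a value 16 times
# too large, which is the kind of mismatch the time check in the tests reports.
print(as_32, as_36, as_36 == as_32 * 16)
```

Whichever layout is chosen, the encoder and decoder have to agree on where the four extra bits sit.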
Added v7 Python prototype plus testing