kyzer-davis opened this issue 3 years ago
Adding Draft V01 quote from thayne: https://news.ycombinator.com/item?id=28088213
For UUIDv7 it isn't very clear how the subsecond part is encoded. From the description of decoding it sounds like it would be multiplying the fraction of a second by 2^n, but that isn't very explicit. And if you want to avoid floating point arithmetic you'll need to include a 10^p in there, where p is how many digits of precision you have in base 10 (such as 3 for milliseconds)
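For illustration, a minimal sketch (mine, not from the draft) of the fixed-point encoding thayne describes, done with integer arithmetic only: start from milliseconds (p = 3), scale by 2^12, and divide by 10^3 with rounding.

ms = 321                                        # milliseconds within the current second
subsec = (ms * 2**12 + 500) // 10**3            # round(0.321 * 4096) = 1315, no floats used
ms_decoded = (subsec * 10**3 + 2**11) // 2**12  # rounds back to 321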
I don't like using 38, 24 and 12 bits for nanos, micros and millis respectively. I think this could 'waste' a few bits. I prefer to use 30, 20 and 10. But it's just my preference.
However, fields of 38-24-12 sub-seconds can be very convenient to implement using string operations rather than bitwise operations. It's not efficient to create UUIDs with string operations, but it works well, at least in Python. At the end of this answer, there are 3 implementations of UUID7, tackling the problem with a 'character-oriented/based/driven' approach.
We can think of UUIDs as a 3-part string, leaving out version and variant:

- secs: 9 chars (36 bits)
- subsec: 9 chars (36 bits)
- node: 12 chars (48 bits)

The subsec part has 3 sub-parts:

- subsec_a: 3 chars (12 bits)
- subsec_b: 3 chars (12 bits)
- subsec_c: 3 chars (12 bits)

The 'character-oriented' layout is this:
|---------|---|v|---|r|---|------------|
|  secs   |    subsecs    |    node    |
          | a |   | b |   | c |
v: version: [7]
r: variant: [89ab]
These are the character settings for nanosecond, microsecond and millisecond precision, respectively:
----------------------------------------
field         chars    bits    value
----------------------------------------
secs              9      36    seconds
subsec_abc        9      36    nanoseconds
node             12      48    random
version           1       4    [7]
variant           1       4    [89ab]
                 32     128
----------------------------------------
Settings for nanosecond precision
----------------------------------------
field         chars    bits    value
----------------------------------------
secs              9      36    seconds
subsec_ab         6      24    microseconds
subsec_c          3      12    random
node             12      48    random
version           1       4    [7]
variant           1       4    [89ab]
                 32     128
----------------------------------------
Settings for microsecond precision
----------------------------------------
field         chars    bits    value
----------------------------------------
secs              9      36    seconds
subsec_a          3      12    milliseconds
subsec_bc         6      24    random
node             12      48    random
version           1       4    [7]
variant           1       4    [89ab]
                 32     128
----------------------------------------
Settings for millisecond precision
Here are the Python implementations of the 3 configurations shown above:
#---------------------------------
# UUID7 with nanosecond precision
#---------------------------------
import uuid
import time
import random
now = time.time_ns()
sec = now // 10**9
subsec = now % 10**9
f1 = f'{sec:09x}'
f2 = f'{subsec:09x}'
f3 = f'{random.randrange(2**48):012x}'
v1 = '7' # version
v2 = random.choice(['8', '9', 'a', 'b']) # variant
id = f1 + f2 + f3
id = id[:12] + v1 + id[12:]
id = id[:16] + v2 + id[16:]
uid = uuid.UUID(id)
print(uid) # 06113208-6028-78d2-aa42-aae9eba8a7a2
#---------------------------------
# UUID7 with microsecond precision
#---------------------------------
import uuid
import time
import random
now = time.time_ns()
sec = now // 10**9
subsec = now % 10**9 // 10**3 # adjust to micros
f1 = f'{sec:09x}'
f2 = f'{subsec:06x}'
fx = f'{random.randrange(2**12):03x}' # eXtra bits: random
f3 = f'{random.randrange(2**48):012x}' # node bits: random
v1 = '7' # version
v2 = random.choice(['8', '9', 'a', 'b']) # variant
id = f1 + f2 + fx + f3
id = id[:12] + v1 + id[12:]
id = id[:16] + v2 + id[16:]
uid = uuid.UUID(id)
print(uid) # 06113208-e044-7660-b754-b43ac4b4a721
#---------------------------------
# UUID7 with millisecond precision
#---------------------------------
import uuid
import time
import random
now = time.time_ns()
sec = now // 10**9
subsec = now % 10**9 // 10**6 # adjust to millis
f1 = f'{sec:09x}'
f2 = f'{subsec:03x}'
fx = f'{random.randrange(2**24):06x}' # eXtra bits: random
f3 = f'{random.randrange(2**48):012x}' # node bits: random
v1 = '7' # version
v2 = random.choice(['8', '9', 'a', 'b']) # variant
id = f1 + f2 + fx + f3
id = id[:12] + v1 + id[12:]
id = id[:16] + v2 + id[16:]
uid = uuid.UUID(id)
print(uid) # 06113209-430f-783b-89b8-68d0adb7fa01
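For completeness, a small sketch (my own addition, not part of the original answer) of decoding the nanosecond-precision layout back to a timestamp, using the same character positions as the encoder above:

import uuid
uid = uuid.UUID('06113208-6028-78d2-aa42-aae9eba8a7a2')  # the nanosecond example above
hex32 = uid.hex                                           # 32 hex chars, no dashes
body = hex32[:12] + hex32[13:16] + hex32[17:]             # drop the version char (index 12) and variant char (index 16)
sec = int(body[:9], 16)                                   # first 9 chars: 36 bits of whole seconds
nanos = int(body[9:18], 16)                               # next 9 chars: 36 bits of nanoseconds
print(sec, nanos)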
Sorry, I didn't focus on the main question in my previous answer.
I prefer the first case (one-second fraction), although the second (specific number of ms/us/ns) is simpler and more efficient.
As I said in a previous issue, encoding and decoding can be done with fixed-point arithmetic. We don't have to deal with floating-point math directly.
To encode, do the following:
subsec = fraction * 2^bits
To decode, do the reverse:
fraction = subsec / 2^bits
That's it.
The decoder can extract the exact original fraction, unless the decoder reads more bits than the encoder used.
The excellent example given by @bradleypeabody above can be expressed like this in Python:
bits = 12
fraction = 0.321
subsec = round(fraction * 2**bits)
To decode, the reverse:
bits = 12
subsec = 1315
fraction = round(subsec / 2**bits, 3)
I used round() because int() caused slight differences when I was writing an example.
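For example (my own check of that difference):

bits = 12
int(0.321 * 2**bits)    # 1314 - truncation drops the fractional part
round(0.321 * 2**bits)  # 1315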
In the other case, which simply copies the subsec 'verbatim', you don't do that. To encode, you simply place the subsec bits in their expected positions. That is all. The encoder doesn't care what the decoder does to extract the original fraction. The examples in my previous answer do it.
@fabiolimace Great writeups. I agree with all points.
However, if the alternative option only ever requires a maximum of 30 bits to encode up to nanoseconds properly, why not cap the total bits allocated to subsec_a/b/c at 30?
12/12/12 is nice thematically, and for subsec_b, since the ver/var placement is sub-optimal for what we want to do. But with a 30-bit maximum we can free up 6 bits that could either be shifted into unixts for a larger timestamp and more future proofing, or into node to be used as additional sequence counter space (missed in your python examples) or more random (which was a consistent point of contention in the threads here).
Note: I really like those ASCII tables. I may use that format in draft-02 to help illustrate UUIDv6/7/8 Basic Creation Algorithm examples along with the current ASCII bit layout format and pseudocode steps
TL;DR: look at the bottom bit layout for the proposal.
Yeah, there seems to be a lot of pain caused by how not-aligned the various fields are. A friend of mine has been hacking away on this: http://www.new-uuid.info/ (just put it up and still WIP) and we went through the same thing - dealing with raw bits in most any language is painful (Go in this case).
Some ideas:
1. One idea might be to try changing the UUID7 spec so it is more byte-aligned - more practical to use and implement. Like taking the ver and var fields and blocking off the entire byte that each of those use and giving some meaning to the extra bits there, but otherwise everything else can be moved into place in whole bits. To make it really usable we'd probably want to change the 36-bit timestamp to either a 32-bit unsigned int, or another option would be to make it 40 bits (5 whole bytes).
2. Another option, crazy idea here, bear with me: What if we ditch the positions of these old fields and instead provide another means of reliably detecting v6/7/8 UUID? This whole thing would be way simpler if we didn't have to dance around those two old legacy fields (ver and var). And while their positions are obviously locked into old software, we could probably get away with breaking that for newer values IF there were still a reliable way for software to detect which version was in use. Actually, just looking at this newly again - https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.1 :
The variant field determines the layout of the UUID. That is, the interpretation of all other bits in the UUID depends on the setting of the bits in the variant field.
...
1 1 1 Reserved for future definition.
That might be our ticket right there.... Per the spec, if we set those three bits to 1s, then we can change whatever we want. I mean, I'm sure we'll break someone's implementation somewhere, but per the spec it's legit - we can move or remove the version field if we set those bits. Okay so....
What if we set those var bits to 1s, and then make the rest of the byte the version field?! By combining these legacy fields, we then only have to work around this one byte:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| var | ver | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
(this incidentally will provide 5 version bits and allow for 32 UUID versions instead of 16 with the existing 4 bits)
If you ask me, this is definitely simpler.
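As a purely illustrative sketch (not part of any draft), the combined octet could be written and read like this; the helper names are made up:

def make_varver_octet(version):        # hypothetical: version is 0..31
    return (0b111 << 5) | (version & 0b11111)

def read_version(octet):
    if octet >> 5 == 0b111:            # "reserved for future definition" variant
        return octet & 0b11111         # remaining 5 bits carry the version
    return None                        # otherwise fall back to the classic ver/var rules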
From there, the main thing for UUID7 to figure out alignment for is this unix timestamp field. I'm not sure how hard we want to try / how complex we want to make it in solving that. There are some simple transforms that would solve the problem. E.g. using the number of minutes since unix epoch instead of seconds fits in 32 bits up to year 10136, and then we could have a sub-minute portion instead of a sub-second portion. But I'm not sure any of this is better than bit shifting or just choosing 40 bits for the integer timestamp. Also, I mean heck, a uint32 gets us to the year 2106 - maybe we should just go with that and assume that if people are still using this standard 85 years from now it will have served its purpose and it's time for another version... If we did that, we'd get something like this:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| unix_timestamp_u32 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| subsec_seq_rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| var | ver | subsec_seq_rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| subsec_seq_rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
(Where subsec_seq_rand is a concatenation of the subsecond value, sequence counter and random data, with whatever lengths the implementation wants to choose for each, up to a max length of 88 bits / 11 bytes)
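A rough sketch (my own illustration, not a reference implementation) of generating a value with this layout, filling subsec_seq_rand entirely with random bits and using a hypothetical 5-bit version of 7 in the combined octet:

import os, time, uuid
ts = int(time.time()) & 0xffffffff                      # unix_timestamp_u32
b = bytearray(ts.to_bytes(4, 'big') + os.urandom(12))   # remaining 96 bits: subsec_seq_rand (all random here)
b[8] = (0b111 << 5) | 7                                 # combined var (111) + hypothetical 5-bit version
print(uuid.UUID(bytes=bytes(b)))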
I know this would be a big change, but it would definitely be simpler and easier on the development side and to show/explain to people.
I'm still thinking this over myself and just throwing this out there. Let me know what y'all think.
Well, @bradleypeabody I said that it's annoying, but it's not impossible.
First, data for everyone: I just finished a couple more tweaks of http://www.new-uuid.info. The website is written in Go and runs my implementation of the UUIDv7 generator. Repo here.
On the website, you can literally “dial in” the precision lengths for different parts of the UUID and see on-line how this generator provides you values.
At the end of this website, I added a simple playground where you can try to convert milliseconds into "Brad's encoded" milliseconds, just so it's real to touch.
Now, as for the implementation. This is a very rough draft, although it's working.
To simplify what you are talking about (the ver and var fields), you can just subtract the length of those fields from the total length of the UUID (128 - 4 - 2 = 122 bits) and work with a 122-bit structure. Once you're done populating it, you just copy it into the final bit array, adding those values in their set positions.
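A small Python sketch (mine, not nurked's Go code) of that idea: build the 122-bit payload first, then splice the version and variant into their fixed positions:

import random, uuid
payload = random.getrandbits(122)        # everything except ver (4 bits) and var (2 bits)
top    = payload >> 74                   # first 48 bits, placed before ver
middle = (payload >> 62) & 0xfff         # 12 bits between ver and var
bottom = payload & ((1 << 62) - 1)       # final 62 bits
value  = (top << 80) | (0x7 << 76) | (middle << 64) | (0b10 << 62) | bottom
print(uuid.UUID(int=value))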
As for working with bits themselves, it's not that complicated.
// getBit returns the value of a bit at a specified position in a given byte array
func getBit(b []byte, index int) bool {
    pos := index / 8
    j := uint(index % 8)
    j = 7 - j
    return (b[pos] & (uint8(1) << j)) != 0
}
// setBit sets the value of a bit at a specified position in []byte
func setBit(b []byte, index int, value bool) {
    pos := index / 8
    j := uint(index % 8)
    j = 7 - j
    if value {
        b[pos] |= (uint8(1) << j)
    } else {
        b[pos] &= ^(uint8(1) << j)
    }
}
And this is if you are very concerned about memory consumption (which is not the case nowadays anyway). You can use an array of bools instead. But I did it “the right way”.
Extracting the timestamp from this version of UUID is elementary from a viewpoint of a processor.
// Time returns the unix epoch stored in the struct without millisecond precision
func (u UUIDv7) Time() time.Time {
    bytes := [16]byte(u)
    tmp := toUint64(bytes[0:5])
    tmp = tmp >> 4 // drop the last 4 bits of the fifth byte
    return time.Unix(int64(tmp), 0)
}
You just take the first 5 bytes (40 bits) and shift the resulting value right by 4. This is something that any 64-bit processor can do in a snap.
It takes longer to extract other data. Currently, I'm using getBit/setBit, but it can be done with logical shifts. Those would be complex from the viewpoint of a programmer, but super-easy for a processor.
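For comparison, a sketch (mine, in Python rather than Go) of the shift-based approach, assuming the draft-02 layout where the first 36 bits are unixts and the next 12 are subsec_a:

import uuid
u = uuid.uuid4()                            # stand-in for a v7 value
value = int.from_bytes(u.bytes, 'big')      # the whole UUID as one 128-bit integer
unixts = value >> (128 - 36)                # top 36 bits
subsec_a = (value >> (128 - 48)) & 0xfff    # next 12 bits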
Pros: it's good that I can adjust the format. For example, if I know that I'm usually generating about 10 values per second, I could set sub-second precision and counter to 4 bits each and be happy with the random data that goes after.
Cons: took me about 20 hours to write it in golang.
Disclaimer: golang is not my native tongue, so I had to pick it up as I went. A pro in C++ or C# would be able to re-build this generator in one day.
I guess the main question is: "Who would be implementing this system?".
I wouldn't go as far as changing the position of the version bits. It would be better to keep compatibility with libraries implementing RFC 4122. Sure, cutting nanoseconds into three parts makes it a bit more complicated, but only marginally in my opinion.
I also did some experimentation with UUIDv7 in go: https://github.com/coding-socks/uuiddraft/blob/v7/uuid.go. This implementation already contains the earlier suggestion to only use 30 bits (12, 12, 6) for nanosecond precision. Having the different subsec sections is as simple as the following:
nsec := timestamp % int64(time.Second)
subsecA := (nsec >> 18) & 0xfff
subsecB := (nsec >> 6) & 0xfff
subsecC := nsec & 0x3f
Going line by line:

- nsec is int64 but only 30 bits are used.
- Shift 12 + 6 = 18 to the right, so we continue with the left-most 46 bits (64 - 18 = 46), and 0xfff makes sure we keep the 12 right-most bits of those 46 bits.
- Shift 6 to the right, so we continue with the left-most 58 bits (64 - 6 = 58), and 0xfff makes sure we keep the 12 right-most bits of those 58 bits.
- 0x3f makes sure we keep the 6 right-most bits.

With the current 38 bits (12, 12, 14) nanosecond precision it would be the following:
nsec := timestamp % int64(time.Second)
subsecA := (nsec >> 26) & 0xfff
subsecB := (nsec >> 14) & 0xfff
subsecC := nsec & 0x3fff
Going line by line:

- nsec is int64 but only 30 bits are used.
- Shift 12 + 14 = 26 to the right, so we continue with the left-most 38 bits (64 - 26 = 38), and 0xfff makes sure we keep the 12 right-most bits of those 38 bits.
- Shift 14 to the right, so we continue with the left-most 50 bits (64 - 14 = 50), and 0xfff makes sure we keep the 12 right-most bits of those 50 bits.
- 0x3fff makes sure we keep the 14 right-most bits.

It seems I'm one of those people who thinks bit manipulation is easy. 😄
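For the reverse direction, a minimal sketch (my own, in Python rather than Go) of recombining the 30-bit (12, 12, 6) split back into the original nanosecond value:

def recombine_nsec(subsec_a, subsec_b, subsec_c):
    # shift each field back to its position: 12 + 12 + 6 = 30 bits, lossless
    return (subsec_a << 18) | (subsec_b << 6) | subsec_c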
To implement all precisions I'm waiting for Go 1.17, which will add the Time.UnixMilli and Time.UnixMicro methods.
@nurked that is very cool, I enjoy the visualizations that help drive the point home!
As tedious as the version/variant fields are, I agree with the others that leaving them where they are is the best decision.
That being said, Brad's comment about the variant bits got me thinking, why not implement both?
With this in mind we could also set the definition of any UUID Version (UUIDv1/2/3/4/5/6/7/8) + Variant E (111) as a method for signaling an alternative bit layout to any previously defined UUID. With that precedent set UUIDv6 could be converted to UUIDv1 (0001) + Variant E/F (111) as a method to signal the alternative encoding for UUIDv1 (Which is the entire goal of UUIDv6 at the moment)
Edit: See #26 for further discussion on UUIDvXε variant usage.
@kyzer-davis I really like the idea of UUIDv1ε because UUIDv6 in draft 01 is almost the same as UUIDv1 with different byte order. On the other hand, I don't quite understand the difference between UUIDv6 and UUIDv6ε. Do you mind opening another issue to discuss the possible renaming, to keep this issue focused on subsecond precision?
If UUIDv7 subsecond precision in the case of nanoseconds is changed from 36 to 30 bits the layout would become the following:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| unixts |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts | nsec | ver | nsec |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var| nsec | seq | rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
From:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| unixts |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts | nsec | ver | nsec |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var| nsec | seq | rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
In both cases nsec is cut into 3 pieces, therefore I think it is better to keep it as compact as possible. However, in the case of millisecond and microsecond precision, decreasing nsec does not provide as much benefit because they don't free up a byte for seq.
Also, in the case of 36 bits we could consider a way which would allow implementers to avoid bit shifting:
- subsecA for subsecond milliseconds. 10 bits stored on 12 bits.
- subsecB for submillisecond microseconds. 10 bits stored on 12 bits.
- subsecC for submicrosecond nanoseconds. 10 bits stored on 14 bits.

Simple division and remainder calculation should be enough, and bit shifting by 10 is still an option.
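A minimal sketch (my own, assuming a nanosecond wall clock reading) of that division/remainder approach:

import time
nsec = time.time_ns() % 10**9
subsec_a = nsec // 10**6          # sub-second milliseconds, 0..999 (10 bits in a 12-bit field)
subsec_b = nsec % 10**6 // 10**3  # sub-millisecond microseconds, 0..999 (10 bits in a 12-bit field)
subsec_c = nsec % 10**3           # sub-microsecond nanoseconds, 0..999 (10 bits in a 14-bit field)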
Finally, 2 bits for subver could be added somewhere to tell the precision of UUIDv7, but I'm still hesitant about this idea and it should be discussed in another issue.
Edit: The statement about bit shifting being an option is not true.
Do you mind opening another issue to discuss the possible renaming, to keep this issue focused on subsecond precision?
Done, #26 open, will edit my original comment.
What do you think about having a fixed length subsec_seq value? The base for it would be UUIDv7 with 30 bit nsec.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| unixts |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts | subsec_seq | ver | subsec_seq |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var| subsec_seq | rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
subsec_seq can be calculated the following ways:

- (subsec_nano << 6) | seq
- ((subsec_micro * 1000) << 6) | seq
- ((subsec_milli * 1000 * 1000) << 6) | seq

seq could be limited to 6, 9 (1000 = 0b1111101000), or 12 (1000 * 1000 = 0b11110100001001000000) bit length via &. This would allow ns / 100 with a 7 bit seq.

Mixed precision example:
1629140065080 * 1000 * 1000 = 1629140065080000000 = 0b1011010011011110111100100000111110100011100110011111000000000
1629140065080093 * 1000 = 1629140065080093000 = 0b1011010011011110111100100000111110100011101001010100101001000
1629140065080093020 * 1 = 1629140065080093020 = 0b1011010011011110111100100000111110100011101001010100101011100
Order: 1629140065080000000 (from System A), 1629140065080093000 (from System B), 1629140065080093020 (from System C).
I'm trying to avoid using a 64 bit integer because some scripting languages use the IEEE-754 double-precision (64 bit) format to represent double and float. In this case a 53 bit int is the biggest they can represent. This means those languages cannot represent a fake 64 bit unix nano. Meanwhile, a 34 bit unixts and a 36 bit subsec_seq could be used.
I assume 48 bit for random bits should be sufficient for millisecond, microsecond and nanosecond precision as well.
I assume 48 bit for random bits should be sufficient for millisecond, microsecond and nanosecond precision as well.
IMO this is too little to avoid collisions. Especially with millisecond timestamp precision.
This means those languages cannot represent a fake 64 bit unix nano.
It only takes a few lines of code to get a 64-bit multiplication result using 32-bit arithmetic. Example
Continuing on the idea of subsec_seq, the sequence could have an integer limit instead of a bit limit. This would mean that instead of bitwise |, a simple addition would be used when combining the sequence and the subsec precision. The sequence would be allowed to overflow. Overflow could be done with seq = seq % (64 * x), which might be slower than bitwise operations.
@edo1
I assume 48 bit for random bits should be sufficient for millisecond, microsecond and nanosecond precision as well.
IMO this is too little to avoid collisions. Especially with millisecond timestamp precision.
Take into consideration the sequence as well. With the rules from my previous comment it would be 48 bits random and 12 bits sequence in the case of millisecond precision. With the rules from this comment it would be 48 bits random and a sequence with a 64_000_000 maximum value. The latter would allow the same theoretical maximum for each precision.
I'm also thinking about how high resolution time could be used in addition to sequence.
This means those languages cannot represent a fake 64 bit unix nano.
It only takes a few lines of code to get a 64-bit multiplication result using 32-bit arithmetic. Example
You are correct, I totally forgot you can use two 32 bit int as a 64 bit int. I will play with this after work.
I would like to show you how to encode and decode in Python, Java and Javascript using the timestamp with 40 bits and 10-milli precision.
IMO a timestamp with 40 bits and 10-milli precision is easy to encode and decode. It also has the best 'score' in the spreadsheet created by @edo1 (https://github.com/uuid6/uuid6-ietf-draft/issues/23#issuecomment-899156413) for a 200-year life span. Also, it is easy to describe the algorithm: just use the scale factor.
Also, it's not difficult to encode and decode in Javascript. You have to encode and decode the timestamp and the timestamp extension separately to get around the limitation of the Number object.
Here is (yet) another layout/structure I would like to propose:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp | timestamp_ext | ver | timestamp_ext |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var| rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
timestamp: 40 bits, 10-milli precision
timestamp_ext: 20 bits, can be used to extend the precision up to 10-nano
In Python you use the scale factor 100 * 2**20 to encode and decode.
Encode to fixed-point:
scale_factor = 100 * 2**20
timestamp = unixts * scale_factor
Decode back to floating-point:
unixts = timestamp / scale_factor
Here is a complete example:
import time
DEC_SCALE_FACTOR = 100
BIN_SCALE_FACTOR = 2**20
SCALE_FACTOR = DEC_SCALE_FACTOR * BIN_SCALE_FACTOR
def encode(unix_time):
    '''Encode to FIXED-point number'''
    return int(unix_time * SCALE_FACTOR)

def decode(timestamp):
    '''Decode to floating-point number'''
    return float(timestamp / SCALE_FACTOR)
# get UNIXTS as floating-point
unixts = time.time()
# encode UNIXTS to FIXED-point
timestamp = encode(unixts)
# decode timestamp to floating-point
unixts_decoded = decode(timestamp)
print(unixts)
print(unixts_decoded)
In Java you also use the scale factor 100 * 2**20 to encode and decode.
Encode to fixed-point:
double scale_factor = 100.0 * Math.pow(2, 20);
double unixts = System.currentTimeMillis() / 1000.0;
long timestamp = (long) (unixts * scale_factor);
Decode back to floating-point:
double unixts_decoded = timestamp / scale_factor;
Here is a complete example:
public final class Uuid7Timestamp40b {

    private static final double DEC_SCALE_FACTOR = 100;
    private static final double BIN_SCALE_FACTOR = 1 << 20; // 2**20
    private static final double SCALE_FACTOR = DEC_SCALE_FACTOR * BIN_SCALE_FACTOR;

    public static long encode(final double unixts) {
        return (long) (unixts * SCALE_FACTOR);
    }

    public static double decode(final long timestamp) {
        return timestamp / SCALE_FACTOR;
    }

    public static double time() {
        return System.currentTimeMillis() / 1000.0;
    }

    public static void main(String[] args) {
        // get UNIXTS as floating-point
        double unixts = time();
        // encode UNIXTS to FIXED-point
        long timestamp = encode(unixts);
        // decode timestamp to floating-point
        double unixts_decoded = decode(timestamp);
        System.out.println(unixts);
        System.out.println(unixts_decoded);
    }
}
In Javascript you use a decimal scale factor 100 and a binary scale factor 2**20 to encode and decode.
You can't use the fixed-point representation because of the internal representation of Number in javascript. But you can represent the timestamp as a hexadecimal.
Encode to hexadecimal, then decode back to Number. Here is a complete example:
var DEC_SCALE_FACTOR = 100.0
var BIN_SCALE_FACTOR = Math.pow(2, 20)
function time() {
    return (new Date()).getTime() / 1000
}

function int(n) {
    return Math.floor(n)
}

// output in hexadecimal
function encode(unixts) {
    unixts = unixts * DEC_SCALE_FACTOR
    // encode the first 40 bits
    timestamp = int(unixts)
    hexa = timestamp.toString(16)
    padded = ("0000000000" + hexa).slice(-10)
    // encode the last 20 bits
    timestamp_ext = int((unixts - int(unixts)) * BIN_SCALE_FACTOR)
    hexa_ext = timestamp_ext.toString(16)
    padded_ext = ("00000" + hexa_ext).slice(-5)
    return padded + padded_ext
}

// input in hexadecimal
function decode(timestamp) {
    // decode the first 40 bits
    hexa = timestamp.slice(0, 10)
    parsed = parseInt(hexa, 16)
    // decode the last 20 bits
    hexa_ext = timestamp.slice(10, 15)
    parsed_ext = parseInt(hexa_ext, 16) / BIN_SCALE_FACTOR
    return (parsed + parsed_ext) / DEC_SCALE_FACTOR
}
unixts = time()
timestamp = encode(unixts)
unixts_decoded = decode(timestamp)
console.log(unixts)
console.log(unixts_decoded)
What if a 64 bit integer is used for nanosecond time with a 1024 * precision sequence? The structure would look like this:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| time_hi |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| time_mid | ver | time_low_and_seq_hi |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var| tlsh | seq_low | node |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| node |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- time: A 64 bit version of the current unix time in nanoseconds, or the closest time in nanoseconds. It should resolve times until the year 2262.
- precision: The multiplier needed to achieve nanosecond time.
- seq: Clock counter with a maximum value of 1024 * precision. Bit length depends on precision, but a minimum of 10 and a maximum of 40 bits is needed.
- seq_low: Right-most 10 bits of seq.
- seq_hi: seq without the right-most 10 bits.
- time_hi: time from 0 to 32 bit.
- time_mid: time from 33 to 48 bit.
- time_low_and_seq_hi: (time + seq_hi) from 48 to 60 bit.
- tlsh: (time + seq_hi) from 60 to 64 bit.
- node: random bits.

I created an example in NodeJS: https://gist.github.com/nerg4l/7de56f715486fada84ac4c1f27a17b23. JavaScript does not have a 64 bit int primitive and when bitwise operations are used it converts the value to a 32 bit int. Therefore, most of the calculations had to be done in interesting ways.
I'm not sure if this is the correct way forward. It looks like a unix based UUIDv6 with 64 bit time.
In a lot of cases the main pain point is still the unfortunate position of ver and var.
Lastly, I was looking into maximum time resolutions on different operating systems: https://en.wikipedia.org/wiki/System_time. For example, according to the linked wiki the best Windows can do is 100 ns, but macOS/OSX can do 1 ns with mach_absolute_time and mach_timebase_info (at least this is how Go resolves time now on darwin), which is not in the wiki.
This was dropped from UUIDv7 in Draft 03 although I have re-worked some of the lessons learned into 5.1. Timestamp Granularity > Sub-second Precision and Accuracy bullet.
Question: Fraction-of-a-second representation or a specific number of milliseconds/microseconds/nanoseconds for subsecond precision encoding within UUIDv7's subsec_a, subsec_b, and subsec_seq_node?
Current draft-02 Implementation as per @bradleypeabody (who wrote the UUIDv7 section)
Sources from email between Brad and me on the topic:
Alternative is specific number of milliseconds/microseconds/nanoseconds as integer.
Sources: https://en.wikipedia.org/wiki/Orders_of_magnitude_(time)
More Discussion on the alternative:
If the alternative were utilized in Draft 02 we would encode as such (see the sketch after this list):

- subsec_a is fully dedicated to millisecond information with a pad of two bits at the start, since we only need 10. Easier to do this padding since the Version bits come up right after.
- subsec_a and subsec_b are utilized for microsecond information with 4 bits padded at the start, since we only require 20. Easier to pad here because the Variant bits are coming up.
- subsec_a, subsec_b, and 6 bits of subsec_seq_node are used for nanosecond information. The final 48 bits of subsec_seq_node are used for random/entropy/node data. No padding required for this scenario.
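A minimal Python sketch (my own illustration, not text from the draft) of the three padding schemes described above:

import time, random
nsec = time.time_ns() % 10**9

# millisecond precision: 10 bits right-aligned in the 12-bit subsec_a (2 pad bits)
subsec_a = nsec // 10**6

# microsecond precision: 20 bits right-aligned across subsec_a + subsec_b (4 pad bits)
usec = nsec // 10**3
subsec_a, subsec_b = usec >> 12, usec & 0xfff

# nanosecond precision: 30 bits across subsec_a, subsec_b and the top 6 bits of subsec_seq_node,
# followed by 48 random bits, as described above
subsec_a, subsec_b = nsec >> 18, (nsec >> 6) & 0xfff
subsec_seq_node = ((nsec & 0x3f) << 48) | random.getrandbits(48)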