Closed fabiolimace closed 3 years ago
Thanks @fabiolimace and sorry for the delayed reply.
Yes, I think you are describing the idea that we're moving forward with. An important concept here is that the timestamp is the only useful thing that other systems may need to extract from the UUID, and that systems may require different precisions of sub-second timestamps.
Because of these properties, one system can encode the sub-second part with n bits of precision, and another system can decode it with n-1 or n+1 bits of precision and they will get approximately the same resulting number - with the corresponding loss of precision due to either truncation or reading extra random bits into the sub-second timestamp part. But the caller does not need to know or care about this - which provides for excellent interoperability.
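To make the interoperability claim concrete, here is a minimal sketch, assuming the sub-second fraction is stored by truncation in the top bits of a field and followed by random filler. The 20-bit field width and the helper names `encode_subsec` and `decode_subsec` are hypothetical and only serve the illustration.

```python
import secrets

def encode_subsec(fraction: float, bits: int) -> int:
    """Encode a fraction in [0, 1) as a fixed-point integer with `bits` bits."""
    return int(fraction * (1 << bits))  # truncation keeps the result below 2**bits

def decode_subsec(field: int, field_bits: int, bits: int) -> float:
    """Read the top `bits` bits of a `field_bits`-wide field as a fraction."""
    return (field >> (field_bits - bits)) / (1 << bits)

FIELD_BITS = 20  # hypothetical width: 12 encoded bits followed by 8 random bits
fraction = 0.944
field = (encode_subsec(fraction, 12) << 8) | secrets.randbits(8)

# A reader using 11, 12 or 13 bits gets approximately the same fraction.
for bits in (11, 12, 13):
    print(bits, decode_subsec(field, FIELD_BITS, bits))
```

Reading 11 bits drops the last encoded bit, and reading 13 bits pulls one random bit into the result, so all three values stay within 2^-11 of the original fraction.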
I'm going to try to hammer out the last edits that are in my court now/very soon and coordinate with @kyzer-davis about getting this new draft submitted.
For the seconds part, I'm going to try with 36 bits. This will continue to work for over 2000 years into the future; and it also makes it so if you're decoding with millisecond precision you can read right up to the ver field for the ms part.
I also think that 36 bits are enough for the seconds. It allows the maximum timestamp to be close to 4147 A.D. It is more than the theoretical limit of 3400 A.D. in RFC-4122, considering a signed timestamp.
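The year figure can be checked with a few lines of arithmetic. This is only a sketch: it assumes an unsigned seconds counter starting at 1970 and an average year of 365.25 days, and `max_year` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25  # average year length, an approximation

def max_year(seconds_bits: int) -> float:
    """Approximate last year representable by an unsigned seconds counter."""
    return 1970 + 2 ** seconds_bits / SECONDS_PER_YEAR

print(int(max_year(36)))  # 4147
```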
I am happy that the project is still moving forward.
@fabiolimace Do you have cycles to review the current IETF published RFC draft v01 for UUIDv7, make any modifications to your sample python code and open a PR to add this UUIDv7 python prototype to the code over on https://github.com/uuid6/prototypes ?
If not let me know and I will transpose this code with the required changes to merge it into the python prototypes while I am doing some work in the branch draft-01-updates.
@kyzer-davis I can open a PR next weekend to add a UUIDv7 prototype to the code at https://github.com/uuid6/prototypes
@kyzer-davis sorry for the delayed reply.
I opened a pull request: https://github.com/uuid6/prototypes/pull/2
I'm closing this issue due to the new draft. #14
I'm trying to implement this in JS, just a question about the 36 bits and the 12 bits for uuidv7 and big-endian order.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| unixts |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts | msec | ver | seq |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var| rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| rand |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
For unixts and msec, together they are 48 bits.
If I had unixts as an integer meaning seconds, and msec as an integer meaning milliseconds.
Would it be correct to do:
const unixts = 1633576858;
const msec = 944;
const combined = (unixts << 12) | (0x0fff & msec);
// maybe if i truncate it to the first 36 bits first?
const combined2 = ((0xFFFFFFFFF & unixts) << 12) | (0x0fff & msec);
Then with combined I could take the first 5 bytes and put it into a typed array in big-endian order?
Actually I realised that the unixts number is too big, the shifting by 12 is not going to work, unless I were to first truncate it...
It seems this could work: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt
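The overflow happens because JavaScript's bitwise operators work on 32-bit integers, while a current Unix timestamp shifted left by 12 needs about 43 bits. As a language-neutral reference for what the packed bytes should look like, here is a minimal Python sketch (Python integers do not overflow). It assumes, as in the question, that msec is stored as a plain integer in 12 bits; `pack_timestamp` and `unpack_timestamp` are hypothetical helper names. Note that 36 + 12 = 48 bits occupy 6 bytes.

```python
from typing import Tuple

def pack_timestamp(unixts: int, msec: int) -> bytes:
    """Pack 36-bit seconds and 12-bit milliseconds into 6 big-endian bytes."""
    combined = ((unixts & 0xFFFFFFFFF) << 12) | (msec & 0xFFF)
    return combined.to_bytes(6, "big")

def unpack_timestamp(data: bytes) -> Tuple[int, int]:
    """Inverse of pack_timestamp: return (seconds, milliseconds)."""
    combined = int.from_bytes(data[:6], "big")
    return combined >> 12, combined & 0xFFF

packed = pack_timestamp(1633576858, 944)
print(packed.hex())              # 0615e679a3b0
print(unpack_timestamp(packed))  # (1633576858, 944)
```

In JavaScript the same arithmetic would need BigInt, or multiplication and division instead of shifts, to stay outside the 32-bit range.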
Just wanted to point out that I've been making good progress on js-id https://github.com/MatrixAI/js-id, and solved the above problems. For now using bit strings, although the library can be made more efficient with bitwise operators in the future.
I've also noticed that UUIDv7 can support logical timestamps as well, could be a useful usecase when you don't want to leak the wall clock time, but still preserve ordering of IDs.
Hi @CMCDragonkai - thanks for your work on this. Glad to hear it's coming along.
One thing to keep in mind is that UUIDv7 is still a draft and various ideas are being considered (and have gone back and forth), so the spec will almost certainly end up changing.
> I've also noticed that UUIDv7 can support logical timestamps as well, could be a useful usecase when you don't want to leak the wall clock time, but still preserve ordering of IDs.
Yeah, the time does not need to be super granular or have accuracy guarantees. In fact this is one of the issues that has come up, and the discussion is leaning toward only having millisecond precision in the timestamp.
Thanks, we need to use it in our projects now, so it's going into production. Hopefully any changes won't be a big deal and our current situation will be compatible.
BTW I encountered a problem with the fixed point algorithm that wasn't clear with the reference python implementation I was following and the spec discussed above. With 12 bits, if the fixed point conversion resulted in the number 4096, this would cause the 12 bit sector to overflow and be all zeros. This corrupted the sort order of the ids.
To solve this in my implementation:
/**
 * Converts floating point number to a fixed point tuple
 * Size is number of bits allocated for the fractional
 * Precision dictates a fixed number of decimal places for the fractional
 */
function toFixedPoint(
  floating: number,
  size: number,
  precision?: number,
): [number, number] {
  let integer = Math.trunc(floating);
  let fractional: number;
  if (precision == null) {
    fractional = floating % 1;
  } else {
    fractional = roundPrecise(floating % 1, precision);
  }
  // If the fractional is rounded to 1
  // then it should be added to the integer
  if (fractional === 1) {
    integer += fractional;
    fractional = 0;
  }
  // Floor is used to round down to a number that can be represented by the bit size
  // if ceil or round was used, it's possible to return a number that would overflow the bit size
  // for example if 12 bits is used, then 4096 would overflow to all zeros
  // the maximum for 12 bit is 4095
  const fractionalFixed = Math.floor(fractional * 2 ** size);
  return [integer, fractionalFixed];
}
Basically, after acquiring the fractional, I check whether it is 1 and if so, add it to the integer. Otherwise, when it is converted to a fixed-point number, it's still essential to use Math.floor so it can never reach 4096.
Details on my fix are here: https://github.com/MatrixAI/js-id/pull/8
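The failure mode is easy to reproduce in isolation. The following is a small Python sketch of the same arithmetic, not the js-id code itself; the value 0.9999 is just an example fraction close to 1.

```python
import math

BITS = 12
fractional = 0.9999  # example fraction close to 1

rounded = round(fractional * 2 ** BITS)       # 4096: one past the 12-bit maximum
floored = math.floor(fractional * 2 ** BITS)  # 4095: the largest 12-bit value

print(rounded & 0xFFF)  # 0, masking to 12 bits wraps 4096 around to zero
print(floored & 0xFFF)  # 4095
```

With rounding, an ID generated late in a second would sort before IDs generated earlier in that same second, which is the ordering corruption described above.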
@bradleypeabody, @kyzer-davis,
This is a continuation of my comment on PR #10
I thought of another approach. Instead of treating the integer part and the fractional part separately, the entire timestamp can be encoded in a fixed-point number.
I imagined another structure that reserves the last 32 bits for random bits or application-specific data. I'm not sure if anyone will need more than 90 bits for the timestamp. A field with 38 bits for the integer part and 52 bits for the fractional part is enough to store a decimal number with 15 digits after the decimal point, like this: 0.123456789012345.
The structure described below also works with 48 bits reserved for milliseconds, requiring very small changes. By the way, I think that 48 bits for milliseconds may be more convenient in contexts where the best precision you can get is milliseconds, as in Java 8, because you don't need to encode the sub-millisecond part.
Sorry for sending spam to your project. I am very happy with the progress of this RFC update. I'm just trying to contribute some ideas, and I don't expect them to be accepted.
Structure with 90 bits for timestamp
This proposed structure consists of two fields: unixtime and random. The only mandatory requirements are to fill in the first 38 bits (48 bits for ms is better?) of the unixtime field with the current seconds and fill in the 6 bits reserved for version and variant with their standardized values. The number of sub-second bits in the unixtime field is variable/arbitrary. And the random field does not have to be filled with random bits. The name of this field is just a hint that its value by default is random.
unixtime: a 90-bit field reserved for the Unix timestamp. This field is encoded as a fixed-point number in the form fixed<38,n>, where 38 is the number of bits before the binary point, and n is the number of bits after the binary point. The size n is arbitrary so that it can be defined by the system that generates the UUID. The bits not used in this field can be filled with a sequence counter and/or some random bits, for example. The timestamp can represent dates up to the year 10680 A.D. (2^38/60/60/24/365.25 + 1970).
random: a 32-bit field reserved for random bits or application-specific data. Part or all of this field can be used, for example, to identify the UUID generator. This field is opaque, so its bits have no meaning, except for the system that generates the UUID.
Encoding and decoding the sub-second part
Say you have a fractional value between 0 and 1 represented as f, you can encode that fractional value with the precision of n bits like this:
To decode a subsec with n precision you do the inverse operation:
This tutorial describes an efficient method for converting floating-point to fixed-point in MATLAB: Fixed-Point.pdf.
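The two operations described in this section can be written as follows. Treat this as a sketch under one assumption: the encoder truncates (floor) rather than rounds, so the result always fits in n bits.

$$
\text{subsec} = \left\lfloor f \cdot 2^{n} \right\rfloor,
\qquad
f \approx \frac{\text{subsec}}{2^{n}}
$$

For example, with f = 0.944 and n = 12, subsec = floor(0.944 * 4096) = 3866, and decoding gives 3866 / 4096, which is about 0.94385.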
Generic algorithm to generate a UUIDv7
Multiply the sub-second fraction by 2^n, where n is the number of bits that fits the system time precision. If the precision is millisecond, n is equal to 10 (2^10 = 1,024). If the precision is microsecond, n is equal to 20 (2^20 = 1,048,576) and so on;
Fill the next n bits of the UUID with the n bits of the previous multiplication product;
Generic algorithm to extract the unix second
Generic algorithm to extract the sub-second fraction
Read n bits starting from the 38th bit position of the UUID, where n is the number of bits that fits the system time precision;
Divide by 2^n to get a fractional value;
To obtain a floating-point representation of the timestamp you can sum the two parts.
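The generate and extract steps can be sketched on the 90-bit unixtime field alone. This is a minimal illustration, not the prototype from this thread: it ignores where the version and variant bits sit inside the 128-bit UUID, fills the unused sub-second bits with random filler, and all function names are hypothetical.

```python
import secrets
import time

SEC_BITS = 38
FIELD_BITS = 90
FRAC_BITS = FIELD_BITS - SEC_BITS  # 52 bits available after the binary point

def make_unixtime_field(time_ns: int, n: int) -> int:
    """Build the 90-bit field: 38-bit seconds, n-bit sub-second, random filler."""
    seconds, nanos = divmod(time_ns, 1_000_000_000)
    seconds &= (1 << SEC_BITS) - 1
    subsec = (nanos << n) // 1_000_000_000  # floor(fraction * 2**n) in integer math
    filler_bits = FRAC_BITS - n
    filler = secrets.randbits(filler_bits) if filler_bits else 0
    return (seconds << FRAC_BITS) | (subsec << filler_bits) | filler

def extract_seconds(field: int) -> int:
    """Read the 38 bits before the binary point."""
    return field >> FRAC_BITS

def extract_fraction(field: int, n: int) -> float:
    """Read n bits after the binary point and divide by 2**n."""
    subsec = (field >> (FRAC_BITS - n)) & ((1 << n) - 1)
    return subsec / (1 << n)

field = make_unixtime_field(time.time_ns(), n=20)
print(extract_seconds(field) + extract_fraction(field, n=20))
```

The sub-second value is computed with integer arithmetic on nanoseconds so that no precision is lost to floating point before the bits are placed in the field.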
An implementation in python
Yesterday I implemented this simple python generator to test the concept:
OUTPUT:
Python time precision
The first version of the generator above used time.time(). This method appears to return floating point numbers with 22 bits after the binary point. I had to replace it with time.time_ns() to fix the nanosecond precision. If you need nanosecond precision, don't use time.time().
This loop prints the sub-seconds returned by time.time() in binary format:
This is the output of the previous loop:
Note that the last 8 bits are zeros.
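The 22-bit figure is what IEEE 754 doubles predict: a double carries 53 significant bits, and a current Unix timestamp uses 31 of them for the whole seconds. The sketch below checks this for a fixed example value rather than the live clock; `fraction_bits` is a hypothetical helper, and `math.ulp` requires Python 3.9 or later.

```python
import math

def fraction_bits(timestamp: float) -> int:
    """Bits left for the fractional part of a double holding `timestamp`."""
    return 53 - int(timestamp).bit_length()  # doubles carry 53 significant bits

ts = 1633576858.944  # fixed example timestamp, not read from the clock
print(fraction_bits(ts))          # 22
print(math.ulp(ts) == 2 ** -22)   # True: the spacing between adjacent doubles
```

A spacing of 2^-22 seconds is roughly 238 nanoseconds, so time.time() cannot distinguish individual nanoseconds, while time.time_ns() returns an integer and avoids the problem.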