A more complex structure of generated version 7 UUIDs is needed

sergeyprokhorenko commented 1 year ago

Hi Andrew,

Could you please describe in the README page the structure of your generated version 7 UUIDs, something like https://habr.com/ru/post/658855/comments/#comment_24491432

x4m commented 1 year ago

Hi! Well, pull requests are welcome :) Unfortunately I'm better at writing code than docs :(

sergeyprokhorenko commented 1 year ago

Andrey, your implementation is too simplified and therefore not reliable enough. It needs to be improved for database applications. It does not take into account many concerns that are described in the draft standard and were discussed in detail during its development:

millisecond level of precision,
excluded leap seconds,
handling of the system clock move backward,
counter for monotonicity,
randomly initialized counter portion,
two counter rollover guard methods,
optional node identifiers,
reseeded CSPRNG,
impermissibility of fatal errors,
UUID as left part of the key field.

I strongly advise you to read these sections: 6.1. Timestamp Considerations; 6.2. Monotonicity and Counters; 6.4. Distributed UUID Generation; 6.8. Unguessability; 6.12. DBMS and Database Considerations.

x4m commented 1 year ago

Actually, the code is taken from the draft standard :) If you want better adherence to the standard, consider sending a pull request. I'll be happy to review it. But I'd propose sending amendments to the IETF group first, because the code is taken from there...

sergeyprokhorenko commented 1 year ago

Luckily, this code was removed from the new version of the draft standard: https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/02/

I am not a programmer. Actually, I am a systems analyst and I use SQL, not C. So I cannot write a pull request for you.

It would be a pity if a wonderful updated standard got a poor implementation in a hurry. It's well known that every complex problem has a simple solution that doesn't work.

x4m commented 1 year ago

OK, I can help with the C part. Can you plz sort the needed improvements by some priority and help me develop incremental changes towards "real" UUIDv7 and UUIDv8? I cannot allocate big chunks of time, but can gradually work on this. What is the easiest change that will make the current implementation better?

sergeyprokhorenko commented 1 year ago

> OK, I can help with the C part. Can you plz sort the needed improvements by some priority and help me develop incremental changes towards "real" UUIDv7 and UUIDv8? I cannot allocate big chunks of time, but can gradually work on this. What is the easiest change that will make the current implementation better?

I'll try to do it this week

sergeyprokhorenko commented 1 year ago

Required UUIDv7 generator features for PostgreSQL

No.	UUIDv7 feature for PostgreSQL	Section of draft-ietf-uuidrev-rfc4122bis-02	Field length and position	Complexity	Importance	Goals
1	Millisecond level of precision	5.7. UUID Version 7	The leftmost 48 bits of UUID	low	high	Monotonicity, Space saving
2	Excluded leap seconds	5.7. UUID Version 7		high	low	Monotonicity
3	Temporal time shift forward to compensate for accidental system clock reset backwards	6.1. Timestamp Considerations		high	high	Monotonicity, Fault tolerance
4	Periodical random time shift	6.1. Timestamp Considerations		high	low	Secrecy, Unguessability
5	Generating UUID on system clock failure with last known timestamp	6.1. Timestamp Considerations		low	high	Fault tolerance
6	Counter	6.2. Monotonicity and Counters	15 bits next to the right of the timestamp	low	high	Monotonicity, Space saving
7	Randomly initialized counter portion	6.2. Monotonicity and Counters	The rightmost 14 bits of the counter	low	high	Uniqueness
8	Zero initialized counter portion	6.2. Monotonicity and Counters	The leftmost 1 bit of the counter	low	high	Guarding against counter rollovers
9	Timestamp increment	6.2. Monotonicity and Counters		high	low	Guarding against counter rollovers
10	Random node identifier	6.4. Distributed UUID Generation	The rightmost few bits of the UUID	high	low	Uniqueness
11	Reseeded CSPRNG	6.8. Unguessability		low	high	Uniqueness, Unguessability
12	Optional segment next to the right of UUID in the key DB columns	6.12. DBMS and Database Considerations	32 bits next to the right of the UUID	low	high	Identification of system, service, table, source, operation, message, shard, partition etc., Check sum
13	New data type UUID160 with optional 32-bit segment next to the right of the UUID	n/a		high	high	Better developer experience
14	Keeping the UUID generator settings in named JSON format separate from program code (non-standard feature)	n/a		high	low	Better developer experience
15	Setting the UUID output format (binary, integer, "hex-and-dash" string format, non-standard Crockford's Base32)	4. UUID Format		high	low	Better developer experience
16	Generation of UUIDs or UUID fields in advance	6.3. UUID Generator States		high	low	Better performance
17	Automated migration of key fields to UUID and UUID160 data types providing backup, order preservation, referential integrity and replacing autoincrement with generation, preserving the natural key	n/a		high	high	Better developer experience
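Features 3, 5, 6, 7, 8 and 9 from the table interact closely, so a small sketch may help make them concrete. The Python below is only an illustration under stated assumptions, not the extension's code and not text from the draft: the class name `MonotonicClock`, the method `tick()` and the injectable `now_ms` hook are all hypothetical, and the 15-bit counter width simply follows the table.

```python
import secrets
import time


class MonotonicClock:
    """Hands out strictly increasing (timestamp_ms, counter) pairs."""

    COUNTER_BITS = 15  # counter width taken from feature 6 in the table

    def __init__(self, now_ms=None):
        # now_ms is a hypothetical hook so the clock can be faked in tests.
        self._now_ms = now_ms or (lambda: time.time_ns() // 1_000_000)
        self._last_ms = -1
        self._counter = 0

    def _fresh_counter(self):
        # Leftmost bit zero (feature 8), remaining 14 bits random (feature 7).
        return secrets.randbits(self.COUNTER_BITS - 1)

    def tick(self):
        ms = self._now_ms()
        if ms > self._last_ms:
            # The clock advanced: start a new millisecond.
            self._last_ms = ms
            self._counter = self._fresh_counter()
        else:
            # Same millisecond, or the clock stepped backwards (features 3, 5):
            # keep the last known timestamp and increment the counter (feature 6).
            self._counter += 1
            if self._counter >> self.COUNTER_BITS:
                # Counter rollover: move the timestamp forward instead (feature 9).
                self._last_ms += 1
                self._counter = self._fresh_counter()
        return self._last_ms, self._counter
```

With a fake clock that returns 1000, 1000, 999 and 1001, the timestamps handed out are 1000, 1000, 1000 and 1001, and the pairs stay strictly increasing even though the clock briefly went backwards.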

x4m commented 1 year ago

Is it all applied to v7 only? We are not looking at v8 right now, are we?

I think in v7 we already have property (1). So maybe let's start with (11) "Reseeded CSPRNG". How often should we reseed the PRNG? I think it is already reseeded on fork()s.

sergeyprokhorenko commented 1 year ago

> Is it all applied to v7 only? We are not looking at v8 right now, are we?

That is correct. UUIDv8 is not intended for the mass market.

> I think in v7 we already have property (1).

I don't understand C well, so I couldn't find this property in your code.

> So maybe let's start with (11) "Reseeded CSPRNG". How often should we reseed the PRNG?

It is up to you.

> I think it is already reseeded on fork()s.

Maybe. It's better to check.

sergeyprokhorenko commented 1 year ago

> OK, I can help with C part. ... I cannot allocate big chunks of time, but can gradually work on this.

Andrey, I strongly advise you to co-operate with the PostgreSQL community, for example with Oleg Bartunov (obartunov@postgrespro.ru), Ivan Panchenko (i.panchenko@postgrespro.ru) and Ivan Frolkov (i.frolkov@postgrespro.ru).

pgsql-hackers: https://www.postgresql.org/message-id/PH0PR11MB5029DF5E0A0EAF8E3CC2C652BBA29@PH0PR11MB5029.namprd11.prod.outlook.com

x4m commented 1 year ago

Are you aware that you are posting a link to my thread in pgsql-hackers? (and yes, surely I'll cooperate with PgPro if they express interest in the topic)

sergeyprokhorenko commented 1 year ago

Ivan Frolkov

Yes, I am

x4m commented 1 year ago

Sure, I'll chat with Ivan next Monday at pgconf.ru.

So, let's work on this implementation. Where do we start to do it better?

sergeyprokhorenko commented 1 year ago

I know about the conference: https://pgconf.ru/2023/345765. We should start by co-operating with Postgres Professional: discussing the required features and planning the work.

sergeyprokhorenko commented 1 year ago

I talked to Ivan Panchenko about ULIDs with counters 4 years ago, but it was too early.

Oleg Bartunov was aware of this discussion

sergeyprokhorenko commented 1 year ago

I added the 17th feature into the table

sergeyprokhorenko commented 1 year ago

Andrey, did you manage to discuss the joint development of the UUIDv7 generator with Ivan Frolkov?

x4m commented 1 year ago

Not yet, the conference starts on April 2nd (I was wrong about next Monday, sorry).

sergeyprokhorenko commented 1 year ago

My drawing of the proposed UUIDv7 structure:

x4m commented 1 year ago

Interesting, thanks. My implementation lacks those counter bits and metadata. The counter is easy to implement, but does it have to be little-endian or big-endian, or can it be in native format? It's straightforward to add this counter to the GitHub implementation, but I'd propose doing the development in pgsql-hackers. Do you have a Telegram account, btw?

x4m commented 1 year ago

BTW, this 15-bit counter with 65 bits of entropy (compared to a 16-bit counter with 64 bits of entropy) is going to be a significant performance drain. We do not generate entropy bit by bit. We are going to generate 5 unaligned bytes and overwrite 7 bits with the counter. Personally, I don't see any benefit from this counter at all.

x4m commented 1 year ago

OK, I've read the standard, specifically this version: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-08#name-uuid-version-7

The counter is totally optional, so the current implementation adheres to the standard.
The randomness does not spill into the counter's bytes, so having a counter will not affect performance.

sergeyprokhorenko commented 1 year ago

Andrey,

My Telegram account is https://t.me/SergeyProkhorenko (@SergeyProkhorenko)

The rate at which modern computers generate UUIDv7s is usually too low to exhaust a counter longer than 15 bits within a millisecond. But I don't mind a small increase in the length of the counter.

The counter must be big-endian, as is the timestamp.
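Why big-endian matters can be shown with a few integers: when keys are compared byte by byte, only the big-endian encoding sorts in numeric order. The snippet below is a small illustration of that general point; the helper name `keeps_order` and the sample values are made up for the example.

```python
def keeps_order(values, byteorder):
    """True if byte-wise sorting of the encoded values matches numeric order."""
    encoded = [v.to_bytes(4, byteorder) for v in values]
    return sorted(encoded) == encoded


values = [1, 255, 256, 70000]  # already in ascending numeric order

big_ok = keeps_order(values, "big")        # most significant byte first
little_ok = keeps_order(values, "little")  # least significant byte first
```

Here `big_ok` is true and `little_ok` is false, since in little-endian form 256 (`00 01 00 00`) sorts before 1 (`01 00 00 00`).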

The counter has the following benefits:

Faster searches for records created in bulk at sub-millisecond intervals when the system clock is not accurate enough
More compact than a sub-millisecond timestamp segment, which leaves room for a longer pseudo-random segment and therefore better collision resistance
Better collision resistance than a sub-millisecond timestamp segment, because the counter is initialized with a pseudo-random number

Initializing the 15-bit counter every millisecond and incrementing it are independent of generating the 59-bit pseudo-random segment of the UUIDv7.
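The field sizes mentioned in this thread (a 48-bit timestamp, a 15-bit counter and a 59-bit pseudo-random segment) add up to 128 bits once the 4 version bits and 2 variant bits are included. The sketch below packs them in Python. The function name `uuid7_with_counter` is hypothetical, and the exact placement of the counter is an assumption, because the drawing is not reproduced here: the upper 12 counter bits are put right after the version field and the lower 3 bits right after the variant field.

```python
import secrets
import time
import uuid


def uuid7_with_counter(unix_ms, counter):
    """Pack a 48-bit timestamp, a 15-bit counter and 59 random bits."""
    if not 0 <= unix_ms < 1 << 48:
        raise ValueError("timestamp must fit in 48 bits")
    if not 0 <= counter < 1 << 15:
        raise ValueError("counter must fit in 15 bits")
    value = unix_ms << 80             # bits 127..80: big-endian timestamp
    value |= 0x7 << 76                # bits 79..76: version 7
    value |= (counter >> 3) << 64     # bits 75..64: upper 12 counter bits (assumed placement)
    value |= 0b10 << 62               # bits 63..62: variant
    value |= (counter & 0b111) << 59  # bits 61..59: lower 3 counter bits (assumed placement)
    value |= secrets.randbits(59)     # bits 58..0: pseudo-random segment
    return uuid.UUID(int=value)


# Usage: counter initialized with 14 random bits, leftmost bit zero.
example = uuid7_with_counter(time.time_ns() // 1_000_000, secrets.randbits(14))
```

Because the timestamp and counter sit to the left of the random bits and the version and variant bits are constant, two UUIDs from the same millisecond compare in counter order regardless of their random segments.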
