Open sergeyprokhorenko opened 1 year ago
Hi! Well, pull requests are welcome :) Unfortunately I'm better at writing code than docs :(
Andrey, your implementation is too simplified and therefore is not reliable enough. It needs to be advanced for database applications. It does not take into account many concerns that are described in the standard draft and were discussed in detail during its development:
I strongly advise you to read the sections: 6.1. Timestamp Considerations 6.2. Monotonicity and Counters 6.4. Distributed UUID Generation 6.8. Unguessability 6.12. DBMS and Database Considerations
Actually the code is taken from standard draft :) If you wish a better standard adherence, consider sending a pull request. I'll be happy to review it. But I'd propose to send amendments to IETF group first, because the code is taken from there...
Luckily this code was thrown out from the new version of standard draft: https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/02/
I am not a programmer. Actualy I am a system analist and I use SQL, not C language. So I cannot write a pull request for you.
It would be a pity if a wonderful updated standard got a poor implementation in a hurry. It's well known that every complex problem has a simple solution that doesn't work.
OK, I can help with C part. Can you plz sort needed improvements by some priority and help me develop incremental changed towards "real" UUIDv7 and UUIDv8? I cannot allocate big chunks of time, but can gradually work on this. What is easiest change that will make current implementation better?
OK, I can help with C part. Can you plz sort needed improvements by some priority and help me develop incremental changed towards "real" UUIDv7 and UUIDv8? I cannot allocate big chunks of time, but can gradually work on this. What is easiest change that will make current implementation better?
I'll try to do it this week
No. | UUIDv7 feature for PostgreSQL | Section of draft-ietf-uuidrev-rfc4122bis-02 | Field length and position | Complexity | Importance | Goals |
---|---|---|---|---|---|---|
1 | Millisecond level of precision | 5.7. UUID Version 7 | The leftmost 48 bits of UUID | low | high | Monotonicity, Space saving |
2 | Excluded leap seconds | 5.7. UUID Version 7 | high | low | Monotonicity | |
3 | Temporal time shift forward to compensate for accidental system clock reset backwards | 6.1. Timestamp Considerations | high | high | Monotonicity, Fault tolerance | |
4 | Periodical random time shift | 6.1. Timestamp Considerations | high | low | Secrecy, Unguessability | |
5 | Generating UUID on system clock failure with last known timestamp | 6.1. Timestamp Considerations | low | high | Fault tolerance | |
6 | Counter | 6.2. Monotonicity and Counters | 15 bits next to the right of the timestamp | low | high | Monotonicity, Space saving |
7 | Randomly initialized counter portion | 6.2. Monotonicity and Counters | The rightmost 14 bits of the counter | low | high | Uniqueness |
8 | Zero initialized counter portion | 6.2. Monotonicity and Counters | The leftmost 1 bit of the counter | low | high | Guarding against counter rollovers |
9 | Timestamp increment | 6.2. Monotonicity and Counters | high | low | Guarding against counter rollovers | |
10 | Random node identifier | 6.4. Distributed UUID Generation | The rightmost few bits of the UUID | high | low | Uniqueness |
11 | Reseeded CSPRNG | 6.8. Unguessability | low | high | Uniqueness, Unguessability | |
12 | Optional segment next to the right of UUID in the key DB columns | 6.12. DBMS and Database Considerations | 32 bits next to the right of the UUID | low | high | Identification of system, service, table, source, operation, message, shard, partition etc., Check sum |
13 | New data type UUID160 with optional 32-bit segment next to the right of the UUID | n/a | high | high | Better developer experience | |
14 | Keeping the UUID generator settings in named JSON format separate from program code (non-standard feature) | n/a | high | low | Better developer experience | |
15 | Setting the UUID output format (binary, integer, "hex-and-dash" string format, non-standard Crockford's Base32) | 4. UUID Format | high | low | Better developer experience | |
16 | Generation of UUIDs or UUID fields in advance | 6.3. UUID Generator States | high | low | Better performance | |
17 | Automated migration of key fields to UUID and UUID160 data types providing backup, order preservation, referential integrity and replacing autoincrement with generation, preserving the natural key | n/a | high | high | Better developer experience |
Is it all applied to v7 only? We are not looking at v8 right now, are we?
I think in v7 we already have property (1). So maybe let's start with (11) "Reseeded CSPRNG". How often should we reseed PRGN? I think it is already reseeded on fork()s.
Is it all applied to v7 only? We are not looking at v8 right now, are we?
It is correct. UUIDv8 is not intended for mass market
I think in v7 we already have property (1).
I don't understand C language well, so I didn't see this property in your program
So maybe let's start with (11) "Reseeded CSPRNG". How often should we reseed PRGN?
It is up to you
I think it is already reseeded on fork()s.
Maybe. It's better to check
OK, I can help with C part. ... I cannot allocate big chunks of time, but can gradually work on this.
Andrey, I strongly advise you to co-operate with the PostgreSQL community, such as Oleg Bartunov obartunov@postgrespro.ru Ivan Panchenko i.panchenko@postgrespro.ru Ivan Frolkov i.frolkov@postgrespro.ru
pgsql-hackers: https://www.postgresql.org/message-id/PH0PR11MB5029DF5E0A0EAF8E3CC2C652BBA29@PH0PR11MB5029.namprd11.prod.outlook.com
Are you aware that you are posting a link to my thread in pgsql-hackers? (and yes, surely I'll cooperate with PgPro if they express interest in the topic)
Ivan Frolkov
Yes, I am
Sure, I'll chat with Ivan next monday on pgconf.ru .
So, let's work on this implementation. Where do we start to do it better?
I know about the conference: https://pgconf.ru/2023/345765 We should start from co-operation with Postgres Professional: discussion of the features required and work planning
I talked to Ivan Panchenko about ULIDs with counters 4 years ego. But it was too early
Oleg Bartunov was aware of this discussion
I added the 17th feature into the table
Andrey, did you manage to discuss the joint development of the UUIDv7 generator with Ivan Frolkov?
Not yet, the conference start on April 2nd (I was wrong about next Monday, sorry)
My drawing of the proposed UUIDv7 structure:
Interesting, thanks.My implementation lacks that counter bits and metadata. Counter is easy to implement, but does is have to be little endian, big endian or, perhaps, can be in native format?It’s straightforward to add this counter to GitHub implementation, however I’d suppose to do development in pgsql-hackers.Do you have a telegram, btw?--Отправлено из мобильной Яндекс Почты29.07.2023, 01:53, "Sergey Prokhorenko" @.***>: My drawing of the proposed UUIDv7 structure:
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>
BTW this 15 bits counter and 65 bits of entropy (compared to 16 bits counter and 64 bits of entropy) is going to be significant performance drainer. We do not generate entropy bit by bit. We are going to generate 5 unaligned bytes and overwrite 7 bits by counter. Personally I don't see any benefits from this counter at all.
OK, I've read the standard. This particular version https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-08#name-uuid-version-7
Andrey,
My Telegram account is https://t.me/SergeyProkhorenko (@SergeyProkhorenko
)
The rate of UUIDv7 generation by modern computers usually does not allow filling the counter completely longer than 15 bits. But I don't mind a little increase in the length of the counter.
The counter must be big-endian, as is the timestamp.
The counter has the following benefits:
The initialization every millisecond and increment of the 15-bit counter are independent of the generation of the 59-bit UUIDv7's pseudo-random segment.
Hi Andrew,
Could you please describe in the README page the structure of your generated version 7 UUIDs, something like https://habr.com/ru/post/658855/comments/#comment_24491432