x4m / pg_uuid_next

Implementation of UUID v7 and v8 IETF drafts

A more complex structure of generated version 7 UUIDs is needed #1

Open sergeyprokhorenko opened 1 year ago

sergeyprokhorenko commented 1 year ago

Hi Andrew,

Could you please describe the structure of your generated version 7 UUIDs on the README page, similar to https://habr.com/ru/post/658855/comments/#comment_24491432

x4m commented 1 year ago

Hi! Well, pull requests are welcome :) Unfortunately I'm better at writing code than docs :(

sergeyprokhorenko commented 1 year ago

Andrey, your implementation is oversimplified and therefore not reliable enough. It needs to be improved for database applications. It does not take into account many concerns that are described in the standard draft and were discussed in detail during its development.

I strongly advise you to read these sections:
- 6.1. Timestamp Considerations
- 6.2. Monotonicity and Counters
- 6.4. Distributed UUID Generation
- 6.8. Unguessability
- 6.12. DBMS and Database Considerations

x4m commented 1 year ago

Actually, the code is taken from the standard draft :) If you want better adherence to the standard, consider sending a pull request; I'll be happy to review it. But I'd suggest sending amendments to the IETF working group first, because the code comes from there...

sergeyprokhorenko commented 1 year ago

Fortunately, this code was removed from the new version of the standard draft: https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/02/

I am not a programmer. Actually, I am a systems analyst and I use SQL, not C, so I cannot write a pull request for you.

It would be a pity if a wonderful updated standard got a poor, hasty implementation. It's well known that every complex problem has a simple solution that doesn't work.

x4m commented 1 year ago

OK, I can help with the C part. Can you please sort the needed improvements by priority and help me develop incremental changes towards "real" UUIDv7 and UUIDv8? I cannot allocate big chunks of time, but I can gradually work on this. What is the easiest change that would make the current implementation better?

sergeyprokhorenko commented 1 year ago

> OK, I can help with the C part. Can you please sort the needed improvements by priority and help me develop incremental changes towards "real" UUIDv7 and UUIDv8? I cannot allocate big chunks of time, but I can gradually work on this. What is the easiest change that would make the current implementation better?

I'll try to do it this week

sergeyprokhorenko commented 1 year ago

Required UUIDv7 generator features for PostgreSQL

| No. | UUIDv7 feature for PostgreSQL | Section of draft-ietf-uuidrev-rfc4122bis-02 | Field length and position | Complexity | Importance | Goals |
|---|---|---|---|---|---|---|
| 1 | Millisecond level of precision | 5.7. UUID Version 7 | The leftmost 48 bits of the UUID | low | high | Monotonicity, Space saving |
| 2 | Excluded leap seconds | 5.7. UUID Version 7 | | high | low | Monotonicity |
| 3 | Temporary time shift forward to compensate for an accidental backward reset of the system clock | 6.1. Timestamp Considerations | | high | high | Monotonicity, Fault tolerance |
| 4 | Periodic random time shift | 6.1. Timestamp Considerations | | high | low | Secrecy, Unguessability |
| 5 | Generating UUIDs with the last known timestamp on system clock failure | 6.1. Timestamp Considerations | | low | high | Fault tolerance |
| 6 | Counter | 6.2. Monotonicity and Counters | 15 bits immediately to the right of the timestamp | low | high | Monotonicity, Space saving |
| 7 | Randomly initialized counter portion | 6.2. Monotonicity and Counters | The rightmost 14 bits of the counter | low | high | Uniqueness |
| 8 | Zero-initialized counter portion | 6.2. Monotonicity and Counters | The leftmost 1 bit of the counter | low | high | Guarding against counter rollovers |
| 9 | Timestamp increment | 6.2. Monotonicity and Counters | | high | low | Guarding against counter rollovers |
| 10 | Random node identifier | 6.4. Distributed UUID Generation | The rightmost few bits of the UUID | high | low | Uniqueness |
| 11 | Reseeded CSPRNG | 6.8. Unguessability | | low | high | Uniqueness, Unguessability |
| 12 | Optional segment immediately to the right of the UUID in key DB columns | 6.12. DBMS and Database Considerations | 32 bits immediately to the right of the UUID | low | high | Identification of system, service, table, source, operation, message, shard, partition, etc.; Checksum |
| 13 | New data type UUID160 with an optional 32-bit segment immediately to the right of the UUID | n/a | | high | high | Better developer experience |
| 14 | Keeping the UUID generator settings in named JSON format, separate from program code (non-standard feature) | n/a | | high | low | Better developer experience |
| 15 | Setting the UUID output format (binary, integer, "hex-and-dash" string format, non-standard Crockford's Base32) | 4. UUID Format | | high | low | Better developer experience |
| 16 | Generating UUIDs or UUID fields in advance | 6.3. UUID Generator States | | high | low | Better performance |
| 17 | Automated migration of key fields to the UUID and UUID160 data types, providing backup, order preservation and referential integrity, replacing autoincrement with generation, and preserving the natural key | n/a | | high | high | Better developer experience |

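As a rough illustration of row 1 (the leftmost 48 bits carry a Unix timestamp in milliseconds), here is a minimal C sketch of the basic version 7 layout from section 5.7 of the draft. It uses a big-endian 48-bit timestamp, the version nibble `0111`, and the variant bits `10`, and takes the remaining 74 bits from caller-supplied random bytes. The function name `uuidv7_fill` is hypothetical and is not the pg_uuid_next API. The random bytes are assumed to come from a CSPRNG. The counter, node identifier, and other rows of the table are not covered.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not the extension's real API): lay out a UUIDv7
 * from a Unix timestamp in milliseconds and 10 random bytes.
 *   bytes 0-5 : unix_ts_ms, big-endian (the leftmost 48 bits)
 *   byte  6   : version 7 in the high nibble
 *   byte  8   : variant 0b10 in the two high bits
 *   the remaining 74 bits come from rnd[]
 */
void uuidv7_fill(uint8_t out[16], uint64_t unix_ts_ms, const uint8_t rnd[10])
{
    /* the timestamp must fit into 48 bits */
    assert(unix_ts_ms < (1ULL << 48));

    /* 48-bit timestamp, most significant byte first */
    for (int i = 0; i < 6; i++)
        out[i] = (uint8_t) (unix_ts_ms >> (40 - 8 * i));

    /* bytes 6..15 start out random */
    memcpy(out + 6, rnd, 10);

    /* overwrite the version nibble and the variant bits */
    out[6] = (uint8_t) ((out[6] & 0x0F) | 0x70);
    out[8] = (uint8_t) ((out[8] & 0x3F) | 0x80);
}
```

For example, `uuidv7_fill(u, 0x0123456789ABULL, rnd)` puts `01 23 45 67 89 AB` in the first six bytes. As a result, UUIDs generated in a later millisecond compare greater bytewise than those generated in an earlier one.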
x4m commented 1 year ago

Does all of this apply to v7 only? We are not looking at v8 right now, are we?

I think in v7 we already have property (1). So maybe let's start with (11), "Reseeded CSPRNG". How often should we reseed the PRNG? I think it is already reseeded on fork()s.

sergeyprokhorenko commented 1 year ago

> Does all of this apply to v7 only? We are not looking at v8 right now, are we?

That's correct. UUIDv8 is not intended for the mass market.

> I think in v7 we already have property (1).

I don't understand C well, so I didn't see this property in your program.

> So maybe let's start with (11), "Reseeded CSPRNG". How often should we reseed the PRNG?

It is up to you.

> I think it is already reseeded on fork()s.

Maybe. It's worth checking.
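A common way to make sure a PRNG is reseeded after `fork()` is to record the process ID at seeding time and reseed whenever `getpid()` returns a different value. A count of outputs can add a periodic reseed on top of that. The sketch below only decides *when* to reseed. The actual reseeding, for example from the operating system's entropy source, is left to the caller. `reseed_state`, `reseed_due`, and the `RESEED_INTERVAL` value are hypothetical and do not come from the draft or the extension. PID comparison is a heuristic and is no substitute for checking what the underlying random library already does on fork.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative value only; the draft leaves the reseed frequency open. */
#define RESEED_INTERVAL 1000000u

typedef struct
{
    pid_t    seeded_pid;    /* pid that performed the last reseed (0 = never) */
    uint64_t outputs;       /* values produced since the last reseed */
} reseed_state;

/*
 * Hypothetical policy: returns true if the caller must reseed its CSPRNG
 * before producing the next value. It returns true when we are in a
 * different process than the one that seeded (e.g. a child after fork())
 * or when RESEED_INTERVAL outputs have been produced since the last reseed.
 */
bool reseed_due(reseed_state *st)
{
    assert(st != NULL);

    pid_t pid = getpid();

    if (st->seeded_pid != pid || st->outputs >= RESEED_INTERVAL)
    {
        st->seeded_pid = pid;   /* the caller is expected to reseed now */
        st->outputs = 1;        /* the value about to be produced counts */
        return true;
    }
    st->outputs++;
    return false;
}
```

A generator would call `reseed_due(&st)` before drawing random bytes for each UUID and reseed whenever it returns true.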

sergeyprokhorenko commented 1 year ago

> OK, I can help with the C part. ... I cannot allocate big chunks of time, but I can gradually work on this.

Andrey, I strongly advise you to cooperate with the PostgreSQL community, for example Oleg Bartunov (obartunov@postgrespro.ru), Ivan Panchenko (i.panchenko@postgrespro.ru), and Ivan Frolkov (i.frolkov@postgrespro.ru).

pgsql-hackers: https://www.postgresql.org/message-id/PH0PR11MB5029DF5E0A0EAF8E3CC2C652BBA29@PH0PR11MB5029.namprd11.prod.outlook.com

x4m commented 1 year ago

Are you aware that you are posting a link to my thread in pgsql-hackers? (and yes, surely I'll cooperate with PgPro if they express interest in the topic)

sergeyprokhorenko commented 1 year ago

Ivan Frolkov

Yes, I am

x4m commented 1 year ago

Sure, I'll chat with Ivan next Monday at pgconf.ru.

So, let's work on this implementation. Where do we start to do it better?

sergeyprokhorenko commented 1 year ago

I know about the conference (https://pgconf.ru/2023/345765). We should start by cooperating with Postgres Professional: discussing the required features and planning the work.

sergeyprokhorenko commented 1 year ago

I talked to Ivan Panchenko about ULIDs with counters 4 years ago, but it was too early.

Oleg Bartunov was aware of this discussion

sergeyprokhorenko commented 1 year ago

I added the 17th feature to the table.

sergeyprokhorenko commented 1 year ago

Andrey, did you manage to discuss the joint development of the UUIDv7 generator with Ivan Frolkov?

x4m commented 1 year ago

Not yet; the conference starts on April 2nd (I was wrong about next Monday, sorry).

sergeyprokhorenko commented 1 year ago

My drawing of the proposed UUIDv7 structure:

[Image: UUID structure]
x4m commented 1 year ago

Interesting, thanks. My implementation lacks those counter bits and the metadata. The counter is easy to implement, but does it have to be little-endian or big-endian, or can it perhaps be in native format? It's straightforward to add this counter to the GitHub implementation; however, I'd propose doing the development in pgsql-hackers. Do you have Telegram, btw?

--
Sent from Yandex Mail mobile

On 29.07.2023, 01:53, "Sergey Prokhorenko" @.***> wrote:
> My drawing of the proposed UUIDv7 structure:


x4m commented 1 year ago

BTW, this 15-bit counter plus 65 bits of entropy (compared to a 16-bit counter plus 64 bits of entropy) is going to be a significant performance drain. We do not generate entropy bit by bit, so we are going to generate 5 unaligned bytes and overwrite 7 bits with the counter. Personally, I don't see any benefit from this counter at all.
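The alignment argument can be made concrete. A 16-bit counter that starts on a byte boundary can be written with two plain stores. A 15-bit counter leaves one bit of its second byte to the random segment, so that byte needs a mask and an OR to preserve the random bit. The sketch assumes a simplified layout in which the counter starts at `byte_off` and the version and variant bits are ignored; it is not the layout from the drawing. The function names are hypothetical. Both functions write the counter in big-endian order using shifts, so the result does not depend on the host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit counter on a byte boundary: two plain stores, no masking. */
void put_counter16(uint8_t *buf, unsigned byte_off, uint16_t ctr)
{
    buf[byte_off]     = (uint8_t) (ctr >> 8);   /* high byte first (big-endian) */
    buf[byte_off + 1] = (uint8_t) ctr;
}

/*
 * 15-bit counter in the top 15 bits of two bytes: the lowest bit of the
 * second byte belongs to the random segment, so it is read and preserved.
 */
void put_counter15(uint8_t *buf, unsigned byte_off, uint16_t ctr)
{
    assert(ctr < (1u << 15));

    buf[byte_off] = (uint8_t) (ctr >> 7);                /* counter bits 14..7 */
    buf[byte_off + 1] = (uint8_t) (((ctr << 1) & 0xFE)   /* counter bits 6..0 */
                                   | (buf[byte_off + 1] & 0x01)); /* keep random bit */
}
```

Whether that extra mask-and-OR costs anything noticeable compared to generating the random bytes themselves is exactly what is being debated here. The sketch only shows where the extra work comes from.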

x4m commented 1 year ago

OK, I've read the standard, specifically this version: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-08#name-uuid-version-7

  1. The counter is entirely optional, so the current implementation adheres to the standard.
  2. Randomness does not spill over into the counter's byte, so having a counter will not affect performance.

sergeyprokhorenko commented 1 year ago

Andrey,

My Telegram account is https://t.me/SergeyProkhorenko (@SergeyProkhorenko)

Modern computers usually cannot generate UUIDv7s fast enough to completely fill a counter longer than 15 bits. But I don't mind a slight increase in the length of the counter.

The counter must be big-endian, as is the timestamp.

The counter has the following benefits:

Initializing the 15-bit counter every millisecond and incrementing it are independent of generating the UUIDv7's 59-bit pseudo-random segment.
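The counter behaviour described in rows 6 to 9 of the feature table can be sketched as follows:

- On each new millisecond, the 15-bit counter is reset with its top bit zero and its low 14 bits random.
- Within the same millisecond, the counter is incremented.
- If the counter would overflow, the timestamp is advanced by one millisecond instead.

If the clock moves backwards, the sketch keeps the last timestamp and continues counting. That is one simple reading of rows 3 and 5, not necessarily what was intended. `v7_state`, `v7_next`, and the caller-supplied `rand14` are hypothetical names. Placing the counter bits into the 128-bit UUID is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define CTR_BITS 15
#define CTR_MAX  ((1u << CTR_BITS) - 1)   /* 32767 */

typedef struct
{
    uint64_t last_ms;   /* timestamp used for the previous UUID */
    uint32_t counter;   /* 15-bit counter used for the previous UUID */
} v7_state;

/*
 * Hypothetical: choose the timestamp and counter for the next UUID.
 *   now_ms : current system clock, Unix milliseconds
 *   rand14 : 14 fresh random bits supplied by the caller (assumed CSPRNG)
 */
void v7_next(v7_state *st, uint64_t now_ms, uint16_t rand14,
             uint64_t *ts_out, uint16_t *ctr_out)
{
    assert(rand14 < (1u << 14));

    if (now_ms > st->last_ms)
    {
        /* new millisecond: top counter bit is 0, low 14 bits are random */
        st->last_ms = now_ms;
        st->counter = rand14;
    }
    else if (st->counter < CTR_MAX)
    {
        /* same millisecond, or clock went backwards: keep last_ms, count up */
        st->counter++;
    }
    else
    {
        /* counter exhausted: move the timestamp forward by 1 ms instead */
        st->last_ms++;
        st->counter = rand14;
    }
    *ts_out = st->last_ms;
    *ctr_out = (uint16_t) st->counter;
}
```

Because the top bit starts at zero, even the worst-case random start of 2^14 - 1 leaves 2^15 - 2^14 = 16,384 increments before the counter overflows within one millisecond.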