Discussion: Monotonicity and Counters

kyzer-davis commented 2 years ago

All things Monotonicity and Counters to assist with sort ordering, db index locality, and same Timestamp-tick collision avoidance during batch UUID creation.

Sub-Topics: Counter position, length, rollover handling and seeding.

Extension of the thread: https://github.com/uuid6/uuid6-ietf-draft/pull/58#discussion_r810593208

Required Reading: https://github.com/uuid6/uuid6-ietf-draft/issues/41#issuecomment-907517063 and Research efforts

sergeyprokhorenko commented 2 years ago

Please include this in version 7:

A counter MAY be placed instead of the left side of the random section, immediately after the timestamp. The counter is designed to keep UUIDs monotonic within a timestamp step, by default within a millisecond. The default counter length is 16 bits for the millisecond timestamp. The counter MUST be reset every timestamp step. The counter MUST be frozen just before overflowing, up to the next timestamp step. For each UUID instance, the random section MUST be updated despite the use of the counter.
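A minimal Python sketch of this proposal, assuming the field layout of the later UUIDv7 definition in RFC 9562 (48-bit millisecond timestamp, 4-bit version, 12 bits, 2-bit variant, 62 bits). The class name `CounterV7Generator` and the way the 16-bit counter is split around the variant bits are illustrative assumptions, not text from the draft. The counter resets on each new millisecond, stays frozen at its maximum instead of overflowing, and the remaining 58 bits are re-randomized for every UUID.

```python
import secrets
import time
import uuid

COUNTER_BITS = 16
COUNTER_MAX = (1 << COUNTER_BITS) - 1


class CounterV7Generator:
    """Hypothetical generator: 48-bit ms timestamp, 16-bit counter, 58 random bits."""

    def __init__(self, clock=lambda: time.time_ns() // 1_000_000):
        self._clock = clock          # injectable clock, returns Unix time in ms
        self._last_ms = -1
        self._counter = 0

    def generate(self) -> uuid.UUID:
        now_ms = self._clock()
        if now_ms > self._last_ms:
            # New timestamp step: reset the counter to zero.
            self._last_ms = now_ms
            self._counter = 0
        elif self._counter < COUNTER_MAX:
            # Same step (or clock went backwards): keep the old timestamp, count up.
            self._counter += 1
        # Otherwise the counter stays frozen at its maximum until the next step.

        rand_tail = secrets.randbits(58)            # fresh random bits for every UUID
        value = (self._last_ms & ((1 << 48) - 1)) << 80
        value |= 0x7 << 76                          # version 7
        value |= (self._counter >> 4) << 64         # top 12 counter bits
        value |= 0b10 << 62                         # RFC 4122 / 9562 variant
        value |= (self._counter & 0xF) << 58        # low 4 counter bits
        value |= rand_tail
        return uuid.UUID(int=value)
```

Once the counter is frozen, UUIDs created in the same millisecond differ only in the random tail, so their order within that millisecond is no longer guaranteed; that is the "slight non-monotonicity" trade-off instead of blocking or failing.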

broofa commented 2 years ago

A counter MAY be placed instead of...

Is there any benefit to this over simply treating the entire rand field as a counter, the way ULID does? In which case, the counter is effectively at the very end ("far right") side of the random field.

The advantage of the ULID approach is that counter rollover is so statistically unlikely as to not warrant mention. Whereas placement at the left side of the field requires reasoning about creation rates, counter sizes, and rollover error cases.

sergeyprokhorenko commented 2 years ago

Is there any benefit to this over simply treating the entire rand field as a counter, the way ULID does?

The random part of ULID is almost frozen within every millisecond while it's used as a counter. Therefore it's very easy to guess the next or previous ULID. Just add 1 or subtract 1. It's not secure.
If random part is close to 111...111, then you have too short counter

broofa commented 2 years ago

The random part of ULID is almost frozen within every millisecond while it's used as a counter. Therefore it's very easy to guess the next or previous ULID. Just add 1 or subtract 1. It's not secure.

Isn't this true of any monotonic counter scheme? Unless you randomize all bits with every uuid, there will be some degree of predictability.

If random part is close to 111...111, then you have too short counter

Again, true of any counter scheme.

Treating the whole 70-bit random value as a counter means the odds of actually encountering a rollover case are very small. Even if the lowest 20 bits turn over (1M uuids / msec), the top-most 50 bits would have to all be ones. There's only a ~1 in 10¹⁵ chance of that occurring. For the truly cautious, simply checking for that case and generating a different random value, or just forcing any of those high-order bits to be zero, eliminates that possibility.
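The estimate above can be checked with a short Python sketch. The constants mirror the numbers in the comment (70-bit field, about 2^20 IDs per millisecond); the helper `seed_with_clear_top_bit` is a hypothetical name for the "force a high-order bit to zero" option.

```python
from fractions import Fraction

RAND_BITS = 70    # random field treated as a single counter (from the comment)
BURST_BITS = 20   # 2**20, roughly 1M IDs generated within one millisecond

# A burst of 2**20 increments overflows only if the seed lies in the top
# 2**20 values of the field, i.e. its upper 50 bits are all ones.
top_bits = RAND_BITS - BURST_BITS
p_overflow = Fraction(2 ** BURST_BITS, 2 ** RAND_BITS)

print(f"overflow chance: 1 in {2 ** top_bits:.2e}")


def seed_with_clear_top_bit(rand_value: int, bits: int = RAND_BITS) -> int:
    """Clear the most significant bit so at least 2**(bits - 1) increments fit."""
    return rand_value & ((1 << (bits - 1)) - 1)
```

Clearing a single high-order bit costs one bit of entropy but removes the rollover case for any realistic burst size.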

sergeyprokhorenko commented 2 years ago

The random part of ULID is almost frozen within every millisecond while it's used as a counter. Therefore it's very easy to guess the next or previous ULID. Just add 1 or subtract 1. It's not secure.

Isn't this true of any monotonic counter scheme? Unless you randomize all bits with every uuid, there will be some degree of predictability.

I believe that all random bits of every UUID MUST be randomized despite using a counter. Such a monotonic counter scheme is secure.

If random part is close to 111...111, then you have too short counter

Again, true of any counter scheme.

No, if we have 16 empty bits of separate counter for every millisecond.

Treating the whole 70-bit random value as a counter means the odds of actually encountering a rollover case are very small. Even if the lowest 20 bits turn over (1M uuids / msec), the top-most 50 bits would have to all be ones. There's only a ~1 in 10¹⁵ chance of that occurring. For the truly cautious, simply checking for that case and generating a different random value, or just forcing any of those high-order bits to be zero, eliminates that possibility.

You are correct.

LiosK commented 2 years ago

I agree that it should generally be encouraged to entirely randomize some (say, 16 or 32 bits) trailing bits at every generation to ensure some levels of unpredictability, rather than use the whole rand field as a monotonic random counter. I think the draft RFC should state this explicitly to guide implementers, but the thing is, it is very difficult to determine the appropriate lengths of the monotonic random and per-ID random. So, perhaps it might be best for the draft to just state this point and have implementers decide appropriate sizes.
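One way to picture this split is the Python sketch below. It works on the random field only, and the sizes (a 40-bit counter followed by a 32-bit per-ID random tail) are example values, not sizes fixed by the draft; `SplitRandomField` is a hypothetical name. The leading part is seeded randomly on each new timestamp and then incremented, while the trailing bits are regenerated for every ID.

```python
import secrets


class SplitRandomField:
    """Leading monotonic counter plus trailing per-ID random bits (illustrative sizes)."""

    def __init__(self, counter_bits: int = 40, tail_bits: int = 32):
        self.counter_bits = counter_bits
        self.tail_bits = tail_bits
        self._last_ts = None
        self._counter = 0

    def next_field(self, ts_ms: int) -> int:
        if self._last_ts is None or ts_ms > self._last_ts:
            # New tick: reseed the counter, leaving the top bit clear for headroom.
            self._last_ts = ts_ms
            self._counter = secrets.randbits(self.counter_bits - 1)
        else:
            self._counter += 1
            if self._counter >> self.counter_bits:
                raise OverflowError("counter exhausted for this timestamp tick")
        tail = secrets.randbits(self.tail_bits)   # unpredictable part, new every time
        return (self._counter << self.tail_bits) | tail
```

Because the counter sits above the tail, values stay strictly increasing within a tick even though the low 32 bits cannot be guessed from the previous ID.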

peterbourgon commented 2 years ago

oklog/ulid.Monotonic provides monotonicity parameterized by a user-provided inc value. Higher inc values provide less predictable output at the cost of fewer overall possible outputs (i.e. higher probability of collision).
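The trade-off can be sketched in Python as follows. This is not the oklog/ulid code or API; `next_monotonic` and `max_inc` are hypothetical names, and the 80-bit default matches the size of ULID's random component. Each new value is the previous one plus a random step between 1 and `max_inc`.

```python
import secrets


def next_monotonic(prev: int, max_inc: int, bits: int = 80) -> int:
    """Return prev plus a random step in [1, max_inc]; raise if the field overflows."""
    if max_inc < 1:
        raise ValueError("max_inc must be at least 1")
    step = 1 + secrets.randbelow(max_inc)   # uniform over 1..max_inc
    nxt = prev + step
    if nxt >> bits:
        raise OverflowError("random field exhausted within this timestamp")
    return nxt
```

With `max_inc = 1` this degenerates to a plain +1 counter; larger values make the next ID harder to guess but consume the available value space faster.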

IMO — specs should not take any stance on any property of random portions of IDs. Specs should define them as random, and if implementations want to provide monotonicity within some bounds, that's above-and-beyond spec guarantees.

sergeyprokhorenko commented 2 years ago

@peterbourgon If something is not in the specification, then this is a 99.99% guarantee that this will not be in the implementation either. Therefore, the specification should contain at least a more or less satisfactory solution as an option.

sergeyprokhorenko commented 2 years ago

I agree that it should generally be encouraged to entirely randomize some (say, 16 or 32 bits) trailing bits at every generation to ensure some levels of unpredictability, rather than use the whole rand field as a monotonic random counter. I think the draft RFC should state this explicitly to guide implementers, but the thing is, it is very difficult to determine the appropriate lengths of the monotonic random and per-ID random. So, perhaps it might be best for the draft to just state this point and have implementers decide appropriate sizes.

@LiosK I like your solution very much. I would add freezing just before overflowing the counter, although this is unlikely.

peterbourgon commented 2 years ago

If something is not in the specification, then this is a 99.99% guarantee that this will not be in the implementation either.

I don't think I agree, but I also don't think this is a bad thing. In fact, I think it's correct! The spec defines the random portion of the ID to be random. If those bytes happen to be monotonic, that's a detail of the implementation.

kyzer-davis commented 2 years ago

@peterbourgon @sergeyprokhorenko I just merged Draft 03 into master. Could you take a look at the latest Monotonicity and Counters section? I believe it already hits on both of your points.

sergeyprokhorenko commented 2 years ago

@kyzer-davis

Waiting for the timestamp to advance can slow down the generation of UUIDs and thus break the application. The UUID generation should not be a source of possible run-time errors for the application. It is better to allow a slight non-monotonicity, which the DBMS can handle quite well. It will only slow down the search for some records a little. In any case, with several data sources, a slight non-monotonicity is inevitable.
I suggest adding "the left part of" to the text: "With this method the left part of random data is extended to also double as a counter", to avoid guessability of UUIDs. The right part of the random data will then not be frozen until the next timestamp tick.

peterbourgon commented 2 years ago

UUID generation is necessarily a possible source of runtime errors, because entropy is exhaustible.

sergeyprokhorenko commented 2 years ago

UUID generation is necessarily a possible source of runtime errors, because entropy is exhaustible.

@peterbourgon This is not true. It is entirely possible to prevent errors during the generation of UUIDs. It is better to prevent errors if possible.

If you mean collisions of UUIDs, then such errors do not occur during generation, but when writing to the table.

peterbourgon commented 2 years ago

If you run out of entropy, as far as I'm aware your options are (a) fail, or (b) block until entropy is replenished.

kyzer-davis commented 2 years ago

@sergeyprokhorenko,

I suggest to add the left part of to the text: "With this method the left part of random data is extended to also double as a counter" to avoid of guessability of UUIDs. The right part of random data will not be frozen till the next timestamp tick.

I assume you are referencing rand_a, which totally can be a counter based on my current text in the paragraph "Fixed-Length Dedicated Counter Bits (Method 1):"

I simply say you SHOULD use a monotonic random, and for fixed-length counters you SHOULD use UUIDv8, because the heated debates around counter length, rollover handling, and initialization of the counter show so many variations based on user input that it is a better fit for UUIDv8.

If you would like, I can put some text in the "Fixed-Length Dedicated Counter Bits (Method 1):" bullet stating that UUIDv7 rand_a MAY double as a counter when utilizing this method.

Waiting for the timestamp to advance can slow down the generation of UUIDs and thus break the application. The UUID generation should not be a source of possible run-time errors for the application. It is better to allow a slight non-monotonicity, which the DBMS can handle quite well. It will only slow down the search for some records a little. In any case, with several data sources, a slight non-monotonicity is inevitable.

The text lets the application implementers decide and provides some guidance. Both usages in this section are SHOULD, not MUST. Implementations are free to do what they want. If they desire to let some items go out of order then they may.

sergeyprokhorenko commented 2 years ago

If you run out of entropy, as far as I'm aware your options are (a) fail, or (b) block until entropy is replenished.

@peterbourgon It sounds like "run out of numbers" :))

sergeyprokhorenko commented 2 years ago

I assume you are referencing rand_a, which totally can be a counter based on my current text in the paragraph "Fixed-Length Dedicated Counter Bits (Method 1):"

@kyzer-davis No, I mean https://github.com/uuid6/uuid6-ietf-draft/issues/60#issuecomment-1053554344 I like it very much

sergeyprokhorenko commented 2 years ago

If you would like I can put some text in the "Fixed-Length Dedicated Counter Bits (Method 1):" bullet that stated UUIDv7 rand_a MAY double as a counter when utilizing this method.

@kyzer-davis Yes please. That would be great!

peterbourgon commented 2 years ago

@sergeyprokhorenko Running out of entropy is much more common than running out of numbers :)

System administrators, especially those supervising Internet servers, have to ensure that the server processes will not halt because of entropy depletion.

https://en.m.wikipedia.org/wiki/Entropy_(computing)

sergeyprokhorenko commented 2 years ago

The text let's the application implementers decide and provides some guidance. Both usages in this section are SHOULD not MUST. Implementations are fit to do what they want. If they desire to let some items go out of order then they may.

@kyzer-davis It seems to me that the specification should suggest to developers a safer option, and not an unexpected pause in the generation of UUIDs when the counter overflows. Overflows always remind me of the Ariane flight V88 disaster

sergeyprokhorenko commented 2 years ago

@sergeyprokhorenko Running out of entropy is much more common than running out of numbers :)

System administrators, especially those supervising Internet servers, have to ensure that the server processes will not halt because of entropy depletion.

https://en.m.wikipedia.org/wiki/Entropy_(computing)

This is about the low quality of pseudo-random numbers, but not about their deficit. Poor quality generators generate a repeating sequence of pseudo-random numbers, which leads to collisions. The problem of low quality of pseudo-random numbers is already solved in the spec: https://uuid6.github.io/uuid6-ietf-draft/#name-unguessability

kyzer-davis commented 2 years ago

@sergeyprokhorenko

If you would like I can put some text in the "Fixed-Length Dedicated Counter Bits (Method 1):" bullet that stated UUIDv7 rand_a MAY double as a counter when utilizing this method.

@kyzer-davis Yes please. That would be great!

Let me put together a Change Proposal template for this and get it in Draft 03 Phase 3 PR.

I assume you are referencing rand_a, which totally can be a counter based on my current text in the paragraph "Fixed-Length Dedicated Counter Bits (Method 1):"

@kyzer-davis No, I mean #60 (comment) I like it very much

I apologize, somehow I missed this comment entirely. It does seem like a valid approach but seems pretty similar to Fixed-Length Dedicated Counter Bits (Method 1) at the moment. The difference is in semantics. My text states to dedicate a portion of bits after the timestamp; this text states to dedicate a portion of the most significant, left-most, part of the random. 9/10 times they will likely be the same thing.

I could update Fixed-Length Dedicated Counter Seeding: to describe this practice. I took a quick swing below:

Fixed-Length Dedicated Counter Seeding:

Implementations utilizing either fixed-length counter method MAY randomly initialize the counter with each new timestamp tick. However, when the timestamp has not incremented, the counter SHOULD keep its seeded value (not be regenerated) and be incremented by one.
When utilizing a randomly seeded counter alongside Method 1, the random MAY be regenerated with each counter increment without impacting sortability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments.
A randomly seeded counter alongside Method 2 is more-or-less the same as Monotonic Random (Method 3).
Implementations utilizing either fixed-length counter method MAY also choose to initialize the counter at zero to ensure the full bit-space is utilized and help avoid counter rollovers. This approach has less entropy and more guessability but makes the most of the counter bit space.
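The seeding options in this proposed text can be compared with a small Python sketch. The function names `seed_counter` and `headroom` and the mode strings are hypothetical; the `"random_guard"` mode (random seed with the top bit cleared) is not part of the proposed text but reflects the "force a high-order bit to zero" idea raised elsewhere in this thread.

```python
import secrets


def seed_counter(bits: int, mode: str = "random_guard") -> int:
    """Return the initial counter value for a new timestamp tick."""
    if mode == "zero":
        return 0                              # full counter space, no entropy
    if mode == "random":
        return secrets.randbits(bits)         # full entropy, may leave little room
    if mode == "random_guard":
        return secrets.randbits(bits - 1)     # top bit clear: half the space is guaranteed
    raise ValueError(f"unknown mode: {mode}")


def headroom(seed: int, bits: int) -> int:
    """Number of increments available before the counter overflows."""
    return (1 << bits) - 1 - seed
```

For a 16-bit counter, zero seeding always leaves 65,535 increments, a fully random seed can leave anywhere from 0 to 65,535, and the guarded seed always leaves at least 32,768.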

sergeyprokhorenko commented 2 years ago

@sergeyprokhorenko ...did you even check #76? With this update both Method 1 and Method 3 are recommended counter solutions for v7E.

I deleted my comment

kyzer-davis commented 2 years ago

I deleted my comment

As did I :)

Let me know what you think about that proposed "Fixed-Length Dedicated Counter Seeding:" text. If it looks good I can write up a change proposal template and possibly sneak it in the open PR.

peterbourgon commented 2 years ago

@sergeyprokhorenko

This is about the low quality of pseudo-random numbers, but not about their deficit.

I believe you're mistaken. Both are issues, but I'm pointing out the fact that entropy — crypto-grade and pseudorandom both — is a resource which can be depleted. If something consumes that resource, then it necessarily must deal with the possibility that there is no entropy available in a given period of time.

sergeyprokhorenko commented 2 years ago

@peterbourgon You probably mean the insufficient speed of generating pseudo-random numbers and UUIDs. I think that multi-threaded generation and buffering of pseudo-random numbers could solve this problem.

peterbourgon commented 2 years ago

No, I don't mean that either :) I mean literally that entropy is a stream of bytes which can pause or stop altogether for unbounded periods of time. Please read the linked article.

LiosK commented 2 years ago

I find the current content of the Monotonicity and Counters section a little bit confusing. It's basically very permissive, so personally I feel okay with this because almost everything is permitted, but some implementers might get lost.

I think the decisions left up to implementers by this section could be summarized to the following two points, if the spec clearly required that the counter field be placed right after the timestamp and the remaining bits be filled with random:

Zero-starting counter or randomly seeded counter
Length of counter bits from 0 bits to 72 bits

Under this framework, the three methods can be described as:

Method 1: 12-24-bit zero-starting or randomly seeded counter
Method 2: 72-bit randomly seeded or partially (i.e. significant bits frozen for a given timestamp tick) randomly seeded counter
Method 3: 72-bit randomly seeded counter
Single-node without batch: 0-bit counter

By https://github.com/uuid6/uuid6-ietf-draft/issues/60#issuecomment-1053554344, I meant the spec should clearly state the counter should generally be limited to 40 or 56 bits. Although unguessability is definitely not the priority, I just don't want to see implementers carelessly leave a potential attack surface.

Then, how about omitting or amalgamating the Method 1, 2, 3 description and reorganizing the related sections like the following?

4.3. UUID Version 7E

Current content + rand_a and rand_b together MUST be dedicated to 0-72-bit counter and remaining random, in this order.

5.2. Monotonicity and Counters

AS IS

Fixed-Length Dedicated Counter Length

Current content + UUIDv7 SHOULD utilize 0-40-bit counter.

Fixed-Length Dedicated Counter Seeding:

Current content + UUIDv7 SHOULD utilize randomly seeded counter; UUIDv8 may utilize zero-starting counter.

Fixed-Length Dedicated Counter Rollover Handling

AS IS

Implementations MAY use the following logic to ensure UUIDs featuring embedded counters are monotonic in nature:

AS IS

sergeyprokhorenko commented 2 years ago

Current content + UUIDv7 SHOULD utilize randomly seeded counter; UUIDv8 may utilize zero-starting counter.

@LiosK I am against discrimination in relation to (16 bit) zero-starting counter.

sergeyprokhorenko commented 2 years ago

Current content + rand_a and rand_b together MUST be dedicated to 0-72-bit counter and remaining random, in this order.

@LiosK I propose this text:

Current content + rand_a and rand_b together (72 bit) MUST be dedicated to counter, random and metadata, in this order.

If a zero-starting counter is used then the segments MUST have the following length:

16-24-bit zero-starting counter
40-56-bit random
0-16-bit metadata

If a randomly seeded counter is used then the segments MUST have the following length:

24-40-bit randomly seeded counter
32-bit random
0-16-bit metadata
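A Python sketch of how such a 72-bit field could be packed in the proposed order (counter, random, metadata). The function names are hypothetical and the example sizes (16 + 40 + 16 bits) are just one combination allowed by the zero-starting variant above.

```python
import secrets

FIELD_BITS = 72  # rand_a and rand_b together, as stated in the proposal


def pack_field(counter: int, counter_bits: int, metadata: int, metadata_bits: int) -> int:
    """Pack counter | random | metadata into one 72-bit integer, most significant first."""
    random_bits = FIELD_BITS - counter_bits - metadata_bits
    if random_bits <= 0:
        raise ValueError("no room left for the random segment")
    if counter >> counter_bits or metadata >> metadata_bits:
        raise ValueError("counter or metadata does not fit its segment")
    rnd = secrets.randbits(random_bits)       # regenerated for every ID
    return (counter << (random_bits + metadata_bits)) | (rnd << metadata_bits) | metadata


def unpack_field(value: int, counter_bits: int, metadata_bits: int) -> tuple:
    """Split a packed field back into (counter, random, metadata)."""
    random_bits = FIELD_BITS - counter_bits - metadata_bits
    metadata = value & ((1 << metadata_bits) - 1)
    rnd = (value >> metadata_bits) & ((1 << random_bits) - 1)
    counter = value >> (random_bits + metadata_bits)
    return counter, rnd, metadata
```

Placing the counter first keeps IDs from one generator ordered within a timestamp tick, while the metadata in the lowest bits has no influence on sort order unless everything above it is equal.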

LiosK commented 2 years ago

I am against discrimination in relation to (16 bit) zero-starting counter.

This is a SHOULD scope so zero-starting counters are permitted. Zero-starting counters are so dangerous that they should not be recommended explicitly.

metadata

metadata stuff is application-specific and should generally be used in v8 only IMO. I don't have a strong objection (except new var value) against the UUIDv7 design currently compiled in the draft RFC, and thus the above proposal does not intend to alter the current UUIDv7 design.

sergeyprokhorenko commented 2 years ago

Zero-starting counters are so dangerous

@LiosK There is nothing dangerous in zero-starting counters

metadata stuff is application-specific and should generally be used in v8 only IMO

There is no logic here. IMO is not a rationale. Populating the metadata is simply specified by the function parameter, and so the metadata does not need to be detailed in the UUIDv7 specification. We want to get rid of composite keys, don't we? If metadata is prohibited in the UUID, then it will have to be dragged through additional fields, increasing chaos in the database.

LiosK commented 2 years ago

There is nothing dangerous in zero-starting counters

A zero-starting counter sacrifices entropy unless a constant number of IDs are generated per millisecond and an accurate estimate of the number is available. That doesn't happen in an ordinary use case.

metadata stuff is application-specific

is the rationale. Application-specific metadata damages universal uniqueness across applications and thus should be used with v8, which is application-specific by design. This is the same reason why machine IDs and shared knowledge schemes are not utilized by v7.

We want to get rid of composite keys, don't we?

No. It's much better to use a separate metadata field and composite key than to parse the UUID to extract the metadata. And, if you don't extract the metadata from the UUID, then you don't need to encode it in the UUID either.

sergeyprokhorenko commented 2 years ago

There is nothing dangerous in zero-starting counters

A zero-starting counter sacrifices entropy unless a constant number of IDs are generated per millisecond and an accurate estimate of the number is available. That doesn't happen in an ordinary use case.

sacrifices entropy? You forgot about 40-56-bit random

metadata stuff is application-specific

is the rationale. Application-specific metadata damage to the universal uniqueness across applications and thus should be used with v8, which is application-specific by design. This is the same reason as why machine IDs and shared knowledge schemes are not utilized by v7.

damage to the universal uniqueness? Uniqueness is not going anywhere. You forgot about random again

We want to get rid of composite keys, don't we?

No. It's much better to use a separate metadata field and composite key than to parse UUID to extract the metadata. And, if you don't extract the metadata from UUID, then you don't either need to encode them in UUID.

Let's leave it up to implementers to decide how many fields are more convenient for them, and which architecture provides better information system performance. Parsing is necessary in rare cases, while the use of a composite key always slows down the information system.

sergeyprokhorenko commented 2 years ago

@LiosK You'd better protest sacrificing 8 bits of randomness to a completely useless var_ver. In addition, var_ver breaks rand into two segments, which complicates the UUID generation algorithm. It would be better to make var_ver optional, filling it with random content by default.

LiosK commented 2 years ago

You'd better protest sacrificing 8 bits of randomness to a completely useless var_ver.

I did it for the one bit wasted by the new var. My allegiance to the universal uniqueness of v7 is so consistent that I am against all of the new var, zero-starting counter, and metadata field.

Let's leave it up to implementers to decide how many fields are more convenient for them, and which architecture provides better information system performance.

v8 is there for such users.

sergeyprokhorenko commented 2 years ago

Let's leave it up to implementers to decide how many fields are more convenient for them, and which architecture provides better information system performance.

v8 is there for such users.

Such users are the majority of users, including all advanced users who use Data Vault Modeling (most banks) or Anchor Modeling (leading banks and marketplaces). All these users suffer from the use of composite keys.

v8 is not recommended: "UUID version 8E provides an RFC-compatible format for experimental or vendor-specific use cases... UUIDv8E SHOULD only be utilized if an implementation cannot utilize another UUID in this document or [RFC4122]". This is a stigma: an untouchable caste, an unwanted child. The authors outlined the most primitive and unrealistic scenario for v8. Therefore v8 will not be implemented by DBMS developers. v8 will never be used, and it's safe to drop this concept from the spec. Your advice to use v8 is just a polite form of refusal.

kyzer-davis commented 2 years ago

@LiosK,

but some implementers might get lost

I agree, I have iterated on that section a ton. It is close but I agree that it needs a bit more shape.

Personally I am leaning towards dropping method 2 altogether since it is basically method 3. That should help reduce some text and avoid some confusion.
As for the section, I mulled it over the weekend and have some ideas on how to reorganize. I mainly need to separate the methods from the method sub-topics. That will go a long way to promote readability.

@sergeyprokhorenko, do you think that line is that bad? I can yank it as the section conveys the same info with or without it.

peterbourgon commented 2 years ago

High-level question, please redirect me if this isn't the right forum —

Is there any way to distinguish a UUID with monotonic entropy (or whatever) which has been generated correctly, from one which has been generated incorrectly? I don't think so, right? Assuming not, why are these kinds of details part of the spec at all? What purpose is served by defining them here? Is it just an implementation guide?

bradleypeabody commented 2 years ago

Is there any way to distinguish a UUID with monotonic entropy (or whatever) which has been generated correctly, from one which has been generated incorrectly? I don't think so, right?

Correct. Outside the implementation used to generate the UUID value, there is no way to tell for any single UUID. (Just the resulting monotonic property of the sort order of a set of UUIDs generated with the same timestamp value.)

Is it just an implementation guide?

Unless someone wants to argue otherwise, yes, the entire point of this section in the spec is as an implementation guideline for those implementations that wish to implement monotonicity. It is not intended to be a requirement for all implementations. Two important reasons for this are a) not all implementations require monotonicity, and b) there are an increasing number of applications which cannot feasibly provide it anyway because UUIDs are being generated on different systems (e.g. a cluster of web servers each generating records separately, where the cost of providing true monotonicity across the cluster is not worth the effort).

I strongly believe that implementors of UUID generators should be free to implement monotonicity in essentially any way they see fit, since I don't see any one algorithm that will suit a majority of use cases (I can happily list half a dozen cases with radically different requirements if this is in question), and I don't see any material difference in the resulting values except to meet application specific needs - i.e. "UUIDs generated for my application absolutely must be monotonic when generated for xyz purpose" - fine, go ahead and do that, by all means. This does not mean everyone else must do the same. So the intention of the text is (IMO) just "for some cases monotonicity matters, if this is you (or you're making a library and want to include this option), here's a sensible recommendation on how it can be done, good luck".

LiosK commented 2 years ago

here's a sensible recommendation on how it can be done, good luck

Love this.

Monotonicity must be optional as it requires some sort of coordination that is just an unnecessary complexity for distributed applications. That said, many use cases still need monotonicity, and counters involve several caveats. That's why I think a sensible recommendation should be provided in the spec. I just don't want to see readers carelessly invent a vulnerable implementation such as:

Inappropriate use of zero-starting counter that simply wastes entropy for no reason:

062273a6-93df-7000-0000-31912d9c4cd9
062273a6-a608-7000-0000-3c9592dd85f9
062273a6-bf3c-7000-0000-77c3d673fbe0
062273a6-dd45-7000-0000-a750c1a3e1ec

Tail counter that produces perfectly predictable results:

062273da-cfc3-75dd-a8ef-2ca2f3d45f71
062273da-cfc3-75dd-a8ef-2ca2f3d45f72
062273da-cfc3-75dd-a8ef-2ca2f3d45f73
062273da-cfc3-75dd-a8ef-2ca2f3d45f74

kyzer-davis commented 2 years ago

Is it just an implementation guide?

@bradleypeabody's point hits the nail on the head!

Adding a quote I saw somewhere during my IETF onboarding research for how to define an internet standard draft:

"Internet standards tended to be those written for implementers. International standards were written as documents to be obeyed"

sergeyprokhorenko commented 2 years ago

Inappropriate use of zero-starting counter that simply wastes entropy for no reason:
062273a6-93df-7000-0000-31912d9c4cd9
062273a6-a608-7000-0000-3c9592dd85f9
062273a6-bf3c-7000-0000-77c3d673fbe0
062273a6-dd45-7000-0000-a750c1a3e1ec

@LiosK,

No need to distort the truth! 16-bit zero-starting counter occupies 2 symbols only; it does not repeat 8 symbols like 7000-0000!

Collision resistance is determined not only by the random segment, but also by the rest of the UUID segments. For identifiers to match, they must be generated in the same millisecond, have the same random segments, the same counter values (this is impossible if they are generated by the same generator) and metadata values, and fall into the same database table. It is wrong to reduce collision resistance to the random segment.

By the way, the 24-40-bit random value frozen for a millisecond in your favorite randomly seeded counter simply wastes entropy for no reason.

LiosK commented 2 years ago

16-bit zero-starting counter occupies 2 symbols only

16-bit zero-starting counter occupies 4 symbols and that is a big enough problem. Anyway, the above example is perfectly valid to illustrate the underlying concern.

Collision resistance is determined not only by the random segment

It doesn't mean the random segment can be wasted for no reason. In particular, as of the pre-draft 03, entropy is the key factor to guarantee the uniqueness of v7 and thus has to be handled carefully.

For the sake of readers' understanding, it is best for the UUID specs to ensure universal uniqueness to a reasonable extent, and the current draft does a very good job by moving application-specific items that conflict with uniqueness (e.g. metadata) to v8 and saying explicitly "UUIDv8E's uniqueness will be implementation-specific and SHOULD NOT be assumed." I would support this approach.

randomly seeded counter simply wastes entropy for no reason.

Please take a look at the previous discussion and the article @fabiolimace found out. Randomly seeded counters do not sacrifice the universal uniqueness. Rather, they improve collision resistance in some scenarios.

sergeyprokhorenko commented 2 years ago

16-bit zero-starting counter occupies 4 symbols and that is a big enough problem. Anyway, the above example is perfectly valid to illustrate the underlying concern.

(4/128)*100%=3% is not a big problem, and these 3% are not wasted - they improve collision resistance in monolithic databases with only one generator per table (like autoincrement) and also in high-load applications (with thousands of records per millisecond). Therefore a zero-starting counter does not sacrifice universal uniqueness. The same can be said about 16 bits of metadata (metadata also improves collision resistance).

Zero-starting counter is better for monolithic databases, and randomly seeded counter is better for distributed applications

peterbourgon commented 2 years ago

the entire point of this section in the spec is as an implementation guideline for those implementations that wish to implement monotonicity. It is not intended to be a requirement for all implementations.

Got it, thanks for the explanation. It doesn't seem like there is consensus on this topic even among the participants in this discussion. Given that, I wonder if the spec should "recommend" anything at all.

LiosK commented 2 years ago

(4/128)*100%=3% is not a big problem

16 / 128 = 12.5% is obviously a big problem. Please double check your numbers.

Zero-starting counter is better for monolithic databases, and randomly seeded counter is better for distributed applications

A 40-bit randomly seeded counter accommodates ~550 billion IDs per millisecond on average. Although there is a 2^16 / 2^40 = 0.000006% chance that the 40-bit randomly seeded counter fails to provide room for 65,536 IDs, an application can just wait up to one millisecond for the next tick. If the 0.000006% is not acceptable, an application can initialize the 40-bit counter with a 39-bit random number. This approach claims one bit, but it guarantees the space for ~550 billion IDs. Or you can use a 17-bit counter and 16-bit random numbers if you like 65,536 or care about unpredictability, though v4 should be used if unpredictability is the priority, as clearly stated in the Security Considerations section of the current pre-draft.
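The figures in this paragraph can be reproduced with a few lines of Python. The variable names are illustrative; the inputs (40-bit counter, bursts of 65,536 IDs, a 16-bit counter within a 128-bit UUID) are taken from the discussion.

```python
COUNTER_BITS = 40
BURST = 2 ** 16                      # 65,536 IDs within one millisecond

space = 2 ** COUNTER_BITS
avg_room = space // 2                # expected increments left after a uniform random seed
p_short = BURST / space              # chance the seed leaves fewer than 65,536 increments
guard_min_room = space - 2 ** (COUNTER_BITS - 1)   # worst case with a 39-bit seed

print(f"average room:                   {avg_room:,}")
print(f"P(room < 65,536):               {p_short:.6%}")
print(f"guaranteed room, 39-bit seed:   {guard_min_room:,}")
print(f"16-bit counter share of a UUID: {16 / 128:.1%}")
```

The average and the guaranteed room both come out at 2^39, about 550 billion, which is why seeding with 39 random bits costs a single bit of entropy yet removes the rollover case for bursts of this size.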

Zero-starting counters never win and thus should not be recommended explicitly, to prevent readers from falling into this common pitfall.

@sergeyprokhorenko, please think of general readers of the RFC, not only of your specific applications. I'd appreciate your views based on your work on the high-load applications that generate no more or less than 65,536 records per millisecond on a single node (i.e. 16-bit zero-starting counters make sense only under this specific scenario; by the way, such an application claims 1 GB/second or 82.4 TB/day storage or one dedicated 1000BASE-T just to store/transfer 128-bit IDs), but few readers work on such a huge constant application or few general-purpose library authors design their work to support such a use case.
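The storage and bandwidth figures can be checked with the following Python sketch, assuming a sustained rate of 65,536 IDs per millisecond and 16 bytes per ID (variable names are illustrative).

```python
IDS_PER_MS = 2 ** 16
BYTES_PER_ID = 16                    # a UUID is 128 bits

bytes_per_second = IDS_PER_MS * BYTES_PER_ID * 1000
bytes_per_day = bytes_per_second * 86_400
bits_per_second = bytes_per_second * 8

print(f"{bytes_per_second / 1e9:.2f} GB/s")
print(f"{bytes_per_day / 2 ** 40:.1f} TiB/day ({bytes_per_day / 1e12:.1f} TB/day)")
print(f"{bits_per_second / 1e9:.2f} Gbit/s")
```

The 82.4 figure matches when a terabyte is counted as 2^40 bytes. Note that the corresponding line rate is roughly 8.4 Gbit/s, which is more than a single 1 Gbit/s 1000BASE-T link carries, so the point about scale holds even more strongly.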

@peterbourgon, some sensible recommendations will definitely be helpful for readers, and it's also valuable to keep discussing over and over to refine the recommendations, isn't it?

kyzer-davis commented 2 years ago

@LiosK,

I updated counters section once more in #85 to drop method 2 and re-organize the text for readability purposes. Let me know what you think and let's discuss here.

sergeyprokhorenko commented 2 years ago

@LiosK,

please think of general readers of the RFC, not only of your specific applications.

I think that general readers of the RFC are highly qualified experts in DBMS and distributed systems, not developers of primitive websites.

I think about important applications, such as payment services, financial services, trading systems, marketplaces, online advertising, information retrieval, booking, logistics services etc. These are not my specific applications.

...such an application claims 1 GB/second...

I care about the peak performance of ordinary information systems, about the speed of searching for complexly structured data, and not about the supercomputers that you imagine.
