ietf-wg-uuidrev / rfc4122bis

revision to RFC4122
Other
57 stars 11 forks source link

AD Review: BCP 14 - SHOULD #114

Closed kyzer-davis closed 1 year ago

kyzer-davis commented 1 year ago

Murray Feedback:

Several of the "SHOULD"s in Section 6.1 seem questionable to me. For instance, the first one leaves the possibility of an implementation that doesn't use an increasing clock. Why isn't this a MUST? The various "Care SHOULD be taken ..." ones are also curious; why would we allow for this to be a choice? Generally below that, the "SHOULD" instances left me wondering why we leave all of these as implementation choices. Then, in "every UUID generation SHOULD be a random integer of any desired length larger than zero", this allows a length of zero; is that okay? And lastly, I don't know what "Implementations SHOULD weigh the consequences" or "Care SHOULD be taken" mean from an interoperability perspective.


RFC2119

Question to answer for each:


6.1. Timestamp Considerations

  1. Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing.
    • Could upgrade to a MUST, for v1, v6, v7 the starting timestamp is the backbone of the UUID so stating that we always pull from a reliable source makes sense.
  2. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system are handled in a way that is consistent with implementation requirements.
    • I think a SHOULD is fine here as later sections discuss how you can handle that, but if you really don’t care about catching something like NTP changing a clock you can ignore those backwards timestamp moves.
  3. For other levels of precision UUIDv8 SHOULD be utilized.
    • We probably need to put a forward reference to the UUIDv7 (method 3 counter) to drive home the alternative.
  4. Care SHOULD be given to ensure that the proper length is selected for a given timestamp.
    • Can probably be removed entirely if I am being honest. The paragraph section does enough to discuss this point it does not need normative language.
  5. If a system overruns the generator by requesting too many UUIDs within a single system time interval, the UUID service SHOULD either return an error, or stall the UUID generator until the system clock catches up, and MUST NOT return knowingly duplicate values.
    • Can upgrade to a MUST.

6.2. Monotonicity and Counters

  1. Care SHOULD be taken to ensure UUIDs generated in batches are also monotonic.
    • Just use non-normative language something like “Implementations might require UUIDs generated in batches adhere to monotonicity requirements.”
  2. That is, if one thousand UUIDs are generated for the same timestamp, there SHOULD be sufficient logic for organizing the creation order of those one thousand UUIDs.
    • Can upgrade to a MUST.
  3. Batch UUID creation implementations MAY utilize a monotonic counter that SHOULD increment for each UUID created during a given timestamp.
    • Can upgrade to a MUST.
  4. Implementations SHOULD employ the following methods for single-node UUID implementations that require batch UUID creation, or are otherwise concerned about monotonicity with high frequency UUID generation.
    • Can upgrade to a MUST.
  5. A fixed bit-length counter, if present, SHOULD be positioned immediately after the embedded timestamp.
    • Can upgrade to a MUST.
  6. With this method, the rand_a section of UUIDv7 SHOULD be used as fixed-length dedicated counter bits that are incremented by one for every UUID generation.
    • I think a SHOULD is fine, if somebody wants to increase by some random number that is fine.
  7. UUIDv7's rand_b section SHOULD be utilized with this method to handle batch UUID generation during a single timestamp tick.
    • Author’s weighing this method vs the others. Alternatives exist and are defined (method 1 and 2). SHOULD seems fine.
  8. The increment value for every UUID generation SHOULD be a random integer of any desired length larger than zero.
    • Similar to the previous but opposite, if an implementation wants to increment by +1 that is fine too. Downsides outlines in next few sentences with a MAY.
  9. Implementations utilizing the fixed-length counter method SHOULD randomly initialize the counter with each new timestamp tick.
    • Starting at zeros is fine. Re-using the previous counter for a new timestamp tick in theory would also be okay.
  10. However, when the timestamp has not incremented, the counter SHOULD be frozen and incremented via the desired increment logic.
    • Can Upgrade to a MUST.
  11. General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits.
    • Fine if it is shorter, just providing guidance.
  12. Care SHOULD also be given to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter.
    • Can remove normative verbiage and change to something like “These counter length of 12 to 42 are selected to ensure the that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter.”
  13. Counter rollovers SHOULD be handled by the application to avoid sorting issues.
    • Can upgrade to a MUST.
  14. The general guidance is that applications that care about absolute monotonicity and sortability SHOULD freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken.
    • SHOULD seem fine. Alternatives exist defined in next sentence.
  15. Implementations SHOULD check if the currently generated UUID is greater than the previously generated UUID
    • Do it if you want, just a recommendation. Downsides discussed.
  16. Applications SHOULD embed sufficient logic to catch these scenarios and correct the problem to ensure that the next UUID generated is greater than the previous, or at least report an appropriate error
    • Do it if you want, just a recommendation. Downsides discussed.

6.3. UUID Generator States

  1. If an implementation does not have any stable store available, then it SHOULD proceed with UUID generation as if this was the first UUID created within a batch.
    • Can downgrade to a MAY?

6.4. Distributed UUID Generation

  1. It should be noted that although this section details two methods for the sake of completeness; implementations SHOULD utilize the pseudo-random Node ID option if additional collision resistance for distributed UUID generation is a requirement.
    • Authors recommending one method vs the others. Alternatives detailed.

6.5. Name-Based UUID Generation

  1. While Appendix A details a few interesting namespaces; implementations SHOULD provide the ability to input a custom namespace.
    • They can already but most don’t because of the old RFC. Just some text trying to nudge implementors of libraries to open it up to more than just what is in the appendix.

6.6. Collision Resistance

  1. Implementations SHOULD weigh the consequences of UUID collisions within their application and when deciding between UUID versions that use entropy (randomness) versus the other components such as those in Section 6.1 and Section 6.2.
    • Can remove normative verbiage and change it to a lowercase “should”

6.8. Unguessability

  1. Implementations SHOULD utilize a cryptographically secure pseudo-random number generator (CSPRNG) to provide values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique").
    • Some client might not have CSPRNG, regular PRNG can be an alternative. MUST is too harsh.
  2. Care SHOULD be taken to ensure the CSPRNG state is properly reseeded upon state changes, such as process forks, to ensure proper CSPRNG operation.
    • Can upgrade to a MUST and change to “The CSPRNG state MUST be properly reseeded upon state changes, such as process forks, to ensure proper CSPRNG operation.”

6.9. UUIDs That Do Not Identify the Host

  1. Implementations SHOULD obtain a 47-bit cryptographic-quality random number as per Section 6.8 and use it as the low 47 bits of the node ID.
    • Alternative is detailed in the last paragraph of the section so a SHOULD seems fine here.

6.10. Sorting

  1. UUIDv6 and UUIDv7 are designed so that implementations that require sorting (e.g., database indexes) SHOULD sort as opaque raw bytes, without need for parsing or introspection
    • As we have found, some folks like their introspection. I think this was a MUST but it was laxed to a SHOULD at one point.
  2. UUIDs formats created by this specification SHOULD be lexicographically sortable while in the textual representation.
    • More of a statement than a requirement. Can change to “UUIDs formats created by this specification were created to be lexicographically sortable while in the textual representation.”
  3. UUIDs created by this specification are crafted with big-endian byte order (network byte order) in mind. If little-endian style is required a custom UUID format SHOULD be created using UUIDv8.
    • Can upgrade to a MUST

6.11. Opacity

  1. UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID.
    • Similar to a previous bullet, folks like to examine the bits although usually not required for interoperability.

6.12. DBMS and Database Considerations

  1. Thus, where feasible, UUIDs SHOULD be stored within database applications as the underlying 128 bit binary value.
    • Authors nudging towards the better method, alternative in next paragraph.
bradleypeabody commented 1 year ago

@kyzer-davis I'll take a crack at this. A lot of these should be simple. In some cases non-normative language can be used as you suggest, and this includes some situations where just a general concept is being described and the word "should" is used but then has been capitalized - I don't think we need that. Even the RFC describing "SHOULD" https://www.rfc-editor.org/rfc/rfc2119 also uses "should" in a non-normative way in its introduction, I think we can take a cue from that. In other scenarios the "SHOULD" can just be removed altogether with some simple rewording.

Here's my proposal on it:


6.1. Timestamp Considerations

  1. Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing.

  2. Implementations MUST use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing.

    • Agreed, upgrade to MUST.
  3. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system are handled in a way that is consistent with implementation requirements.

  4. Take care to ensure that timestamp changes from the environment or operating system are handled in a way that is consistent with implementation requirements.

    • Suggest using non-normative language, recommendation is not specific enough, just general advice.
  5. For other levels of precision UUIDv8 SHOULD be utilized.

  6. For other levels of precision UUIDv8 is available as an option.

    • Suggest using non-normative language, the spec gives all available UUIDs, the reader can already choose for themselves, I don't see a need to be more specific.
  7. Care SHOULD be given to ensure that the proper length is selected for a given timestamp.

  8. Take care to ensure that the proper length is selected for a given timestamp.

    • Agree with earlier suggestion, non-normative language seems good.
  9. If a system overruns the generator by requesting too many UUIDs within a single system time interval, the UUID service SHOULD either return an error, or stall the UUID generator until the system clock catches up, and MUST NOT return knowingly duplicate values.

  10. If a system overruns the generator by requesting too many UUIDs within a single system time interval, the UUID service MUST NOT return knowingly duplicate values, and instead can either return an error, or stall the UUID generator until the system clock catches up.

    • I think the MUST NOT provides the most important imperative here. I don't like the idea of saying the stall or error is what they must do because what if there's some other solution we haven't thought of? The point is that it shouldn't return duplicates, so I think we just say that.

6.2. Monotonicity and Counters

  1. Care SHOULD be taken to ensure UUIDs generated in batches are also monotonic.

  2. Take care to ensure UUIDs generated in batches are also monotonic.

    • Agree with suggestion, go non-normative
  3. That is, if one thousand UUIDs are generated for the same timestamp, there SHOULD be sufficient logic for organizing the creation order of those one thousand UUIDs.

  4. That is, if one thousand UUIDs are generated for the same timestamp, there should be sufficient logic for organizing the creation order of those one thousand UUIDs.

    • This is not a specific requirement, it's simply giving an example of correct behavior, I think "should" is okay here. Or if someone disagrees let's just reword so it's non-normative and not use "should" or "SHOULD".
  5. Batch UUID creation implementations MAY utilize a monotonic counter that SHOULD increment for each UUID created during a given timestamp.

  6. Batch UUID creation implementations MAY utilize a monotonic counter that increments for each UUID created during a given timestamp.

    • Reword to non-normative
  1. Implementations SHOULD employ the following methods for single-node UUID implementations that require batch UUID creation, or are otherwise concerned about monotonicity with high frequency UUID generation.
    • I think this "SHOULD" has to stay, because we specifically do not want to say that these three methods are the only ways to solve the problem. I think it's important to leave it open ended because if someone wants to do something else, and it produces correct output, and it doesn't impact interoperability, just on principle they should be allowed to do that. However I think if we say SHOULD here, then other more detailed points inside can say MUST, because the entire thing is a recommendation.
  1. A fixed bit-length counter, if present, SHOULD be positioned immediately after the embedded timestamp.

  2. A fixed bit-length counter, if present, MUST be positioned immediately after the embedded timestamp.

    • Agreed per above, becomes MUST
  3. With this method, the rand_a section of UUIDv7 SHOULD be used as fixed-length dedicated counter bits that are incremented by one for every UUID generation.

  4. With this method, the rand_a section of UUIDv7 MUST be used as fixed-length dedicated counter bits that are incremented for every UUID generation.

    • Could also leave as SHOULD but this works too - we say it gets incremented but not by how much.
  1. UUIDv7's rand_b section SHOULD be utilized with this method to handle batch UUID generation during a single timestamp tick.
    • I agree with earlier comment, suggest leave it.
  1. The increment value for every UUID generation SHOULD be a random integer of any desired length larger than zero.

  2. The increment value for every UUID generation is a random integer of any desired length larger than zero.

    • What if we just say "is", since we're describing a method of doing something, I think that works.
  3. Implementations utilizing the fixed-length counter method SHOULD randomly initialize the counter with each new timestamp tick.

  4. Implementations utilizing the fixed-length counter method randomly initialize the counter with each new timestamp tick.

    • Same deal here, we're explaining how to do something, we've already set up the context for it, the normative language doesn't help us here, can just come out.
  5. However, when the timestamp has not incremented, the counter SHOULD be frozen and incremented via the desired increment logic.

  6. However, when the timestamp has not incremented, the counter is frozen and incremented via the desired increment logic.

    • Ditto
  1. General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits.
    • Agreed that this is okay.
  1. Care SHOULD also be given to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter.

  2. Take care to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter.

    • how about that
  3. Counter rollovers SHOULD be handled by the application to avoid sorting issues.

  4. Counter rollovers MUST be handled by the application to avoid sorting issues.

    • Agreed, upgrade to a MUST.
  1. The general guidance is that applications that care about absolute monotonicity and sortability SHOULD freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken.

    • Agree with earlier statement: SHOULD seem fine. Alternatives exist defined in next sentence.
  2. Implementations SHOULD check if the currently generated UUID is greater than the previously generated UUID

    • Agree with earlier statement: Do it if you want, just a recommendation. Downsides discussed.
  3. Applications SHOULD embed sufficient logic to catch these scenarios and correct the problem to ensure that the next UUID generated is greater than the previous, or at least report an appropriate error

    • Agree with earlier statement: Do it if you want, just a recommendation. Downsides discussed.

And then everything else from 6.3. UUID Generator States and down in the earlier post I agree with the recommendation given and I think unless someone else expresses concerns, just go for it on those changes.

LiosK commented 1 year ago

Generally, I agree with @bradleypeabody; rewording to non-normative verbiage should work very well in many contexts.

Some comments:

6.1. Timestamp Considerations

  1. Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing.

I think we should keep SHOULD here. MUST conflicts with the following "Altering, Fuzzing, or Smearing" section. Plus, I got a feature request to my prototype to add a feature to create a UUIDv7 with an arbitrary timestamp to migrate old IDs to UUIDv7 based on the created_at database field. I think that's a legitimate use case.

  1. If a system overruns the generator by requesting too many UUIDs within a single system time interval, the UUID service SHOULD either return an error, or stall the UUID generator until the system clock catches up, and MUST NOT return knowingly duplicate values.

I would rather suggest to downgrade MUST NOT to SHOULD NOT here. With MUST, every compliant service on a machine MUST refuse to return a value when the system clock stops or goes backwards, which might cause the entire break down of applications running on that server, even if the absolute monotonicity is not a requirement of such applications. A generator should have an option to reset the state and continue when the system clock seems to be broken.

6.2. Monotonicity and Counters

  1. That is, if one thousand UUIDs are generated for the same timestamp, there SHOULD be sufficient logic for organizing the creation order of those one thousand UUIDs.

SHOULD sounds fine here. MUST sounds like prohibiting the generation of one thousand UUIDs without a monotonic counter. For some applications, it should absolutely be fine to get 1000 unordered UUIDs from a simple, much-easier-to-implement counter-less UUIDv7 generator.

  1. Batch UUID creation implementations MAY utilize a monotonic counter that SHOULD increment for each UUID created during a given timestamp.

MUST might be fine, but also we can simply omit SHOULD here.

  1. With this method, the rand_a section of UUIDv7 SHOULD be used as fixed-length dedicated counter bits that are incremented by one for every UUID generation.

Could be MUST if we would prohibit a counter shorter than 12 bits. Alternatively, we can reword this sentence not to mention rand_a. The RFC already provides guidance of 12 to 42 bits for counters. That should be sufficient.

6.8. Unguessability

  1. Care SHOULD be taken to ensure the CSPRNG state is properly reseeded upon state changes, such as process forks, to ensure proper CSPRNG operation.

MUST is ideal, but note that in some environments it is very difficult or often impossible to reseed CSPRNG upon fork. We should perhaps keep SHOULD.

bradleypeabody commented 1 year ago

@kyzer-davis Here's a PR for this https://github.com/ietf-wg-uuidrev/rfc4122bis/pull/119. @LiosK I think I've addressed your concerns as well.

kyzer-davis commented 1 year ago

Will review tomorrow. Re-submission to IETF does not re-open until 7-22 anyway because of IETF 117.