uuid6 / uuid6-ietf-draft

Next Generation UUID Formats
https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/
187 stars 11 forks source link

Typo in Approximate UUID timestamp calculations #131

Closed sergeyprokhorenko closed 10 months ago

sergeyprokhorenko commented 10 months ago

@kyzer-davis, Is it possible to change "such as dividing a microsecond by 1024 (or some other value) instead of 1000 to obtain a millisecond value" for "such as dividing a number of microseconds by 1024 (or some other value) instead of 1000 to obtain a millisecond value"?

kyzer-davis commented 10 months ago

Would it be better to put: "such as dividing a microsecond (or a number of microseconds) by 1024..." just to cover both.

sergeyprokhorenko commented 10 months ago

Would it be better to put: "such as dividing a microsecond (or a number of microseconds) by 1024..." just to cover both.

I don't think so. After all, if you divide a microsecond by 1000, you get a nanosecond.

kyzer-davis commented 10 months ago

[...] if you divide a microsecond by 1000, you get a nanosecond.

I thought you multiply a Microsecond to get a Nanosecond? Else, the last part of the sentence is also wrong...

But that aside, my point in adding both was you could theoretically convert 1µs or 2µs+ by 1000 or 1024 to get some form of MS/NS. By adding both to the verbiage we cover one or more µs to MS conversion (normally or by odd intervals).

x4m commented 10 months ago

Is it correct to use a number of nanoseconds in UUID v7? (Obviously, that would require +20 bits in timestamp section) If this is valid, I think, wording "such as dividing a microsecond (or a number of microseconds) by 1024..." is most precise IMV.

sergeyprokhorenko commented 10 months ago

Is it correct to use a number of nanoseconds in UUID v7?

It is not correct (see 6.2. Monotonicity and Counters):

Replace Left-Most Random Bits with Increased Clock Precision (Method 3): For UUIDv7, which has millisecond timestamp precision, it is possible to use additional clock precision available on the system to substitute for up to 12 random bits immediately following the timestamp

The accuracy of the timestamp has been limited on purpose after much discussion: https://github.com/uuid6/uuid6-ietf-draft/issues/23#issuecomment-899033429

6.1. Timestamp Considerations:

With UUID version 1 and 6, 100-nanoseconds of precision are present while UUIDv7 features millisecond level of precision by default within the Unix epoch that does not exceed the granularity capable in most modern systems

Also, it's pointless to make the timestamp more accurate (nanoseconds) than the time source (microseconds)

bradleypeabody commented 10 months ago

My 2c here is that we should either leave the text how it is (just because we're very close to getting this thing finally published), or if we change anything my suggestion would be to go back to not having that text about dividing by 1024 at all. The spec already says there are no guarantees about the accuracy of the timestamp, and it's up to the implementation to determine how far off from the original value they want to make it. Dividing by 1024 gives a decidedly "incorrect" value (off by months if not years depending on the timestamp), but since there's no guarantee, it's still technically allowed. But I do not think we should encourage it.

Anyway, I just don't think trying to carefully fiddle with that text to take other edge cases into account is worth holding up the document for.

bradleypeabody commented 10 months ago

@sergeyprokhorenko Just so we understand, what was the intention of the original question - what problem were you trying to solve with this:

Is it possible to change "such as dividing a microsecond by 1024 (or some other value) instead of 1000 to obtain a millisecond value" for "such as dividing a number of microseconds by 1024 (or some other value) instead of 1000 to obtain a millisecond value"?

sergeyprokhorenko commented 10 months ago

@bradleypeabody

You haven't given any convincing arguments. We still have time to correct the error in the text.

The time source gives us the number of microseconds. To get the number of milliseconds for UUIDv7, they need to be divided by 1000. But the approximate division by 1024 takes much less computational resources. There is no point in exact division by 1000. This is a problem that absolutely all implementations face. Therefore, we cannot ignore it, as you suggest.

Why it was proposed to divide some one microsecond by 1024, I cannot explain, because this is an erroneous phrase.

bradleypeabody commented 10 months ago

@sergeyprokhorenko Thanks and understood. Until that last message I didn't get what error in the text was that you're referring to (you suggested a change but didn't clearly say why). But okay, I think I get it now: Your concern is that dividing microseconds by 1024 doesn't yield milliseconds, it yields . Is that correct?

If so, I guess we could change the sequence of that text to make it read more correctly like so. As suggested text to give the whole thing:

Implementations MAY alter the actual timestamp. Some examples include security considerations around providing a real clock value within a UUID, to correct inaccurate clocks, to handle leap seconds, or instead of dividing a number of microseconds by 1000 to obtain a millisecond value, diving by 1024 (or some other value) for performance reasons. This specification makes no requirement or guarantee about how close the clock value needs to be to the actual time.

If we're going to leave in this 1024 text, then that would be my suggestion (assuming I understand the concern correctly.)

That said, I still don't understand how suggesting the user divide by 1024 is helpful at all. The rest of the specification clearly says milliseconds (and changing text above here doesn't change that), so if you're dividing by 1024 you're ignoring the rest of the spec. And, that's okay, it's allowed as mentioned earlier because there are no guarantees on the accuracy of the timestamp. But I still don't see any reason to suggest to implementors to do this 1024 division since it's clearly not what anything else in the document says to do. So I'm not going to die over it, but it really seems like it would be better to just take this 1024 text out entirely. You can still do whatever you want, not having this text doesn't prevent that, but it would prevent the kind of confusion generated as we can see in this discussion.

Everyone who reads this and does the math is going to go "uhm, if we divide by 1024 then it's not milliseconds, why is it telling me to do that?"

sergeyprokhorenko commented 10 months ago

@bradleypeabody

Your new wording is perfect:

Implementations MAY alter the actual timestamp. Some examples include security considerations around providing a real clock value within a UUID, to correct inaccurate clocks, to handle leap seconds, or instead of dividing a number of microseconds by 1000 to obtain a millisecond value, dividing by 1024 (or some other value) for performance reasons. This specification makes no requirement or guarantee about how close the clock value needs to be to the actual time.

I agree with you that the spec should be internally consistent. Therefore, I propose to expand the wording:

UUIDv7 values are created by allocating a Unix timestamp in milliseconds, or time intervals close to milliseconds, in the most significant 48 bits and filling the remaining 74 bits, excluding the required version and variant bits, with random bits for each new UUIDv7 generated to provide uniqueness as per Section 6.8.

It's only important that it's not days or microseconds: https://github.com/uuid6/uuid6-ietf-draft/issues/23#issuecomment-899033429

sergeyprokhorenko commented 10 months ago

diving

It is typo. Should be dividing

kyzer-davis commented 10 months ago

I revised the doc with @bradleypeabody's suggestion: https://github.com/ietf-wg-uuidrev/rfc4122bis/commit/4405609787db1cad1f61cbd47f031a643c01b169

I have not made the second change for "or time intervals close to milliseconds" will let the discussion here play out. PR closes Friday on draft-11 to ensure we make the scheduled telechat date next week.

bradleypeabody commented 10 months ago

@sergeyprokhorenko Thanks for the feedback.

I would like to leave it at this (with the most recent change above that Kyzer did) since we've addressed the original concern. As a general concept, I'd like to just leave it saying "milliseconds" wherever possible (i.e. let's not introduce this other "or time intervals close to milliseconds"). The spec is specifically milliseconds, and think we've sufficiently covered the idea that if you put something else in there that isn't exactly milliseconds then the time will be off if it's parsed by some other UUID parser, but that's a totally acceptable tradeoff for implementations that don't care about timestamp interoperability. IMO we've covered it well enough at this point.