Closed qntm closed 3 years ago
In fact I think models (1) and (2) can be subdivided.
INSERT_OVERRUN_ARRAY
INSERT_OVERRUN_LAST
INSERT_OVERRUN_ARRAY
but return the last result from the array. If the input Unix instant was removed, an exception is thrown.INSERT_STALL_RANGE
INSERT_STALL_LAST
INSERT_STALL_RANGE
but we only return the last TAI instant in the range. This is the same as INSERT_OVERRUN_LAST
.INSERT_THROW
INSERT_OVERRUN_LAST
or INSERT_STALL_LAST
.INSERT_SMEAR
Hi qntm. I'm not an expert on this but I love being overly pedantic so I thought I'd comment.
It's not exactly clear to me what precisely you mean by "Unix time" in this, well, entire project. Bear with me here; it might make a difference. Possibly, you mean POSIX's concept of "seconds since the epoch", which is defined in https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_16. However, I don't think that's even defined with respect to a smaller granularity than a second! Also, a lot seems to be left "unspecified" or "implementation-defined" here. As you say, the rationale, https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap04.html#tag_21_04_16, is difficult to understand.
However, in the readme of the project, you seem to treat JavaScript's Date
as a definitive source of "Unix time". (This, I assume, is especially relevant because the project is written in JavaScript). This may mean the ECMAScript standard holds sway here. Luckily, that standard seems a little more forthcoming. The ECMAScript® 2020 Language Specification says this about time, in https://262.ecma-international.org/11.0/#sec-date-objects:
Time measurement in ECMAScript is analogous to time measurement in POSIX, in particular sharing definition in terms of the proleptic Gregorian calendar, an epoch of midnight at the beginning of 01 January, 1970 UTC, and an accounting of every day as comprising exactly 86,400 seconds (each of which is 1000 milliseconds long).
An ECMAScript time value is a Number, either a finite integer representing an instant in time to millisecond precision or NaN representing no specific instant. A time value that is a multiple of 24 × 60 × 60 × 1000 = 86,400,000 (i.e., is equal to 86,400,000 × d for some integer d) represents the instant at the start of the UTC day that follows the epoch by d whole UTC days (preceding the epoch for negative d). Every other finite time value t is defined relative to the greatest preceding time value s that is such a multiple, and represents the instant that occurs within the same UTC day as s but follows it by t − s milliseconds.
Time values do not account for UTC leap seconds—there are no time values representing instants within positive leap seconds, and there are time values representing instants removed from the UTC timeline by negative leap seconds. However, the definition of time values nonetheless yields piecewise alignment with UTC, with discontinuities only at leap second boundaries and zero difference outside of leap seconds.
It seems to me like "there are no time values representing instants within positive leap seconds", how one calculates multiples, and "discontinuities only at leap second boundaries and zero difference outside of leap seconds", suggest (3) as the only correct answer under the ECMAScript standard. (Though, possibly, you should return NaN, "representing no specific instant", as does new Date('2016-12-31 23:59:60').getTime()
*, instead of throwing an exception.)
Whether implementations respect this standard is left as an exercise for the reader.
...Of course, if "Unix time" is supposed to mean "what time might a unix system tell us it is?" then you should probably implement all the options you've described here because there's probably a unix system that does it for each of those ways.
*see https://searchfox.org/mozilla-central/rev/d7e344e956d9da2808ea33e1fe0f963ed10c142d/js/src/jsdate.cpp#992 for the validation code that rejects this example in firefox, if you're curious. It is not very illuminating.
Really interesting stuff, thank you! (Apologies for the delay, I have other projects going on in parallel.) It's good that the ECMAScript standard is more precise about these matters. For the sake of consumers, though, I think having access to various models is the only way to go. I think returning NaN
in the error cases also makes a lot of sense.
Actually implementing this stuff is proving thorny, but stay tuned...
There are at least four models that I am aware of for modelling the relationship between Unix milliseconds and TAI milliseconds during an inserted leap second:
All of these relationships have different advantages and disadvantages. In models (1), (2) and (3), the relationship is not strictly monotonic. In model (1), Unix and TAI are in a one-to-many relationship, which is reflected in
t-a-i
'stai.oneToMany
APIs, which return arrays. In model (2), Unix time truly "ignores" leap seconds but there is still technically a one-to-many relationship here - one Unix time maps to a whole range of TAI times.tai.oneToOne
provides access to only the last Unix time in the range. In model (3) basic time conversions during inserted time cause exceptions to be thrown.tai.oneToOne
used to behave like this. And in model (4), the relationship is strictly monotonic but Unix seconds vary in length, as they used to do in the bad old days pre-1972 - worse than that, the variation in length isn't even a nice round number. Pre-1972, a Unix millisecond was precisely 13 or 15 TAI picoseconds longer than a TAI millisecond, andt-a-i
is designed to take advantage of this precision when performing its internal calculations. With smearing, that's more like 11,574.074074... TAI picoseconds longer. This fraction can't be modelled exactly and will lead to imprecise results without an overhaul oft-a-i
's internals and the sacrifice of some guarantees in its output.None of these models are authoritative and there appears to be no agreement on which is correct, even in POSIX itself, which as far as I can tell ties itself in knots and directly contradicts itself.
So, what I want to do is this:
t-a-i
should provide access to all four models according to user preference, not just (1) and (2) as it does currently.t-a-i
should name the models something sensible instead of the rather unclearoneToMany
andoneToOne
models.t-a-i
's API may need to change somewhat to reflect the fact that model (4) does not allow for precise results. The preciseunixToAtomicPicos
methods, for example, may just have to be removed entirely, or replaced with something which returns aBigInt
numerator and denominator.