Closed Blizzara closed 5 days ago
The epoch i64 value itself should be enough to indicate the precision. Any additional digits after seconds will be for fractional seconds (and used to determine precision).
// If the precision is 6 or less then this is the microseconds since the UNIX epoch
// If the precision is more than 6 then this is the nanoseconds since the UNIX epoch
Does that make the comment from Weston more clear?
The epoch i64 value itself should be enough to indicate the precision. Any additional digits after seconds will be for fractional seconds (and used to determine precision).
// If the precision is 6 or less then this is the microseconds since the UNIX epoch // If the precision is more than 6 then this is the nanoseconds since the UNIX epoch
Does that make the comment from Weston more clear?
Not really, could you ELI5 how you get the precision from the i64? đŸ˜… ie how do you know which part of it indicates seconds?
Not really, could you ELI5 how you get the precision from the i64? đŸ˜… ie how do you know which part of it indicates seconds?
For example
1720200424 -> Friday, July 5, 2024 5:27:04 PM
1720200424123 -> Friday, July 5, 2024 5:27:04.123 PM (3 - millisecond precision)
1720200424123456 -> Friday, July 5, 2024 5:27:04.123456 PM (6 - microsecond precision)
1720200424123456789 -> Friday, July 5, 2024 5:27:04.123456789 PM (9 - nanosecond precision)
We stuck with either microsecond or nanosecond because that seems to be what most engines are using.
Makes sense. This should make it easier for users to understand. I imagine a few others would have eventually had confusion here as well
Thoughts on whether there should be one message (PrecisionTimestamp) shared across both, or two messages (PrecisionTimestamp and PrecisionTimestampTZ)? The types have separate messages, so maybe using separate messages here as well would be better to keep things aligned? But I dunno if it matters.
Thoughts on whether there should be one message (PrecisionTimestamp) shared across both, or two messages (PrecisionTimestamp and PrecisionTimestampTZ)?
I think one message is fine. E.g. we use i64
for int64
and time
(and the legacy timestamp
). We use bytes
for binary
, fixed_binary
, and uuid
. The reader / writer can always distinguish easily between timestamp
and timestamp_tz
using the message code (34/35)
This is a breaking change so I think we'll need 2 SMC approvals. @EpsilonPrime @vbarua @jacques-n
BREAKING CHANGE: changes the message type for Literal PrecisionTimestamp and PrecisionTimestampTZ
The PrecisionTimestamp and PrecisionTimestampTZ literals were introduced in #594, and there was some discussion about their contents in https://github.com/substrait-io/substrait/pull/594#discussion_r1471844566. In their current form, they only contain a i64 value, which I believe is meant to be a precision-dependent number of fractional seconds since epoch. However, the literals don't contain the precision itself, so it's impossible to interpret a PrecisionTimestamp or PrecisionTimestampTZ literal without other information. This is in contrast to e.g. varchar, whose literal does specify the length, or decimal, whose literal specifies scale and precision. @westonpace curious for your thoughts since you were part of that original discussion - am I missing something or is this a bug?
This PR changes the Literal types for PrecisionTimestamp and PrecisionTimestampTZ to contain a PrecisionTimestamp message instead of a i64. That message then contains the i64 value as well as the i32 type. This is a breaking change, but given in their current form these literals aren't usable (unless I'm missing something), would that be okay?
Currently I used the same message for both PrecisionTimestamp and PrecisionTimestampTZ, but I can also make a copy for PTTZ if that'd be better for forwards-compatibility.