Closed EpsilonPrime closed 6 months ago
This is merely based on the proposed approaches.
Approach 1, User-defined types would be a more dynamic approach which could provide flexibility, though, I believe Approach 2 or Approach 3 would rather be helpful to add the required support to the Substrait spec. Having things user-defined means it could support but it would be better to keep the feature part of the core framework?
And between Approach 2 and Approach 3, seems like Approach 2 would be better.
On the slack there was concern that approach 1 is too difficult. If this is true then maybe we need to revisit our support for user defined types. A "precision-parameterized-timestamp" should be usable everywhere that a "timestamp" is usable (same goes for a "precision-parameterized-timestamp_tz"). The only difference I think is that there is one extra method for extracting nanoseconds. As a result it should be as straightforward as defining the type with INHERITS
. So I think this should be a poster child for user defined types.
All that being said, I support approach 2 as long as it is allowing the precision to be requested
and not changing them to nanosecond precision
Does it make sense to have the precision parameter be optional and defaulting to 6, if backward compatibility is a concern?
Approach 1:
If i'm understanding correctly, we would just create a new type_variation? These, and other existing user defined types, are harder to search for on the site. But this new precision-parameterized-timestamp
seems like it would be useful for people to know about since there are a multiple people interested in it. As Vibatha mentioned, it doesn't seem like it would be part of the core framework.
Approach 2 (allowing precision to be requested):
This would change the timestamp from being a Simple Type
to a Compound Type
since it now needs to be configured with a parameter right? If that's the case, is there a way to set a default value for the parameter? That would also mean we need to update the all existing timestamp input args in the extension functions that use them.
There are two places where we'd be concerned about defaulting the type for a parameterized time. There's the protos and then there's the YAML. Defaulting the value in the protos should be straightforward and would allow for backwards compatibility of old data (newer data run in older clients wouldn't be caught though). The next place to update the type would be in the YAML files. Something like what is happening in the arithmetic_decimal file seems like what we'd need to do. Finally after the change goes in we need to start updating consumers to be aware of the new parameter.
Times would definitely be a compound type going forward. However I don't think we want to have YAMLs going forward that allow an unparameterized type. The mere presence of the parameter is going to clue folks in that they need to change their code.
There are some engines that support nanosecond precision in their timestamps (Spark would be one of these). All of the time and timestamp types in Substrait only support millisecond precision. It might be useful to provide guidance on how to handle this in such engines.
Some potential approaches: