substrait-io / substrait

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
https://substrait.io
Apache License 2.0
1.14k stars 148 forks source link

Behavior of subsecond extract in functions document is inconsistent amongst engines #600

Open westonpace opened 6 months ago

westonpace commented 6 months ago

The extract function pulls out a part of a timestamp. In particular this issue is about these components:

      * MILLISECOND Return number of milliseconds since the last full second.
      * MICROSECOND Return number of microseconds since the last full millisecond.
      * NANOSECOND Return number of nanoseconds since the last full microsecond.
      * SUBSECOND Return number of microseconds since the last full second of the given timestamp.

Consider the query SELECT EXTRACT(MICROSECONDS FROM TIMESTAMP '2016-12-31 13:30:15.100259');

I would expect, according to that definition, that I would get 259. However, I get the following:

Engine Result
Postgres 15100259
DuckDb 15100259
Datafusion Not supported
Pyarrow Compute 259
SQL Server (datepart) 100259
Velox unknown
Spark can only extract seconds, which returns 15.100259

So, we have at least 3 different behaviors. Maybe we should just drop these from the method?

CurtHagenlocher commented 6 months ago

At least with Spark 3.4, it doesn't appear to be possible to extract anything smaller than a second from a timestamp, and extracting seconds gives fractional seconds. That is SELECT EXTRACT(SECONDS FROM TIMESTAMP '2016-12-31 13:30:15.100259') returns 15.100259.