partiql / partiql-lang-kotlin

PartiQL libraries and tools in Kotlin.
https://partiql.org/
Apache License 2.0
538 stars 60 forks source link

MIN and MAX support for Timestamp data type #600

Closed alancai98 closed 2 years ago

alancai98 commented 2 years ago

The MIN and MAX aggregation functions currently don't support the Timestamp data type. The ordering should be consistent with the ordering provided by ORDER BY.

dlurton commented 2 years ago

Duplicates https://github.com/partiql/partiql-lang-kotlin/issues/203

dlurton commented 2 years ago

MIN and MAX should actually work with any data type, not just numbers and timestamps. However, they currently only work with numbers.

alancai98 commented 2 years ago

Ah, thanks for finding the duplicate issue.

Adding a reproducible test case for the timestamp case:

PartiQL> SELECT MAX(ts) FROM [`2022-05-09T`, `2022-01-01T`] AS ts;
org.partiql.lang.eval.EvaluationException: Expected number: 2022-05-09
    Evaluator Error: at line 1, column 1: Unexpected value type, <UNKNOWN> : <UNKNOWN>

    at org.partiql.lang.eval.ThunkFactory.populateErrorContext(Thunk.kt:387)
    at org.partiql.lang.eval.LegacyThunkFactory.handleException(Thunk.kt:552)
    at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$2.invoke(Thunk.kt:210)
    at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$2.invoke(Thunk.kt:129)
    at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$4$1.invoke(Thunk.kt:711)
    at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$4$1.invoke(Thunk.kt:129)
    at org.partiql.lang.eval.LegacyThunkFactory.handleException(Thunk.kt:550)
    at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$4.invoke(Thunk.kt:210)
    at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$4.invoke(Thunk.kt:129)
    at org.partiql.lang.eval.EvaluatingCompiler$compile$1.eval(EvaluatingCompiler.kt:311)
    at org.partiql.cli.Repl$executePartiQL$1.invoke(Repl.kt:278)
    at org.partiql.cli.Repl$executePartiQL$1.invoke(Repl.kt:116)
    at org.partiql.cli.Repl.executeTemplate(Repl.kt:245)
    at org.partiql.cli.Repl.executePartiQL(Repl.kt:274)
    at org.partiql.cli.Repl.run(Repl.kt:342)
    at org.partiql.cli.Main.runRepl(main.kt:178)
    at org.partiql.cli.Main.main(main.kt:166)
Caused by: org.partiql.lang.eval.EvaluationException: Expected number: 2022-05-09
    Evaluator Error: at line <UNKNOWN>, column <UNKNOWN>: Unexpected value type, <UNKNOWN> : <UNKNOWN>

    at org.partiql.lang.eval.ExceptionsKt.err(Exceptions.kt:58)
    at org.partiql.lang.eval.ExceptionsKt.errNoContext(Exceptions.kt:54)
    at org.partiql.lang.eval.ExprValueExtensionsKt.numberValue(ExprValueExtensions.kt:116)
    at org.partiql.lang.eval.EvaluatingCompiler$comparisonAccumulator$1.invoke(EvaluatingCompiler.kt:200)
    at org.partiql.lang.eval.EvaluatingCompiler$comparisonAccumulator$1.invoke(EvaluatingCompiler.kt:111)
    at org.partiql.lang.eval.EvaluatingCompiler$Accumulator.next(EvaluatingCompiler.kt:191)
    at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$2$1.invoke(Thunk.kt:731)
    at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$2$1.invoke(Thunk.kt:129)
    at org.partiql.lang.eval.LegacyThunkFactory.handleException(Thunk.kt:550)
    ... 15 more
ERROR!
lziq commented 2 years ago

According to PostgreSQL, MIN & MAX work on any numeric, string, or date/time type as arguments type of an expression. I think this is reasonable and we should just follow what PostgreSQL does, since for PartiQL, I think

  1. There is no use case to enable MIN & MAX to work on other types, such as null, list, clob.
  2. There is no use case to enable MIN & MAX to work across different types.
dlurton commented 2 years ago

Actually, the use case for any data type is when MIN and MAX are involved with schemaless or open-content scenarios.

In the example below, how should MAX and MIN behave when f.a contains different data types in each row? The only behavior that makes sense AFAICT, is to use the sort order for mixed types supplied by section 12.2 of the PartiQL specification.

SELECT MIN(f.a), MAX(f.a) FROM Foo AS f
lziq commented 2 years ago

Actually, the use case for any data type is when MIN and MAX are involved with schemaless or open-content scenarios.

In the example below, how should MAX and MIN behave when f.a contains different data types in each row? The only behavior that makes sense AFAICT, is to use the sort order for mixed types supplied by section 12.2 of the PartiQL specification.

SELECT MIN(f.a), MAX(f.a) FROM Foo AS f

I still think we should just enable MIN & MAX to work on number, string & data/time types. I don't think there is a use case, e.g., users want to find the largest value of a list of values of CLOB type or LIST type. If in the future, there is such requirement from users, we can enable MIN & MAX to work on other types then.

lziq commented 2 years ago

Let's say we have a list of [null, 1, 2, 3]. According to section 12.2 of the PartiQL spec, the minimum value of this list should be null, since null value is always less than a number. However, when users use MIN on this list, I think they want to ignore null, and return 1.

dlurton commented 2 years ago

Well, you can omit null and missing in that case. But there other types should be sorted according to 12.2 I'm pretty sure.

dlurton commented 2 years ago

Anyway, people don't actually want to use min and max with columns that have different data types, but PartiQL should have reasonable (or way least well defined) behavior if they do, given our support for schemaless data sources.

lziq commented 2 years ago

I think the ordering of different types in ORDER BY clause makes sense, since in PartiQL it is valid for a SELECT clause to return a list of values of different types. However, I think ordering of different types in other expressions really complicates things. If we want MAX & MIN to work on different types, do we also want < & > operators to work on different types as well? e.g. '1' < 2 returns false. I think the thing we need to do is to require both sides of < & > operator to have the same type.

dlurton commented 2 years ago

>, >=, <, <= already use 12.2 ordering IIRC.

lziq commented 2 years ago

>, >=, <, <= already use 12.2 ordering IIRC.

I got an error:

PartiQL> '1' < 2
   | 
org.partiql.lang.eval.EvaluationException: Cannot compare values: '1', 2
    Evaluator Error: at line 1, column 5: Invalid comparision, <UNKNOWN> : <UNKNOWN>
lziq commented 2 years ago

Anyway, for now let's just enable MAX & MIN to work on numbers, strings & timestamps without mixing types. I think this already solves the customer's requirement.

am357 commented 2 years ago

Anyway, for now let's just enable MAX & MIN to work on numbers, strings & timestamps without mixing types. I think this already solves the customer's requirement.

@lziq unblocking the customer for the current use-case by limiting the scope makes sense; but referring to what @dlurton pointed out, we need to discuss to see how we can make sure: