Closed alancai98 closed 2 years ago
MIN
and MAX
should actually work with any data type, not just numbers and timestamps. However, they currently only work with numbers.
Ah, thanks for finding the duplicate issue.
Adding a reproducible test case for the timestamp case:
PartiQL> SELECT MAX(ts) FROM [`2022-05-09T`, `2022-01-01T`] AS ts;
org.partiql.lang.eval.EvaluationException: Expected number: 2022-05-09
Evaluator Error: at line 1, column 1: Unexpected value type, <UNKNOWN> : <UNKNOWN>
at org.partiql.lang.eval.ThunkFactory.populateErrorContext(Thunk.kt:387)
at org.partiql.lang.eval.LegacyThunkFactory.handleException(Thunk.kt:552)
at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$2.invoke(Thunk.kt:210)
at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$2.invoke(Thunk.kt:129)
at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$4$1.invoke(Thunk.kt:711)
at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$4$1.invoke(Thunk.kt:129)
at org.partiql.lang.eval.LegacyThunkFactory.handleException(Thunk.kt:550)
at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$4.invoke(Thunk.kt:210)
at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$4.invoke(Thunk.kt:129)
at org.partiql.lang.eval.EvaluatingCompiler$compile$1.eval(EvaluatingCompiler.kt:311)
at org.partiql.cli.Repl$executePartiQL$1.invoke(Repl.kt:278)
at org.partiql.cli.Repl$executePartiQL$1.invoke(Repl.kt:116)
at org.partiql.cli.Repl.executeTemplate(Repl.kt:245)
at org.partiql.cli.Repl.executePartiQL(Repl.kt:274)
at org.partiql.cli.Repl.run(Repl.kt:342)
at org.partiql.cli.Main.runRepl(main.kt:178)
at org.partiql.cli.Main.main(main.kt:166)
Caused by: org.partiql.lang.eval.EvaluationException: Expected number: 2022-05-09
Evaluator Error: at line <UNKNOWN>, column <UNKNOWN>: Unexpected value type, <UNKNOWN> : <UNKNOWN>
at org.partiql.lang.eval.ExceptionsKt.err(Exceptions.kt:58)
at org.partiql.lang.eval.ExceptionsKt.errNoContext(Exceptions.kt:54)
at org.partiql.lang.eval.ExprValueExtensionsKt.numberValue(ExprValueExtensions.kt:116)
at org.partiql.lang.eval.EvaluatingCompiler$comparisonAccumulator$1.invoke(EvaluatingCompiler.kt:200)
at org.partiql.lang.eval.EvaluatingCompiler$comparisonAccumulator$1.invoke(EvaluatingCompiler.kt:111)
at org.partiql.lang.eval.EvaluatingCompiler$Accumulator.next(EvaluatingCompiler.kt:191)
at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$2$1.invoke(Thunk.kt:731)
at org.partiql.lang.eval.EvaluatingCompiler$compileSelect$1$2$getQueryThunk$$inlined$thunkEnv$lang$2$1.invoke(Thunk.kt:129)
at org.partiql.lang.eval.LegacyThunkFactory.handleException(Thunk.kt:550)
... 15 more
ERROR!
According to PostgreSQL, MIN
& MAX
work on any numeric, string, or date/time type as arguments type of an expression. I think this is reasonable and we should just follow what PostgreSQL does, since for PartiQL, I think
MIN
& MAX
to work on other types, such as null, list, clob. MIN
& MAX
to work across different types. Actually, the use case for any data type is when MIN
and MAX
are involved with schemaless or open-content scenarios.
In the example below, how should MAX
and MIN
behave when f.a
contains different data types in each row? The only behavior that makes sense AFAICT, is to use the sort order for mixed types supplied by section 12.2 of the PartiQL specification.
SELECT MIN(f.a), MAX(f.a) FROM Foo AS f
Actually, the use case for any data type is when
MIN
andMAX
are involved with schemaless or open-content scenarios.In the example below, how should
MAX
andMIN
behave whenf.a
contains different data types in each row? The only behavior that makes sense AFAICT, is to use the sort order for mixed types supplied by section 12.2 of the PartiQL specification.SELECT MIN(f.a), MAX(f.a) FROM Foo AS f
I still think we should just enable MIN
& MAX
to work on number, string & data/time types. I don't think there is a use case, e.g., users want to find the largest value of a list of values of CLOB
type or LIST
type. If in the future, there is such requirement from users, we can enable MIN
& MAX
to work on other types then.
Let's say we have a list of [null, 1, 2, 3]
. According to section 12.2 of the PartiQL spec, the minimum value of this list should be null
, since null value is always less than a number. However, when users use MIN
on this list, I think they want to ignore null
, and return 1
.
Well, you can omit null and missing in that case. But there other types should be sorted according to 12.2 I'm pretty sure.
Anyway, people don't actually want to use min and max with columns that have different data types, but PartiQL should have reasonable (or way least well defined) behavior if they do, given our support for schemaless data sources.
I think the ordering of different types in ORDER BY clause makes sense, since in PartiQL it is valid for a SELECT clause to return a list of values of different types. However, I think ordering of different types in other expressions really complicates things. If we want MAX
& MIN
to work on different types, do we also want <
& >
operators to work on different types as well? e.g. '1' < 2
returns false. I think the thing we need to do is to require both sides of <
& >
operator to have the same type.
>, >=, <, <=
already use 12.2 ordering IIRC.
>, >=, <, <=
already use 12.2 ordering IIRC.
I got an error:
PartiQL> '1' < 2
|
org.partiql.lang.eval.EvaluationException: Cannot compare values: '1', 2
Evaluator Error: at line 1, column 5: Invalid comparision, <UNKNOWN> : <UNKNOWN>
Anyway, for now let's just enable MAX
& MIN
to work on numbers, strings & timestamps without mixing types. I think this already solves the customer's requirement.
Anyway, for now let's just enable
MAX
&MIN
to work on numbers, strings & timestamps without mixing types. I think this already solves the customer's requirement.
@lziq unblocking the customer for the current use-case by limiting the scope makes sense; but referring to what @dlurton pointed out, we need to discuss to see how we can make sure:
MIN
and MAX
do not have an undefined behavior when dealing with tuples attributes with different data types.
The MIN and MAX aggregation functions currently don't support the Timestamp data type. The ordering should be consistent with the ordering provided by
ORDER BY
.