Open DanielHunte opened 3 months ago
It is not very clear, honestly, what and how we should fix here.
Presto throws only when there are nested types with nulls:
select array_position(c0, c1) from (values (array[array[1, null]], array[1, null])) t(c0, c1);
while Velox returns 1.
Queries like
SELECT array_position(c0, 5) from (values (ARRAY[ 4, 5, 6])) t(c0);
SELECT array_position(c0, 5) from (values (ARRAY[ 4, 5, NULL])) t(c0);
SELECT array_position(c0, 5) from (values (ARRAY[ NULL, 5, 6])) t(c0);
SELECT array_position(c0, NULL) from (values (ARRAY[ NULL, 5, 6])) t(c0);
work fine.
In the 1st case, is array[1, null]
equal to array[1, null]
?
According to SQL standard NULL
is "not equal" to another NULL
.
So, technically, should Velox return NULL for the 1st query?
Take
SELECT array_position(c0, 5) from (values (ARRAY[NULL, 5, 6])) t(c0);
Should we return NULL here too, because the 1st element NULL might be 5 or might not be 5? Undefined.
Need some input here from multiple folks.
Throwing is kind of not nice, but then again, could be all right.
Since we already defined NaN == NaN despite IEEE standard said no, I think NULL == NULL is still an option on the table despite SQL standard said no.
From the function above:
SELECT array_position(c0, 5) from (values (ARRAY[NULL, 5, 6])) t(c0); // => 2
it seems like the semantic is that you should consider matches even if some comparisons return null (5 == NULL).
Then for the initial case, no comparisons will match (all will return NULL), since "array[1, null] == array[1, null]" is NULL;
SELECT array_position(c0, c1) from (values (array[array[1, null]], array[1, null])) t(c0, c1);
According to the docs, array_position() should return 0 if there are no matches. For example:
SELECT array_position(c0, 5) from (values (ARRAY[NULL, NULL, NULL])) t(c0); // => 0
From this logic, it seems like the function above should return 0 as well.
Curious to what @rschlussel @tdcmeehan and others who know this codebase better think though.
I think we should probably return null for nested nulls, but we could also potentially change the behavior to use distinct semantics (null == null). Whatever choice should be the same between nulls and nested nulls. Current behavior of throwing an exception for any nested nulls in the input isn't great.
The Presto function array_position throws and exception when there are any nulls present in the array. Velox handles nulls and produces a valid output.
This is related to #10580
Your Environment
Expected Behavior
Presto should handle nulls and produces a valid output.
Current Behavior
An exception is thrown when nulls are present.
Possible Solution
Steps to Reproduce
Query: select array_position(c0, c1) from (values (array[array[1, null]], array[1, null])) t(c0, c1);
Screenshots (if appropriate)
Context
Presto and Velox Correctness