prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16.04k stars 5.37k forks source link

Inconsistent semantics of array_sort #20965

Open mbasmanova opened 1 year ago

mbasmanova commented 1 year ago

array_sort applied to arrays of complex types with nested nulls may or may not fail depending on whether the sorting logic needs to compare the nulls to decide the order. I wonder if this is intentional? If so, would it make sense to document this behavior?

presto> select array_sort(array[array[1, null], array[2, null]]);
         _col0
------------------------
 [[1, null], [2, null]]
(1 row)

presto> select array_sort(array[array[1, null], array[1, 2]]);
Query 20230926_061447_03626_sq8kq failed: Array contains elements not supported for comparison

CC: @duanmeng @kaikalur @tdcmeehan @aditi-pandit

Akanksha-kedia commented 1 year ago

@mbasmanova can i take up this ? help to please assign to me

Akanksha-kedia commented 1 year ago

Sorts and returns the array based on the given comparator function. The comparator will take two nullable arguments representing two nullable elements of the array. It returns -1, 0, or 1 as the first nullable element is less than, equal to, or greater than the second nullable element. If the comparator function returns other values (including NULL), the query will fail and raise an error. We need to clearly mention for nested array with null values as select array_sort(array[array[1, 6], array[2, 9]]); its working

mbasmanova commented 1 year ago

@Akanksha-kedia Thank you for offering to help, but what's you thinking about how to fix this? BTW, notice that there are 2 versions of array_sort function. One takes only array, another takes an array and a lambda function. This issue is about the first version. The second has more problems described in https://velox-lib.io/blog/array-sort

For Velox, we are thinking of maybe adding a check that none of the array elements is null or contains null before sorting.

Akanksha-kedia commented 1 year ago

Sounds good!! I was checking how we can resolve this issue

On Fri, 29 Sep 2023 at 3:42 PM, Maria Basmanova @.***> wrote:

@Akanksha-kedia https://github.com/Akanksha-kedia Thank you for offering to help, but what's you thinking about how to fix this? BTW, notice that there are 2 versions of array_sort function. One takes only array, another takes an array and a lambda function. This issue is about the first version. The second has more problems described in https://velox-lib.io/blog/array-sort

For Velox, we are thinking of maybe adding a check that none of the array elements is null or contains null before sorting.

— Reply to this email directly, view it on GitHub https://github.com/prestodb/presto/issues/20965#issuecomment-1740647232, or unsubscribe https://github.com/notifications/unsubscribe-auth/AVL2AZV37WZOCZ6OOV62AW3X42NKFANCNFSM6AAAAAA5HFIWLA . You are receiving this because you were mentioned.Message ID: @.***>

Akanksha-kedia commented 1 year ago

@Akanksha-kedia Thank you for offering to help, but what's you thinking about how to fix this? BTW, notice that there are 2 versions of array_sort function. One takes only array, another takes an array and a lambda function. This issue is about the first version. The second has more problems described in https://velox-lib.io/blog/array-sort

For Velox, we are thinking of maybe adding a check that none of the array elements is null or contains null before sorting.

good finding !!! But how we are going to proceed for prestodb, something similar to Velox?

tdcmeehan commented 1 year ago

IMO, Postgres semantics (as described in #20996, and how Velox is thinking of handling) seem sensible. Let's add a flag and toggle new behavior which verifies that nested complex data structures don't contain nulls.

mbasmanova commented 11 months ago

@tdcmeehan

Postgres semantics (as described in https://github.com/prestodb/presto/pull/20996,

Tim, the link above brings me to "Template for security Vulnerability Request" which doesn't seem right. Any chance you can share another link?

tdcmeehan commented 11 months ago

@mbasmanova it looks like the PR was edited, but the comments at the time are relevant to array_sort.