Open spershin opened 2 hours ago
CC: @aditi-pandit @rschlussel
This variation of width_bucket
is defined in the SQL spec and clearly defines the null behavior. However, I cannot find the overload mentioned in this issue in the SQL spec, nor do I see a comparable version in Postgres.
I would vote for solution 2 as a null in bins cannot be interpret in any meaningful ways. Checking if they are nulls or not should not be too expensive as the nulls bits are contiguous in memory (so you can check 64 or 256 of them at same time, making it essentially free).
These return 0: SELECT width_bucket(-1, c0) from (VALUES ARRAY[NULL]) t(c0); SELECT width_bucket(-1, c0) from (VALUES ARRAY[NULL, 1]) t(c0); SELECT width_bucket(-1, c0) from (VALUES ARRAY[1, NULL, 4]) t(c0); SELECT width_bucket(-1, c0) from (VALUES ARRAY[NULL, 1, NULL, 4]) t(c0);
These return 1: SELECT width_bucket(1, c0) from (VALUES ARRAY[NULL]) t(c0);
These return 2: SELECT width_bucket(1, c0) from (VALUES ARRAY[NULL, 1]) t(c0); SELECT width_bucket(1, c0) from (VALUES ARRAY[1, NULL, 4]) t(c0);
These return 3: SELECT width_bucket(1, c0) from (VALUES ARRAY[NULL, 1, NULL, 4]) t(c0);
These fail with "Bin values are not sorted in ascending order": SELECT width_bucket(-1, c0) from (VALUES ARRAY[1, NULL]) t(c0); SELECT width_bucket(-1, c0) from (VALUES ARRAY[1, NULL, 4, NULL]) t(c0); SELECT width_bucket(1, c0) from (VALUES ARRAY[1, NULL]) t(c0); SELECT width_bucket(1, c0) from (VALUES ARRAY[1, NULL, 4, NULL]) t(c0);
And so on. There is no check for null elements in the code and unclear how the function should behave if it encounters one. From strict perspective it seems like the function should fail whenever it finds a null element because "Bin values are not sorted in ascending order". What is interesting is depending on the 1st argument and the array, it is not guaranteed that we stumble on a particular NULL due to the binary search nature of the algorithm.
Trying to understand if we should change the function behavior.
Expected Behavior
Unclear.
Current Behavior
See description.
Possible Solution
Steps to Reproduce
In any environment run the example queries.