[Open] bjovanovic84 opened this issue 4 years ago
Definitely something we are also looking forward to. We use percentiles for so many of our statistics and charts. This is the only thing that is stopping us from using Rockset.
You can emulate this functionality with the NTH_VALUE window function.
This worked for me:
```sql
SELECT
    v.attributeId,
    MIN(v.value) "min",
    ELEMENT_AT(ARRAY_SORT(ARRAY_AGG(v.value)), LENGTH(ARRAY_AGG(v.value)) * 0.25) twenty_fifth,
    ELEMENT_AT(ARRAY_SORT(ARRAY_AGG(v.value)), LENGTH(ARRAY_AGG(v.value)) * 0.5) median,
    ELEMENT_AT(ARRAY_SORT(ARRAY_AGG(v.value)), LENGTH(ARRAY_AGG(v.value)) * 0.75) seventy_fifth,
    ELEMENT_AT(ARRAY_SORT(ARRAY_AGG(v.value)), LENGTH(ARRAY_AGG(v.value)) * 0.95) ninety_fifth,
    ELEMENT_AT(ARRAY_SORT(ARRAY_AGG(v.value)), LENGTH(ARRAY_AGG(v.value)) * 0.99) ninety_ninth,
    MAX(v.value) "max"
FROM
    commons.attributeValues v
GROUP BY v.attributeId
```
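For reference, the sort-then-index pattern above is the classic nearest-rank percentile. A small Python sketch of the same index arithmetic (using the standard nearest-rank rule with 1-based ranks; this illustrates the method, not Rockset's exact `ELEMENT_AT` semantics):

```python
import math

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: sort the values, then pick the element
    whose 1-based rank is ceil(pct/100 * n). This mirrors the
    ELEMENT_AT(ARRAY_SORT(ARRAY_AGG(...)), LENGTH(...) * fraction)
    pattern in the query above."""
    s = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(s)))  # 1-based rank
    return s[rank - 1]

data = list(range(1, 101))  # 1..100
print(percentile_nearest_rank(data, 25))  # 25
print(percentile_nearest_rank(data, 50))  # 50
print(percentile_nearest_rank(data, 95))  # 95
```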
@benhannel can you see any issue with that approach? Would there be any way to avoid repeating `ARRAY_AGG(v.value)` throughout the query?
Yes, the approach you have there will work. You can also use a subquery to avoid some duplication of `ARRAY_SORT(ARRAY_AGG(v.value))`. There are also some window functions which could be useful to you:
https://rockset.com/docs/window-functions/#ntile
It's worth noting that any mechanism to compute an exact percentile requires memory linear in the number of input rows. Max and standard deviation are much cheaper to compute, if they are sufficient for your use case.
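To illustrate the cost difference: max and standard deviation can be maintained in constant memory over a single streaming pass (Welford's online algorithm, sketched below in Python purely as an illustration), whereas an exact percentile forces the engine to buffer all n values so they can be sorted.

```python
import math

class StreamingStats:
    """Constant-memory running max / mean / stddev via Welford's
    algorithm. No input values are retained, unlike exact percentiles,
    which need every value buffered for sorting."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the mean
        self.max = float("-inf")

    def add(self, x):
        self.n += 1
        self.max = max(self.max, x)
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self):
        # population standard deviation
        return math.sqrt(self.m2 / self.n) if self.n else 0.0

stats = StreamingStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(x)
print(stats.max)                  # 9.0
print(round(stats.mean, 6))       # 5.0
print(round(stats.stddev(), 6))   # 2.0
```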
**Is your feature request related to a problem? Please describe.**
It would be very useful to add a percentile aggregation function, in addition to the standard min, max, and avg. When performing analysis it is often a requirement to report p95 and p99 values, and that is not easy to do right now.
**Describe the solution you'd like**
Adding something like `[MIN|MAX]_PCT(<percentile>, <column>)` would be ideal, to be used like `MAX_PCT(99, value)`, `MIN_PCT(20, value)`, etc.
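To make the request concrete, here is one possible reading of the proposed semantics, sketched in Python. Both function names come from the request above, but the nearest-rank interpretation (and reading `MIN_PCT` as "minimum of the top p% of values") is an illustrative guess, not an existing Rockset API:

```python
import math

def max_pct(pct, values):
    """Hypothetical MAX_PCT(p, col): nearest-rank p-th percentile
    counted from the low end, so MAX_PCT(99, value) returns p99.
    Illustrative guess at the proposed semantics."""
    s = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(s)))
    return s[rank - 1]

def min_pct(pct, values):
    """Hypothetical MIN_PCT(p, col): the minimum of the top p% of
    values, i.e. the mirror of MAX_PCT from the high end.
    Illustrative guess at the proposed semantics."""
    s = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(s)))
    return s[len(s) - rank]

data = list(range(1, 101))   # 1..100
print(max_pct(99, data))     # 99
print(min_pct(20, data))     # 81 (smallest of the top 20 values)
```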