knizhnik / imcs

In-Memory Columnar Store extension for PostgreSQL
Apache License 2.0
203 stars 33 forks source link

Wrong results for nested window function #55

Closed pavitrakumar78 closed 4 years ago

pavitrakumar78 commented 4 years ago

Hi,

I'm having some difficulties understanding window functions. I'll explain using examples below:

My question: I expect the result of query_1 and query_2 should be the same (in terms of no. of rows and the actual values), but it's not. Why?

(Note: the reason I've used cs_project in the below queries is to just make comparison easier due to row-by-row alignment of values)

query_1: (input size: 22 rows)

select cs_project(  cs_window_ema(cs_parse('{4.327939024405396,4.597932328082209,4.836839533776754,
4.747928204520065,4.99430333738718,5.081756181230162,5.09315933602187,5.129407855177533,5.157586519984079,
5.008409274234396,4.918217309339468,4.887232524201522,5.081677692264861,5.271836411490995,
5.635956930411851,5.917635890473321,6.250374267838026,6.338526851040115,6.597808483451615,7.132598269717562,
6.48831564334796,5.684637623903427}', 'float8'), 9)<<(9-1)  ) as ema;

(result: 14 rows)

query_2:

select 
cs_project(
cs_window_ema(
(cs_window_ema(cs_parse('{310.52,312.23,317.05,313.78,316.18,314.38,317.59,314.84,
318.92,321.85,320.79,321.72,324.32,325.01,326.86,322.96,
320.88,323.22,321.17,325.12,323.96,326.52,328.79,330.06,
332.11,334.33,334.57,335.57,332.8,337.44,336.83,336.84,
337.91,338.64,337.23,338.28,339.48,342.92,344.12,347.57,
348.33,350.58,349.31,352.6,357.7,345.39,342.57}', 'float8'), 12)<<(26-1)) - 
(cs_window_ema(cs_parse('{310.52,312.23,317.05,313.78,316.18,314.38,317.59,314.84,
318.92,321.85,320.79,321.72,324.32,325.01,326.86,322.96,
320.88,323.22,321.17,325.12,323.96,326.52,328.79,330.06,
332.11,334.33,334.57,335.57,332.8,337.44,336.83,336.84,
337.91,338.64,337.23,338.28,339.48,342.92,344.12,347.57,
348.33,350.58,349.31,352.6,357.7,345.39,342.57}', 'float8'), 26)<<(26-1))
, 9)<<(9-1)
);

(result: 39 rows)

The array in query_1 is actually just the result of the difference between the 2 ema's as in query_2. i.e, query_1's numbers are from:

select 
cs_project(
(cs_window_ema(cs_parse('{310.52,312.23,317.05,313.78,316.18,314.38,317.59,314.84,
318.92,321.85,320.79,321.72,324.32,325.01,326.86,322.96,
320.88,323.22,321.17,325.12,323.96,326.52,328.79,330.06,
332.11,334.33,334.57,335.57,332.8,337.44,336.83,336.84,
337.91,338.64,337.23,338.28,339.48,342.92,344.12,347.57,
348.33,350.58,349.31,352.6,357.7,345.39,342.57}', 'float8'), 12)<<(26-1)) - 
(cs_window_ema(cs_parse('{310.52,312.23,317.05,313.78,316.18,314.38,317.59,314.84,
318.92,321.85,320.79,321.72,324.32,325.01,326.86,322.96,
320.88,323.22,321.17,325.12,323.96,326.52,328.79,330.06,
332.11,334.33,334.57,335.57,332.8,337.44,336.83,336.84,
337.91,338.64,337.23,338.28,339.48,342.92,344.12,347.57,
348.33,350.58,349.31,352.6,357.7,345.39,342.57}', 'float8'), 26)<<(26-1))
);

(result: 22 rows) (exactly same as input of query_1)

All I've done is, wrap the above query directly inside another cs_window_ema, but that seems to have ignored all the shifts I've done inside and returns 39 rows which are incorrect (supposed to be just 14 rows) and so are the values since we're doing EMA where the values change based on where I start the first window from.

Does nested window function work differently? For example, irrespective of the changes in the shifts (<<) for the inner time series output from cs_window_ema, the final result stays the same at 39 rows - how can this be possible? Shouldn't the no. of rows be 14 (same as query_1) since we are already shifting the inner timeseries by 25?. Do nested window functions ignore inner shifts?

pavitrakumar78 commented 4 years ago

This issue is related to #56 which is a simplified version of this problem. Fix has been applied and issue #56 is solved and so is this issue.