knizhnik / imcs

In-Memory Columnar Store extension for PostgreSQL
Apache License 2.0
203 stars 33 forks source link

Column results are re-used when shifting? Shifting results not same as vanilla PGSQL #57

Closed pavitrakumar78 closed 4 years ago

pavitrakumar78 commented 4 years ago

Hi,

I've run into another issue. I'm not quite sure what is even happening here - but it looks like the result columns of a single query are being re-used somehow in the same query.

Here's a full set of SQL commands to reproduce this:

Create dummy tables, insert sample data, and load the table:

CREATE TABLE sample_table (
    date timestamp not null default CURRENT_TIMESTAMP,
    close int4,
    open int4,
    high int4
);

INSERT INTO sample_table (close, open, high) values (1, 7, 25),(2, 23, 3),(3, 22, 12),(4, 25, 30),(5, 10, 5),(6, 15, 22),(7, 4, 19),(8, 30, 20),
(9, 16, 14),(10, 21, 23),(11, 19, 6),(12, 26, 27),(13, 18, 18),(14, 27, 29),(15, 6, 17),(16, 13, 2),(17, 29, 8),(18, 17, 10),(19, 9, 11),
(20, 11, 1),(21, 8, 24),(22, 2, 9),(23, 24, 7),(24, 12, 16),(25, 3, 15),(26, 1, 26),(27, 20, 28),(28, 14, 21),(29, 28, 13),(30, 5, 4);

select cs_create('sample_table', 'date');

select sample_table_load();

Testing the shifts:

IMCS version: (Note: I have used project just to improve the readability of the output)

select 
cs_project((high<<1) - (close>>1)),
cs_project((high<<1) - (high>>1)),
cs_project((high<<1) - (high>>1)),
cs_project((high<<1) - (high>>1))
from sample_table_get();

PG12 version:

select
lag(high, -1) over (order by date) - lag(close, 1) over (order by date),
lag(high, -1) over (order by date) - lag(high, 1) over (order by date),
lag(high, -1) over (order by date) - lag(high, 1) over (order by date),
lag(high, -1) over (order by date) - lag(high, 1) over (order by date)
from sample_table;

Both should give the same results, but I'm seeing a reducing row count for each column in the IMCS version and the numbers are also wrong compared to the PG query. Here's the output for IMCS: image Output for PG: image

I'm not sure what is happening here, but the docs don't mention anything about computed columns being re-used in the same query.

pavitrakumar78 commented 4 years ago

Fix already pushed to master! Thank you! Closing this issue!