Closed pavitrakumar78 closed 4 years ago
To implement the following SQL query in IMCS:
select distinct(Symbol)
from(
select Symbol, Day, Close,
avg(Close) over (partition by Symbol order by Day rows 14-1 preceding ) as sma14
from Quote
where Day between '2020-07-05' and '2020-08-05'
) tb1
where sma14 > close_price;
you should run the following query:
select Symbol from (select Symbol, (select cs_count(cs_filter_pos(cs_window_avg(Close, 14)>Close) cnt from Quote_get(Symbol, '2020-07-05'::date, '2020-08-05' ::date)) s1 from SecurityQuotes) s2 where cnt>0;
select Symbol from (select Symbol, (select cs_count(cs_filter_pos(cs_window_avg(Close, 14)>Close) cnt from Quote_get(Symbol, '2020-07-05'::date, '2020-08-05' ::date)) s1 from SecurityQuotes) s2 where cnt>0;
There's a bracket missing - I think it's supposed to come before cnt
?
select
Symbol
from
(
select
Symbol,
(
select
cs_count(cs_filter_pos(cs_window_avg(Close, 14) > Close)) cnt
from
Quote_get(Symbol, '2020-07-05'::date, '2020-08-05' ::date)
)
s1
from
SecurityQuotes
)
s2
where
cnt > 0;
With this, I get error: ERROR: column "cnt" does not exist
Sorry,
select Symbol from (select Symbol, (select cs_count(cs_filter_pos(cs_window_avg(Close, 14)>Close) from Quote_get(Symbol, '2020-07-05'::date, '2020-08-05' ::date)) cnt from SecurityQuotes) s2 where cnt>0;
select Symbol from (select Symbol, (select cs_count(cs_filter_pos(cs_window_avg(Close, 14)>Close) from Quote_get(Symbol, '2020-07-05'::date, '2020-08-05' ::date)) cnt from SecurityQuotes) s2 where cnt>0;
Thank you for the correction. A bracket is still missing, but this is the final query, correct?
select
Symbol
from
(
select
Symbol,
(
select
cs_count(cs_filter_pos(cs_window_avg(Close, 14) > Close)**)** <-- you last query was missing this bracket
from
Quote_get(Symbol, '2020-07-05'::date, '2020-08-05' ::date)
)
cnt
from
SecurityQuotes
)
s2
where
cnt > 0;
I tried this but got segfault in the log and the db connection was closed unexpectedly. Here's the log:
2020-09-10 16:39:51.396 UTC [144610] LOG: server process (PID 183472) was terminated by signal 11: Segmentation fault
2020-09-10 16:39:51.396 UTC [144610] DETAIL: Failed process was running: select Symbol from (select Symbol, (select cs_count(cs_
filter_pos(cs_window_avg(Close, 14)>Close)) from Quote_get(Symbol, '2020-07-05'::date, '2020-08-05' ::date))
cnt from SecuritiesQuotes)
s2 where cnt>0;
2020-09-10 16:39:51.396 UTC [144610] LOG: terminating any other active server processes
2020-09-10 16:39:51.397 UTC [169204] WARNING: terminating connection because of crash of another server process
2020-09-10 16:39:51.397 UTC [169204] DETAIL: The postmaster has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-09-10 16:39:51.397 UTC [169204] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-09-10 16:39:51.397 UTC [227781] postgres@stock_db FATAL: the database system is in recovery mode
2020-09-10 16:39:51.399 UTC [144610] LOG: all server processes terminated; reinitializing
2020-09-10 16:39:51.782 UTC [227782] LOG: database system was interrupted; last known up at 2020-09-10 16:03:53 UTC
2020-09-10 16:39:51.890 UTC [227782] LOG: database system was not properly shut down; automatic recovery in progress
2020-09-10 16:39:51.892 UTC [227782] LOG: redo starts at 5/6F70AA40
2020-09-10 16:39:51.892 UTC [227782] LOG: invalid record length at 5/6F70AB28: wanted 24, got 0
2020-09-10 16:39:51.892 UTC [227782] LOG: redo done at 5/6F70AAF0
2020-09-10 16:39:51.906 UTC [144610] LOG: database system is ready to accept connections
Info about the table:
Same schema as Quote
table in the introduction docs here.
~ 55 mil rows/ ~19,000 stocks
SELECT pg_size_pretty (pg_relation_size('Quote')) as tbl_size, pg_size_pretty (pg_indexes_size('Quote')) idx_size;
returns:
4978 MB for table and 7176 MB for the index ( index on date, id, (date, id) )
System config: 8x 2.0 GHz dedicated CPU instance - 16 GB RAM, 320 GB SSD No other processes were running when this query was run.
I'm able to execute the queries mentioned in the intro page in secs/msecs without any issues. For example,
select cs_count(cs_filter_pos(cs_window_avg(Close, 14)>Close)) from Quote_get('AAPL.US', '2020-07-05'::date, '2020-08-05' ::date);
Takes <0.5ms for the first run (measured using \timing
).
The VWAP calculation for all the stocks (as mentioned in the intro) after creating the view takes ~5 secs to finish computing in this instance, but I'm not sure why the query for SMA>open for all stocks does not work. Is there anything else I can look at to figure out where the issue originates from? I should be able to load this entire table into memory with 16GB right?
Edit:
Here's what I get from gdb:
Thread 1 "postgres" received signal SIGSEGV, Segmentation fault.
0x00007f68418a37b2 in imcs_subseq_page.constprop () from /usr/lib/postgresql/12/lib/imcs.so
I've also attached my postgres.conf
file with this post. I've not done many changes - the important changes are shared_buffers = 2GB
, work_mem = 20MB
and maintenance_work_mem = 1GB
.
postgresql.txt
Sorry, there was bug in handling empty subsequences, fixed.
I do not know which data you have imported in Quotes table, but NYSE_2003_2013 as it is clear from its name s contains dates from 2003 till 2013.
I do not know which data you have imported in Quotes table, but NYSE_2003_2013 as it is clear from its name s contains dates from 2003 till 2013.
Thank you for the fix.
I'm using different data set to test the queries. It's much bigger than NYSE dataset and has data from 1990s to 2020. I'm not near my work PC now - I'll report back in a few hrs after updating my local installation.
Is there a proper way to uninstall (completely remove) the extension fully and then install it fresh?
drop extension imcs + server restart should do it
Sorry, there was bug in handling empty subsequences, fixed.
I can confirm it works now! Thanks!
I have one more question: Some of my queries involve working with multiple indicators with LAG
functions and with different periods. I don't need to explicitly handle the alignment of rows in PGSQL because the UDFs I've created return null for the first n
values where n
is period and functions like LAG
also do the same. Also, arithmetic operations on these columns also don't require any special handling because any op with null
is just null
and thus the row sizes stay the same. But in IMCS, since we are dealing with arrays, this becomes a bit difficult to handle:
for example, in PG12, I can do:
select x,
lag(x,3) over (order by x),
(x-lag(x,3) over (order by x)) as result
from generate_series(0,10) as g(x);
which gives all 3
as result for the result column which is after exactly 3 null values i.e, doing computation with columns that do not have shift with ones that have null values is easy since I don't need to perform shift twice to align the column values. But in IMCS, I need to do the following to get the same result:
select (cs_parse('{0,1,2,3,4,5,6,7,8,9,10}', 'int4') << 3) - (cs_parse('{0,1,2,3,4,5,6,7,8,9,10}', 'int4') >> 3);
It's simple for this case, but when you have nested indicators and do logical operations with them, there's a lot of room for error since I can't fall back on how the engine handles columns with null values as in native PG.
Is there a way to simplify this in IMCS or do I just have to deal with handling each array with the specific cs_cut_head
or cs_cut_tail
function?
Sorry, can you explain which IMCS query you want to perform I what is the expected result? Do you want something like this:
select (cs_parse('{0,1,2,3,4,5,6,7,8,9,10}', 'int4') << 3) - cs_parse('{0,1,2,3,4,5,6,7,8,9,10}', 'int4');
Sorry, can you explain which IMCS query you want to perform I what is the expected result? Do you want something like this:
select (cs_parse('{0,1,2,3,4,5,6,7,8,9,10}', 'int4') << 3) - cs_parse('{0,1,2,3,4,5,6,7,8,9,10}', 'int4');
I want to be able to use computed columns with lag along with computed columns without any lag/null values without having to deal with aligning the array as it becomes cumbersome when dealing with many computed columns with lags and shifts.
For example, executing
select (cs_parse('{0,1,2,3,4,5,6,7,8,9,10}', 'int4')) - (cs_parse('{0,1,2,3,4,5,6,7,8,9,10}', 'int4') >> 3);
would be the direct equivalent of PGSQL's query below in terms of syntax.
select x-lag(x,3) over (order by x) as result
from generate_series(0,10) as g(x);
but the IMCS query gives the wrong result compared to PGSQL's because the actual operation done in IMCS is :
{0,1,2,3,4,5,6,7,8,9,10} (8,9,10 is ignored due to length mismatch)
{0,1,2,3,4,5,6,7} -
= {0,0,0,0,0,0,0} (wrong compared to PGSQL's equivalent)
It's because the array is 'aligned' from left to right.
But in PGSQL, what happens after the shift is: {0,1,2,3,4,5,6,7,8,9,10} {X,X,X,0,1,2,3,4,5,6,7} -
= {3,3,3,3,3,3,3}
So to emulate the same behavior in IMCS, I have to carefully use shifts everywhere to make sure all the arrays are aligned else, I get the wrong result for a similar query to PGSQL. My question: Is there a simpler way of doing away with handling extra shifts in IMCS? Is it possible to use the same syntax above (only 1 shift as in PGSQL) to get the result as PGSQL? i.e, an easier way such that IMCS automatically handles shifts by appending NULLs at the beginning of the array and behaves like PGSQL.
Sorry for the double comment, but as another example, see below on how I calculate ATR in plain PGSQL:
select Symbol, Date,
avg(true_range, 14) over (partition by Symbol order by Date rows 14-1 preceding) as atr14
from(
select *,
GREATEST(
High - lag(Close, 1) over (partition by Symbol order by Date),
lag(Close, 1) over (partition by Symbol order by Date) - Low,
High - Low
) as true_range
from Quotes
where Symbol = 'ABB' and Date >= '2010-01-01' and Date <= '2010-01-30'
) tb1
Looks simple and I've verified that the results are the same as the indicator values on ThinkOrSwim trading client (ATR using SMA).
But in IMCS, here's what I have to do to get the same result:
select Date, cs_window_avg(
cs_maxof(cs_maxof((High<<1) - (Close>>1), (Close>>1) - (Low<<1)), (High) - (Low)), 14)<<(14-1) as atr
from Quote_get('ABB', date('01-Jan-2010'), date('30-Jan-2010'));
I'm not using the built-in ATR to just show an example of how many shifts I would need to write to get the same result as the PGSQL one. Both these queries give the same output for me. But in the IMCS query, I have to write plenty of shifts to align the arrays in the right positions which I don't need to do in PGSQL. This approach of adjusting/shifting arrays becomes more error-prone as we include more indicators and columns with lags in the query.
It would be very useful and much easier to use IMCS for building complex indicators with shifts if the array operations automatically took care of prepending NULLs wherever necessary to keep the size constant. I hope this gives you a better idea of my request/question.
Sorry, in case of standard SQL result of select x-lag(x,3) over (order by x) as result from generate_series(0,10) as g(x); is {null,null,null,3,3,3,3,3,3,3,3} But IMCS doesn't support NULL values! This is why it is necessary to "align" arrays.
int4:{3,3,3,3,3,3,3,3} (1 row)
If you shift just one of sequence left, then you will get the same result, only NULL values are skept.
postgres=# select (cs_parse('{0,1,2,3,4,5,6,7,8,9,10}', 'int4') << 3) - cs_parse('{0,1,2,3,4,5,6,7,8,9,10}', 'int4'); ?column? int4:{3,3,3,3,3,3,3,3} (1 row)
Yes. This is the other idea I had - basically, shift the column that does NOT have the actual lag parameter - i.e., if lag(x)-y in pgsql, in IMCS, you do x-lag(y). Thanks for the suggestion! I will try to work with this.
One last question: Is it normal for an IMCS query (a simple one) to take >2.5 hrs (still running as of this comment) on a db size of ~60mil rows after restarting database/entire instance? (My setup: 860 EVO SSD + 16 GB RAM + i5 3.4 GHz)
I've not messed with the settings file, so imcs.autoload
is still true
. I understand that IMCS needs to load data into memory, but the weird thing is that I can't see any increase in memory in the process manager that indicates that IMCS is loading something in the background. Unfortunately, I did have to turn off my work PC yesterday so had to reboot server (which I understand you do not recommend doing).
Could there be any other reason as to why it's taking so long? The entire db size itself is only ~5-6 GB.
Yes, most likely it is loading data. The size of memory is not increased, because it is part of shared memory preallocated at server start. You can check if there is disk activity using vmstat or iotop utilities.
Yes, most likely it is loading data. The size of memory is not increased, because it is part of shared memory preallocated at server start. You can check if there is disk activity using vmstat or iotop utilities.
Had to restart my server again - twice and it works now... weird... Maybe something to do with Ubuntu WSL (still not perfect). Takes ~5-10 mins to load after executing select Quote_load();
after a db restart.
Thanks for all your answers! :) I'll close this issue now!
Hi,
I have been experimenting with IMCS for about a day now and I'm very impressed with the speed and the flexibility available when writing queries with this extension! Thanks for creating this, @knizhnik !
I've been trying to emulate some queries I've written in pure PGSQL in IMCS but since this is my first time working with columnar db's I'm not entirely sure if I'm I have the right idea about how things work here. So I have questions I wanted to ask:
(For all the below examples, I'm assuming a schema exactly the same as that of
Quote
in the introduction docs here.For example, say, if I wanted to know all the stocks which had 14-day simple moving average (SMA) over its close price between a particular period, in SQL that'd be:
Using IMCS, I know I can get 14-day SMA for all stocks using the following query (taken from examples in performance comparison section):
But I'm not sure how I can get the filtering I've done in the SQL in IMCS - if I have to use
cs_filter_pos
, how would I use it? or does this type of filtering require a different type of table?Also, I'm not sure if this is the right place to ask these questions but I did not find any discussion forums for IMCS (despite how big of an extension it is) so if I need to take this to an appropriate place, please do let me know where.