Closed pavitrakumar78 closed 3 years ago
As far as I understand, in this case you have not just an alignment problem: what you actually need is something like an ungroup operator. You group the timeseries by week and then want to compare it with each day. IMCS has the cs_repeat operator, which repeats the elements of an input sequence a requested number of times. If you group data by week (7 days), then repeating the grouped sequence 7 times restores the number of elements of the original sequence.
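That repeat-based ungrouping can be sketched in plain Python (illustrative values, not the IMCS API itself):

```python
# A plain-Python sketch of the repeat-based ungrouping described above.
daily = [10, 12, 11, 9, 13, 14, 12, 8, 7, 9, 11, 10, 12, 13]  # two 7-day weeks
week_len = 7

# group by week and take the minimum of each group
weekly_min = [min(daily[i:i + week_len]) for i in range(0, len(daily), week_len)]

# "ungroup": repeat each weekly value 7 times, like cs_repeat(grouped, 7),
# restoring the element count of the original sequence
ungrouped = [v for v in weekly_min for _ in range(week_len)]

# now every daily value can be compared element-wise with its week's minimum
diffs = [d - m for d, m in zip(daily, ungrouped)]
```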
Maybe it would be better to implement a special "window" grouping function which preserves the number of elements of the original sequence? cs_window_group_min({1,2,3,4,5},{1,1,1,2,2}) => {1,1,1,4,4}
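The proposed semantics can be sketched in plain Python (window_group_min here is a stand-in for the suggested cs_window_group_min, which does not exist yet):

```python
from itertools import groupby

def window_group_min(values, keys):
    # Group consecutive elements by key, then repeat each group's minimum,
    # so the output has the same length as the input.
    out = []
    i = 0
    for _, grp in groupby(keys):
        n = len(list(grp))
        out.extend([min(values[i:i + n])] * n)
        i += n
    return out

print(window_group_min([1, 2, 3, 4, 5], [1, 1, 1, 2, 2]))  # -> [1, 1, 1, 4, 4]
```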
There are two issues which can complicate usage of cs_repeat functions:
- origin offset (the first element may not correspond to the first day of the week)
- there may be some holes in the input sequence (holidays, ...)
Regarding these two issues: I was thinking I could solve issue (1) by making sure that the from_date argument of *_get(...) is the start of the timeframe, i.e. if the period is a week, I can push the date back until it falls on a Monday, and so on. But it seems I've run into another problem while experimenting with nested cs_group functions. This might be related to issue (2) you mentioned, but I'm not sure yet.
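The issue (1) workaround described above can be sketched like this (plain Python; how from_date is actually passed to *_get is up to the caller):

```python
from datetime import date, timedelta

def align_to_monday(d: date) -> date:
    # Push the date back until it falls on a Monday (weekday() == 0),
    # so the first grouped week is complete.
    return d - timedelta(days=d.weekday())

print(align_to_monday(date(2020, 1, 3)))  # 2020-01-03 is a Friday -> 2019-12-30
```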
Issue: an average applied to cs_group's output differs from the values I see on my trading platforms and from a simple pandas grouping.
Example:
select cs_window_avg((cs_group_last((close_price), (cs_week((symbol_date))))), 14)<<(13) from us_eod_get('SPY.US', '2020-01-03', '2020-11-27');
The last few values of the above execution are:
{...,336.28, 336.77, 338.55, 340.33, 341.75, 343.57}
These are not the correct values.
But if I use the same data with something simple like pandas-ta, where the code is like:
import pandas as pd
import pandas_ta as ta

bars = pd.read_csv(...)
symbol_from_date = '2020-01-03'
symbol_to_date = '2020-11-27'
bars = bars.loc[(bars.date >= symbol_from_date) & (bars.date <= symbol_to_date), :].reset_index(drop=True)
bars['date'] = pd.to_datetime(bars['date'])
# aggregation rules for resampling daily bars into weekly bars
logic = {'open': 'first',
         'high': 'max',
         'low': 'min',
         'close': 'last',
         'volume': 'sum'}
# resample needs a datetime axis, so resample on the 'date' column
barsw = bars.resample('W', on='date').apply(logic)
sma = ta.sma(barsw['close'], 14)
The last few values of the above execution (sma) are:
[..., 333.20, 335.32, 337.04, 337.45, 339.13, 340.81, 342.14, 343.86]
These values match what I see on my trading platforms, so they are correct.
But what I'm confused about is this: the outputs of the price columns from cs_group_last(close_price, cs_week(symbol_date)) and bars.resample('W').apply(logic) are exactly the same:
{..., 333.83, 346.85, 347.29, 345.77, 326.54, 350.16, 358.10, 355.33, 363.67}
So shouldn't the moving average result of this grouping also be the same?
For any data that is not grouped in any way, the SMA values match, so I don't think there are any issues with the SMA module itself. But is this due to the lack of handling of the missing dates/holidays issue that you mentioned in your post? SPY.US.csv.zip
Sorry, I can not reproduce the problem with the attached CSV file:
select cs_window_avg(cs_group_last(Close, cs_week(Day)),14) from Quote_get('2020-01-03', '2020-11-27');
float8:{23.02928597586495,46.2942853655134,70.00500052315849,93.48857116699219,116.4692862374442,140.1978585379464,164.3121446881975,188.1321454729353,209.2935747419085,230.5407169886998,249.7778603690011,266.120717729841,284.2221461704799,301.9500034877232,298.7921469552176,296.0014343261719,292.5028621128627,289.2185777936663,287.1264343261719,283.846433367048,280.8350045340402,278.7521471296038,280.4007175990513,280.8828604561942,283.6914324079241,288.7807170322963,292.9814322335379,297.9385746547154,301.0471452985491,303.4928588867188,306.6035723005022,310.3021436418806,313.4735717773438,317.2735726492746,321.2121429443359,323.9442858014788,324.9957144601004,326.884286063058,328.3192858014788,330.7328578404018,333.2057146344866,335.327144077846,337.0457153320312,337.4500013078962,339.1385737827846,340.8192879813058,342.1400015694754,343.8678588867188}
Can you also please confirm the outputs for the following?
select cs_group_last('int4:{3329970,3332360,3342320,3314730,3360940,3354870,3354970,3365620,3372900,3358850,3369310,3381259,3415520,3427480,3461840,3469409,3491820,3479169,3511940,3562740,3440130,3412040,3318809,3384350,3325590,3327280,3371100,3388140,3374690,3345009,3306500,3269700,3303000,3226400,3235000,3287300,3341900,3323700,3348900,3370400,3338399,3397600,3349300,3407600,3437799,3468500,3524300,3501300,3479300,3475000,3472900,3420100,3433800,3427300,3446100,3457799,3393900,3382200,3266600,3299800,3265400,3302000,3360299,3435400,3502400,3501600,3545600,3540400,3566700,3532100,3581000,3625700,3606200,3562799,3577799,3553300,3574600,3632200,3626600,3636700}', cs_week('date:{2020-08-06,2020-08-07,2020-08-10,2020-08-11,2020-08-12,2020-08-13,2020-08-14,2020-08-17,2020-08-18,2020-08-19,2020-08-20,2020-08-21,2020-08-24,2020-08-25,2020-08-26,2020-08-27,2020-08-28,2020-08-31,2020-09-01,2020-09-02,2020-09-03,2020-09-04,2020-09-08,2020-09-09,2020-09-10,2020-09-11,2020-09-14,2020-09-15,2020-09-16,2020-09-17,2020-09-18,2020-09-21,2020-09-22,2020-09-23,2020-09-24,2020-09-25,2020-09-28,2020-09-29,2020-09-30,2020-10-01,2020-10-02,2020-10-05,2020-10-06,2020-10-07,2020-10-08,2020-10-09,2020-10-12,2020-10-13,2020-10-14,2020-10-15,2020-10-16,2020-10-19,2020-10-20,2020-10-21,2020-10-22,2020-10-23,2020-10-26,2020-10-27,2020-10-28,2020-10-29,2020-10-30,2020-11-02,2020-11-03,2020-11-04,2020-11-05,2020-11-06,2020-11-09,2020-11-10,2020-11-11,2020-11-12,2020-11-13,2020-11-16,2020-11-17,2020-11-18,2020-11-19,2020-11-20,2020-11-23,2020-11-24,2020-11-25,2020-11-27}'))/cs_const(10000,'float8');
My output:
float8:{333.236,335.497,338.1259,349.182,341.204,332.728,330.65,328.73,333.8399,346.85,347.29,345.7799,326.54,350.16,358.1,355.33,363.67}
select cs_group_last(close_price, cs_week(symbol_date)) from us_eod_get('SPY.US', '2020-01-03', '2020-11-27');
My output:
float8:{317.849,321.1019,327.254,324.119,317.178,327.5,332.824,328.762,292.069,293.252,265.51,226.889,251.303,246.117,275.8759,284.246,280.607,280.428,289.998,283.889,292.972,301.778,316.673,301.669,307.409,298.854,310.985,316.324,320.437,319.6,325.218,333.236,335.497,338.1259,349.182,341.204,332.728,330.65,328.73,333.8399,346.85,347.29,345.7799,326.54,350.16,358.1,355.33,363.67}
Upon closer inspection, cs_group is not giving me the same result as pandas' weekly group-apply procedure. Only the last 7 or 8 values seem to match; after that, the differences start building up, which ends up giving the wrong result for cs_avg. I'm testing this on a clean IMCS + PostgreSQL 12 on Ubuntu 18.
Output of first query is exactly the same as yours. Output of second query is:
float4:{322.41,325.71,331.95,328.77,321.73,332.2,337.6,333.48,296.26,297.46,269.32,228.8,253.42,248.19,278.2,286.64,282.97,282.79,292.44,286.28,295.44,304.32,319.34,304.21,308.64,300.05,312.23,317.59,321.72,320.88,326.52,334.57,336.84,339.48,350.58,342.57,334.06,330.65,328.73,333.84,346.85,347.29,345.78,326.54,350.16,358.1,355.33,363.67}
Just figured it out. I thought this could not have been possible, but alas: the SPY data in my database is not consistent with the CSV data. I tried a few other symbols and their values match! Not sure how this happened, but it's something completely unrelated to IMCS that I need to fix. Sorry about that!
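For anyone hitting the same thing: a quick way to locate such a mismatch is to compare the two series tail-aligned and report the first position (counted from the end) where they disagree. A minimal sketch with illustrative numbers, not the real SPY data:

```python
def first_mismatch(a, b, tol=1e-6):
    # Compare two series from the end (where recent data usually agrees)
    # and return the offset-from-the-end of the first disagreement, or None.
    n = min(len(a), len(b))
    for k in range(1, n + 1):
        if abs(a[-k] - b[-k]) > tol:
            return k
    return None

db_close  = [292.9, 301.7, 316.6, 301.6, 307.4]  # hypothetical DB values
csv_close = [295.4, 304.3, 319.3, 301.6, 307.4]  # hypothetical CSV values
print(first_mismatch(db_close, csv_close))       # -> 3
```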
Regarding the original query of this issue, I'll run some experiments based on the ideas you mentioned in your comment and post my findings here.
I hope it's OK to keep this issue open.
I have added the cs_wingroup* family of functions. I hope your task can now be solved in an easier way.
Wow! Thank you! This is super useful for multi-timeseries comparison! This will definitely help with my task!
Hi,
I've been writing a lot of supporting code to deal with the alignment of arrays when performing operations on two timeseries. I understand that since IMCS does not have NULL values, this is something we have to handle ourselves.
In most cases, I can calculate the shift beforehand by estimating the date range and other parameters, but this becomes unpredictable when a grouping function is involved, e.g. cs_group_max, cs_group_avg, etc. Here's an example:
What I'm trying to do is get the difference between the close price and the weekly lowest close price for a date range. This obviously gives an incorrect result because the arrays are subtracted like this:
I can fix this by adding a left shift (<<) of 12 to close_price. But calculating this shift becomes complex in actual use. With these issues, I'm not sure how it's possible to correctly and dynamically shift/align the arrays so that the calculation is correct.
Would it be possible to add, or is there a way with the current functions, to iterate (the imcs_iterator_h) from the end and stop at the minimum distance of the two timeseries? That way, shifting wouldn't be necessary and the calculations would be correct. Or is there another, more efficient way to achieve the same effect than executing the queries individually, calculating the shift by looking at the result, and then doing the subtraction (as in the example above)?
Edit: Just wanted to say, IMCS has been a joy to use for my projects! Thank you again for building and supporting it!
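The "iterate from the end" alignment asked about above can at least be emulated client-side: trim both arrays to the length of the shorter one, keeping their tails, then operate element-wise. A plain-Python sketch of the mechanism, not an IMCS feature:

```python
def tail_align(a, b):
    # Keep only the trailing min(len(a), len(b)) elements of each sequence,
    # so the most recent values line up without computing a shift by hand.
    n = min(len(a), len(b))
    return a[len(a) - n:], b[len(b) - n:]

close      = [10, 11, 12, 13, 14, 15, 16]  # daily values (illustrative)
weekly_low = [12, 14]                      # one value per (partial) week
c, w = tail_align(close, weekly_low)
print([x - y for x, y in zip(c, w)])       # -> [3, 2]
```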