Closed michaelwills closed 8 years ago
The generated timestamps are like
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-04-16 17:30:00, 2012-04-16 17:35:00]
Length: 2, Freq: None, Timezone: US/Eastern
I think the issue is that for some of the risk metrics (e.g. alpha, beta) we require a benchmark to be present (e.g. S&P500). This is loaded from the msgpack but only has a limited time range (and you are exceeding it).
I suppose there are two ways to fix this, neither of them immediate, unfortunately:
P.S. Please keep those problem reports coming, it's very helpful for us!
I am using it in a non-standard way for sure. Forex isn't natively supported, of course. The idea is to generate all the test data needed separately, i.e. use MT4 to export OHLC data (or just pop in tick data from my broker), and use exported indicator data in handle_data. I'm a bit new to this kind of backtesting, so I'd like to understand what the risk metrics supply. The comments in risk.py are quite helpful in this regard. I'd definitely like to know the Sharpe ratio, etc.
So option 2 would be nice for a quick solution though it doesn't sound quick. :) But option 1 is more desirable for the long term.
[edit] Actually when I get some time I'll look to see how it builds the data. Maybe I can hack some data together and drop it in as a replacement for the treasuries msgpack.
Thinking some more about this, an easy interface would be to just specify a column in your pandas dataframe that holds your benchmark. People will probably want to use other benchmark data sets. That way one would just retrieve e.g. S&P500 alongside the data, and it would also cover the same range.
Pseudocode:
data = load_from_yahoo(stocks=['AAPL']) # loads SP500 automatically
dma = DualMovingAverage()
results = dma.run(data, benchmark='SP500') # will expect SP500 column in dataframe
If that isn't supplied we could try to fall back to the msgpack benchmark we provide now.
Sound sane?
That sounds good actually. I haven't inspected the benchmark data yet but it's just close prices? Does it have to be end of day data or would any timeframe matching my data suffice?
I see it's just
In [15]: data[-10:]
Out[15]:
(((2012, 10, 22, 0, 0, 0, 0), 0.00041864067373232743),
((2012, 10, 23, 0, 0, 0, 0), -0.014388940812141747),
((2012, 10, 24, 0, 0, 0, 0), -0.0031488819699972016),
((2012, 10, 25, 0, 0, 0, 0), 0.0022912026331096645),
((2012, 10, 26, 0, 0, 0, 0), -0.0007289609828941681),
((2012, 10, 31, 0, 0, 0, 0), 0.0008292050262582107),
((2012, 11, 1, 0, 0, 0, 0), 0.01089788981730624),
((2012, 11, 2, 0, 0, 0, 0), -0.009379443677806564),
((2012, 11, 5, 0, 0, 0, 0), 0.002291339585012948),
((2012, 11, 6, 0, 0, 0, 0), 0.00785318149104618))
I am assuming those are returns for the period, days in this case. If I am working with 5 minute bars would I need to provide that per bar?
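A minimal sketch of how entries in that format could be decoded, assuming each 7-tuple is (year, month, day, hour, minute, second, microsecond) in UTC and each value is a simple return for that day. Neither assumption is confirmed in this thread, and the `decode` helper is hypothetical. It does not settle whether per-bar returns are needed for 5 minute data; it only shows how simple returns compound over several periods.

```python
from datetime import datetime, timezone

# Three entries copied from the listing above.
raw = (
    ((2012, 11, 2, 0, 0, 0, 0), -0.009379443677806564),
    ((2012, 11, 5, 0, 0, 0, 0), 0.002291339585012948),
    ((2012, 11, 6, 0, 0, 0, 0), 0.00785318149104618),
)


def decode(entry):
    """Turn ((y, m, d, H, M, S, us), value) into (datetime, value).

    Assumption: the tuple fields map directly onto datetime's positional
    arguments and the timestamps are in UTC.
    """
    fields, value = entry
    return datetime(*fields, tzinfo=timezone.utc), value


returns = [decode(entry) for entry in raw]

# Compound simple per-period returns into one return for the whole span.
growth = 1.0
for _, r in returns:
    growth *= 1.0 + r
cumulative = growth - 1.0

print(returns[0][0].isoformat(), cumulative)
```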
I just realized that's the benchmark data, which I'd still need to provide I imagine. The bit that failed was the treasury data which is also daily data
(((2012, 11, 5, 0, 0, 0, 0),
{'10year': 0.0172,
'1month': 0.0009,
'1year': 0.0019,
'20year': 0.0247,
'2year': 0.0028,
'30year': 0.0288,
'3month': 0.0011,
'3year': 0.0038,
'5year': 0.007,
'6month': 0.0015,
'7year': 0.0113,
'tid': 5719}),
((2012, 11, 6, 0, 0, 0, 0),
{'10year': 0.0178,
'1month': 0.0012,
'1year': 0.0019,
'20year': 0.0252,
'2year': 0.003,
'30year': 0.0292,
'3month': 0.001,
'3year': 0.0041,
'5year': 0.0075,
'6month': 0.0015,
'7year': 0.0119,
'tid': 5720}))
Quantopian supports minute data so I assume zipline does as well. Will these data sets be fine as is with daily data? And since it searched for
2012-04-17 00:00:00-04:00
instead of something like
2012-04-17 00:00:00
could I essentially fill in the data with data from the nearest point to allow it to complete with a full risk report?
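One way to picture "fill in from the nearest point" is a lookup that returns the most recent daily value at or before an intraday timestamp. The sketch below uses only the standard library; `latest_rate_at` is a hypothetical helper, not something zipline provides, and it assumes the daily entries are keyed at midnight UTC.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Daily 10-year rates keyed at midnight UTC (values copied from the dump above).
daily_rates = [
    (datetime(2012, 11, 5, tzinfo=timezone.utc), 0.0172),
    (datetime(2012, 11, 6, tzinfo=timezone.utc), 0.0178),
]
daily_dates = [date for date, _ in daily_rates]


def latest_rate_at(ts):
    """Return the most recent daily rate at or before ts (hypothetical helper)."""
    pos = bisect_right(daily_dates, ts)
    if pos == 0:
        raise KeyError("no rate known on or before %s" % ts)
    return daily_rates[pos - 1][1]


# A 5 minute bar during the day picks up that day's rate.
bar_time = datetime(2012, 11, 6, 14, 10, tzinfo=timezone.utc)
print(latest_rate_at(bar_time))
```

Because the comparison is made on timezone-aware datetimes, a bar stamped in US/Eastern would still resolve to the right day as long as it carries its timezone.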
And finally, could it work to have treasuries optionally passed in the same way as the benchmark?
I think for now we will just provide functionality to update the benchmark and treasury data. Ultimately it would be nicer if those could be user supplied.
Would that help for now?
That would and it is most appreciated!
Part of the challenge is to see what choose_treasury is looking for.
Gah I think I see it now. My timestamp is US/Eastern (-4:00) so I need to do the .tz_convert('UTC') in order for it to match. The day is there but the timezone is different so it could never find a match so there was no rate found. With this
data.index = tseries.index.DatetimeIndex(data=data.index).tz_localize('US/Eastern').tz_convert('UTC')
it's actually running and printing the data. I can keep digging now. Thank you for your patience!
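For readers on a current pandas, the same conversion can be sketched as follows. The sample timestamps are the two shown at the top of the thread; treating the daily lookup key as midnight UTC is an assumption based on the mismatch described here.

```python
import pandas as pd

# Naive timestamps, as they might come out of a CSV export.
naive_index = pd.DatetimeIndex(["2012-04-16 17:30:00", "2012-04-16 17:35:00"])

# Attach the local zone first, then convert to UTC.
utc_index = naive_index.tz_localize("US/Eastern").tz_convert("UTC")

# Assumed form of a daily lookup key: midnight UTC.
daily_key = pd.Timestamp("2012-04-16", tz="UTC")

# Midnight US/Eastern is a different instant (04:00 UTC), so it cannot match.
eastern_midnight = pd.Timestamp("2012-04-16", tz="US/Eastern")
print(eastern_midnight == daily_key)

# Flooring the UTC timestamps to midnight yields values that can match the key.
print(utc_index.normalize()[0] == daily_key)
```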
Some further notes. I have arbitrary data going in and I can run tests but I still get exceptions which are probably expected given that I am using intraday data:
self.period_start = {Timestamp} 2012-11-06 14:10:00+00:00
self.trading_days[-1] = {datetime} 2012-11-06 00:00:00+00:00
(<type 'exceptions.AssertionError'>, AssertionError('Period start falls after the last known trading day.',), None)
"zipline/finance/trading.py", line 86, in __init__
"Period start falls after the last known trading day."
AssertionError: Period start falls after the last known trading day.
That being the case, is there a simple way to run without the benchmark and calculated metrics (as in your comment, @twiecki, 2 days ago at https://github.com/quantopian/zipline/issues/13#issuecomment-10199210)? I haven't gone through all the source, but is there a relatively pain-free way I can disable this? Or perhaps, since trading days are calculated based on the benchmark returns, I can fill that data out so it is accounted for.
At the moment I just catch the exception and let it go as far as possible so I am able to test strategies.
Thanks again for releasing this!
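The two values printed before the traceback suggest why intraday data trips this check: the last trading day is stamped at midnight, so any bar later on that same day compares as "after" it. The sketch below assumes the check is a plain timestamp comparison, which the thread does not show; `to_midnight` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Values taken from the debug output above.
period_start = datetime(2012, 11, 6, 14, 10, tzinfo=timezone.utc)
last_trading_day = datetime(2012, 11, 6, tzinfo=timezone.utc)

# Assumed form of the failing check: a direct comparison of instants.
direct_ok = period_start <= last_trading_day


def to_midnight(ts):
    """Drop the time of day so intraday stamps compare as calendar days."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


# Comparing at day granularity treats the intraday bar as part of the last day.
normalized_ok = to_midnight(period_start) <= last_trading_day

print(direct_ok, normalized_ok)
```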
Yeah, we really need to make this optional. You can look into finance/performance.py where the risk object is updated if you want.
I'll have to look into that. Thanks!
I have a similar issue, and I'd like to turn benchmarking off. Is there a reference for the procedure?
+1, I just filed a similar bug (https://github.com/quantopian/zipline/issues/125) because I didn't notice this one. ^GSPC goes back to 1950, yet we're limited to running backtests back to 1990 because that's as far back as the treasury data goes.
I'm actively working, as a high priority, on https://github.com/quantopian/zipline/issues/46 (streaming of benchmarks and treasury data), which is relevant to this issue, since the use of benchmarks and treasury data as inputs to certain risk metrics is the main reason they are currently required.
While refactoring how benchmarks and treasury data are stored in risk, I'll see if I can get in some options/flags to disable them completely.
A question I have is: should the default date range remain the one that has both benchmark and treasury data contained within? I'm tending towards saying 'yes', so as to provide richer metrics out of the box, with disabling the metrics being something that needs to be done explicitly.
Makes sense to me. We could make the default date be 1990 since that's when the benchmarks start, but then add a good error message if you go outside and add an option to disable.
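A rough sketch of what such a check might look like. Every name here (`DATA_START`, `check_period_start`, `use_risk_data`) is hypothetical and not part of zipline, and January 1 is only a placeholder since the thread says just "1990".

```python
from datetime import datetime, timezone

# Assumption: placeholder for the first date covered by the bundled data.
DATA_START = datetime(1990, 1, 1, tzinfo=timezone.utc)


def check_period_start(period_start, use_risk_data=True):
    """Raise a descriptive error if the backtest starts before the data does.

    Passing use_risk_data=False stands in for an explicit opt-out of the
    benchmark and treasury based metrics, in which case no check is made.
    """
    if not use_risk_data:
        return
    if period_start < DATA_START:
        raise ValueError(
            "Backtest starts %s, but benchmark/treasury data begins %s. "
            "Start later or disable the risk metrics."
            % (period_start.date(), DATA_START.date())
        )


# Inside the covered range: passes silently.
check_period_start(datetime(2000, 1, 3, tzinfo=timezone.utc))

# Outside the range, but with the metrics explicitly disabled: also passes.
check_period_start(datetime(1985, 1, 2, tzinfo=timezone.utc), use_risk_data=False)
```

Keeping the opt-out explicit matches the preference stated above for richer metrics by default.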
Ben, I agree.
I'm thinking that the steps of setting the default date to 1990 and providing the warning/error would be at home on your defaults branch.
@ehebert i updated the pull request (https://github.com/quantopian/zipline/pull/121) to use 1990 as the default start date. i'll leave the warning/error for another change. seems like it'd go well with the option to disable the benchmarks
I think that the best fix for this is to use the 10-year treasury as a benchmark. It is more commonly used as a benchmark and there is good data for it back until the 1950s or 60s. See https://github.com/quantopian/zipline/issues/132
We ended up using the 10y treasury curve in https://github.com/quantopian/zipline/commit/d177ddd860fb6419767dd14b587e97615de31519
This is using a local copy of zipline instead of the site-packages one.
The data is OHLC and other indicator data exported as CSV from MetaTrader 4. The timestamps are then munged to be the index, similar to fast-data-mining-with-pytables-and-pandas.pdf, and also localized
and when trying out the algo
this results in
At this point the basic test does work using the local copy of zipline which was my sanity check.
[edit: IPython notebook's traceback is clearer]