Closed kerwinxu closed 7 years ago
Hi @kerwinxu, I'm currently looking at #1953, #1950, #1949 and #1947, and they all look like the same problem: Google is no longer giving us enough benchmark data. Before, we could get up to 4000 days of data for SPY, but it seems we can only get about 251 days now, most likely due to some changes in the Google Finance API.
Until it is fixed, one idea is to copy https://github.com/quantopian/zipline/blob/master/zipline/resources/market_data/SPY_benchmark.csv to your ~/.zipline/data/ directory and then try running again.
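The copy step suggested above could be scripted roughly as follows. This is only a sketch: `install_benchmark_csv` is a hypothetical helper (not part of zipline), it assumes SPY_benchmark.csv has already been downloaded from the repository, and it assumes the ~/.zipline/data location named above.

```python
import shutil
from pathlib import Path


def install_benchmark_csv(src, data_dir=None):
    """Copy a downloaded benchmark CSV into zipline's data directory.

    Hypothetical helper, not part of zipline. `data_dir` defaults to
    ~/.zipline/data, the directory named in this thread.
    """
    src = Path(src)
    if data_dir is None:
        data_dir = Path.home() / ".zipline" / "data"
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    dest = data_dir / src.name
    shutil.copyfile(str(src), str(dest))  # overwrites any existing file
    return dest
```

As reported further down this thread, zipline may still replace the copied file if its latest date is not current, so this may not be sufficient on its own.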
Thanks for the pointers to benchmarks. I found the code doing this, and it looks like Google (not Yahoo) is returning just the last year's worth of data, no matter what dates you pass it. I see other people have since commented on the same.
The latest pandas_datareader version has the same behavior. I modified the benchmarks.py code to use Yahoo and print the data to STDOUT, fetched the data as a one-off, and saved it into SPY_benchmark.csv.
I tried leaving Yahoo in there permanently, but it comes back with errors, and I think Yahoo is rate limiting connections. So doing a one-off grab, saving it into the csv, and then changing the code back to Google worked for me.
Thanks for the help everyone.
As I mentioned in #1950, copying a prepared SPY_benchmark.csv that is not up to date does not work, because zipline compares the latest date in the file and then downloads from Google again.
I think the better workaround for now is to use Yahoo data with a Yahoo fix patch for pandas DataReader. Here is the reference; see the comment by @edmunch. It works for me.
This patch seems OK.
edmunch commented on 30 Jun:
Solution for me using YAHOO... quick and dirty
1. install pandas_datareader
2. install fix_yahoo_finance from here: https://pypi.python.org/pypi/fix-yahoo-finance
3. patch Benchmarks.py with:

import pandas as pd
from six.moves.urllib_parse import urlencode
import pandas_datareader as pdr  # NEW
import fix_yahoo_finance as yf  # NEW
yf.pdr_override()  # NEW

def get_benchmark_returns(symbol, start_date, end_date):
    print('NEW')
    df = pdr.data.get_data_yahoo(symbol, start=start_date, end=end_date)
    df.to_csv('{}_D1.csv'.format(symbol))
    return pd.read_csv(
        '{}_D1.csv'.format(symbol),
        parse_dates=['Date'],
        index_col='Date',
        usecols=["Adj Close", "Date"],
        squeeze=True,  # squeeze tells pandas to make this a Series
    ).sort_index().tz_localize('UTC').pct_change(1).iloc[1:]
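
For anyone unsure what the tail of that patch does, `.pct_change(1).iloc[1:]` turns adjusted close prices into day-over-day returns and drops the first row, which has no previous day. A plain-Python sketch of the same arithmetic follows; `simple_returns` is a hypothetical name and the prices are made up for illustration.

```python
def simple_returns(prices):
    """Return day-over-day simple returns, one fewer than the number of prices.

    Mirrors the effect of `.pct_change(1).iloc[1:]` on a price series:
    each return is current / previous - 1, and the first price has none.
    """
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


adj_close = [100.0, 101.0, 99.99]  # made-up adjusted close prices
returns = simple_returns(adj_close)  # roughly +1% then -1%
```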
No, even if I put the correct SPY_benchmark.csv in place, the call to TradingAlgorithm overwrites it with the wrong version! Please reopen the issue...
@zxweed Please see my comment above. It doesn't work by just correcting the spy_benchmark.csv file. You should patch the yahoo download module of pandas DataReader.
@ezfine I have not used the Yahoo download because Yahoo closed it a couple of months ago. I have used Quandl as a source.
Yes, Yahoo changed its API several months ago, and that's why we need a patch for pandas DataReader. I didn't try Quandl data with zipline because it doesn't provide adjusted close data.
@ezfine @zxweed The original issue at the top of this thread (to be clear, the one with the warning message "WARNING: Loader: Refusing to download new benchmark data because a download succeeded at 2017-09-20 02:19:52.057758+00:00.") has nothing to do with recent changes to the Google API. data/loader.py has a hard-coded cooldown of one hour between downloads.
@kerwinxu Please read previous paragraph. Perhaps an optional flag to force downloads despite cooldown would be a good idea? Would you like me to PR this?
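To make the proposal concrete, here is a minimal sketch of a cooldown check with an opt-in override. It is not zipline's actual loader code: `should_download` and its `force` parameter are hypothetical names, and only the one-hour value and the timestamps come from this thread.

```python
from datetime import datetime, timedelta, timezone

COOLDOWN = timedelta(hours=1)  # the one-hour value described above


def should_download(last_success, now, cooldown=COOLDOWN, force=False):
    """Decide whether a new benchmark download is allowed.

    `force` is the hypothetical opt-in flag proposed in this thread;
    it does not exist in zipline.
    """
    if force or last_success is None:
        return True
    return now - last_success >= cooldown


# Timestamps taken from the warning and log line in the original report.
last = datetime(2017, 9, 20, 2, 19, 52, tzinfo=timezone.utc)
now = datetime(2017, 9, 20, 2, 40, 15, tzinfo=timezone.utc)
print(should_download(last, now))              # False: about 20 minutes elapsed
print(should_download(last, now, force=True))  # True: cooldown bypassed
```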
This happens because Google now limits users to about 251 days' worth of data per request, so you can't run backtests longer than a year. A fix is currently being worked on.
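Until then, a pre-flight check along these lines can tell you up front whether the benchmark data you have spans your backtest, instead of failing later with a KeyError. This is a sketch under assumptions: `covers_range` is a hypothetical helper, not a zipline function, and the dates are made up.

```python
from datetime import date


def covers_range(benchmark_dates, start, end):
    """Return True if the benchmark dates span [start, end].

    Only the endpoints are checked; gaps inside the range are not detected.
    """
    if not benchmark_dates:
        return False
    return min(benchmark_dates) <= start and max(benchmark_dates) >= end


# Made-up example: benchmark data that only reaches back about one year.
benchmark_dates = [date(2016, 9, 21), date(2017, 9, 19)]
print(covers_range(benchmark_dates, date(2011, 1, 1), date(2012, 1, 1)))  # False
```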
There are duplicates of this issue so I'm just going to direct everyone to this issue: https://github.com/quantopian/zipline/issues/1965. I'll comment there when there is a fix on master
Dear Zipline Maintainers,
Before I tell you about my issue, let me describe my environment:
Environment
$ python -c 'import math, sys;print(int(math.log(sys.maxsize + 1, 2) + 1))'
$ pip freeze
or $ conda list
Now that you know a little about me, let me tell you about the issue I am having:
Description of Issue
Here is how you can reproduce this issue on your machine:
Reproduction Steps
1. I installed zipline with "conda install -n python35 -c Quantopian zipline"
2. zipline ingest
3. zipline run -f dual_moving_average.py --start 2011-1-1 --end 2012-1-1 -o dma.pickle
4. Error:
[2017-09-20 02:40:15.276265] WARNING: Loader: Refusing to download new benchmark data because a download succeeded at 2017-09-20 02:19:52.057758+00:00.
Traceback (most recent call last):
File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1395, in _has_valid_type
error()
File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1390, in error
(key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2000-01-03 00:00:00+00:00] is not in the [index]'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\Anaconda3\envs\python35\Scripts\zipline-script.py", line 11, in
load_entry_point('zipline==1.1.1', 'console_scripts', 'zipline')()
File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 697, in main
rv = self.invoke(ctx)
File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 895, in invoke
return ctx.invoke(self.callback, ctx.params)
File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 535, in invoke
return callback(*args, **kwargs)
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\__main__.py", line 97, in
return f(*args, **kwargs)
File "d:\Anaconda3\envs\python35\lib\site-packages\click\decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\__main__.py", line 240, in run
environ=os.environ,
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\utils\run_algo.py", line 179, in _run
overwrite_sim_params=False,
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\algorithm.py", line 709, in run
for perf in self.get_generator():
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\gens\tradesimulation.py", line 230, in transform
handle_benchmark(normalize_date(dt))
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\gens\tradesimulation.py", line 190, in handle_benchmark
benchmark_source.get_value(date)
File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\sources\benchmark_source.py", line 75, in get_value
return self._precalculated_series.loc[dt]
File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1296, in __getitem__
return self._getitem_axis(key, axis=0)
File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1466, in _getitem_axis
self._has_valid_type(key, axis)
File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1403, in _has_valid_type
error()
File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1390, in error
(key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2000-01-03 00:00:00+00:00] is not in the [index]'
...
What steps have you taken to resolve this already?
...
Anything else?
...
Sincerely,
$ whoami