kerwinxu commented 7 years ago

Dear Zipline Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

Operating System: Windows 10
Python Version: python35
Python Bitness: $ python -c 'import math, sys;print(int(math.log(sys.maxsize + 1, 2) + 1))'
How did you install Zipline: (conda)
Python packages: $ pip freeze or $ conda list

Now that you know a little about me, let me tell you about the issue I am having:

Description of Issue

What did you expect to happen?
What happened instead?

Here is how you can reproduce this issue on your machine:

Reproduction Steps

1. I installed Zipline with: conda install -n python35 -c Quantopian zipline
2. zipline ingest
3. zipline run -f dual_moving_average.py --start 2011-1-1 --end 2012-1-1 -o dma.pickle
4. Error:

[2017-09-20 02:40:15.276265] WARNING: Loader: Refusing to download new benchmark data because a download succeeded at 2017-09-20 02:19:52.057758+00:00.
Traceback (most recent call last):
  File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1395, in _has_valid_type
    error()
  File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1390, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2000-01-03 00:00:00+00:00] is not in the [index]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\Anaconda3\envs\python35\Scripts\zipline-script.py", line 11, in <module>
    load_entry_point('zipline==1.1.1', 'console_scripts', 'zipline')()
  File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "d:\Anaconda3\envs\python35\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\__main__.py", line 97, in _
    return f(*args, **kwargs)
  File "d:\Anaconda3\envs\python35\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\__main__.py", line 240, in run
    environ=os.environ,
  File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\utils\run_algo.py", line 179, in _run
    overwrite_sim_params=False,
  File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\algorithm.py", line 709, in run
    for perf in self.get_generator():
  File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\gens\tradesimulation.py", line 230, in transform
    handle_benchmark(normalize_date(dt))
  File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\gens\tradesimulation.py", line 190, in handle_benchmark
    benchmark_source.get_value(date)
  File "d:\Anaconda3\envs\python35\lib\site-packages\zipline\sources\benchmark_source.py", line 75, in get_value
    return self._precalculated_series.loc[dt]
  File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1296, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1466, in _getitem_axis
    self._has_valid_type(key, axis)
  File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1403, in _has_valid_type
    error()
  File "d:\Anaconda3\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1390, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2000-01-03 00:00:00+00:00] is not in the [index]'

...

What steps have you taken to resolve this already?

...

Anything else?

...

Sincerely, $ whoami

freddiev4 commented 7 years ago

Hi @kerwinxu, I'm currently looking at #1953, #1950, #1949, and #1947, and they all appear to be the same problem: Google no longer gives us enough benchmark data. Before, we could get up to 4000 days of data for SPY, but now we seem to get only about 251 days, most likely because of changes in the Google Finance API.
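To illustrate the failure mode described above, here is a minimal sketch, assuming pandas is installed. The series, its dates, and the `lookup` helper are hypothetical stand-ins, not Zipline's own code. It shows that a label-based lookup on a benchmark series covering only about 251 business days raises `KeyError` for a date outside that window, which is the error at the bottom of the traceback.

```python
import pandas as pd

# Hypothetical benchmark series covering only ~1 year of business days,
# standing in for the truncated data described in the comment above.
index = pd.date_range("2016-09-20", periods=251, freq="B", tz="UTC")
benchmark_returns = pd.Series(0.0, index=index)


def lookup(series, dt):
    """Return the value at `dt`, or None when `dt` is not in the index."""
    try:
        return series.loc[dt]
    except KeyError:
        # Same exception type as in the traceback: the label is missing.
        return None


missing = lookup(benchmark_returns, pd.Timestamp("2000-01-03", tz="UTC"))
present = lookup(benchmark_returns, index[0])
```

Here `missing` is the out-of-range lookup and `present` is a lookup inside the available window.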

freddiev4 commented 7 years ago

Until it is fixed, one idea is to copy https://github.com/quantopian/zipline/blob/master/zipline/resources/market_data/SPY_benchmark.csv to your ~/.zipline/data/ directory and then try running again.
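The copy step suggested above could be scripted roughly as follows. This is only a sketch: `install_benchmark_csv` is a hypothetical helper name, and it assumes you have already downloaded SPY_benchmark.csv to a local path.

```python
import shutil
from pathlib import Path


def install_benchmark_csv(source_csv, data_dir=None):
    """Copy a benchmark CSV into the Zipline data directory.

    `data_dir` defaults to ~/.zipline/data, the directory named in the
    comment above. Returns the path of the copied file.
    """
    source = Path(source_csv)
    if data_dir is None:
        target_dir = Path.home() / ".zipline" / "data"
    else:
        target_dir = Path(data_dir)
    # Create the directory if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copyfile(str(source), str(target))
    return target
```

For example, `install_benchmark_csv("SPY_benchmark.csv")` would place the file under ~/.zipline/data/.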

QuantGuy01 commented 7 years ago

Thanks for the pointers to the benchmarks code. I found the code that does this, and it looks like Google (not Yahoo) returns just the last year's worth of data, no matter what dates you pass it. I see other people have since commented on the same thing.

The latest pandas_datareader version has the same behavior. I modified the benchmarks.py code to use Yahoo and print the data to STDOUT, then fetched the data as a one-off and saved it into SPY_benchmarks.csv.

I tried leaving Yahoo in there permanently, but it comes back with errors, which I think is because Yahoo rate-limits connections. So doing a one-off grab, saving it into the CSV, and then changing the code back to Google worked for me.

Thanks for the help everyone.

ezfine commented 7 years ago

As I mentioned in #1950, copying in a prepared SPY_benchmark.csv that is not up to date does not work, because zipline compares the latest date in the file and then downloads from Google again.

I think the better workaround for now is to use the Yahoo data with a Yahoo fix patch for pandas DataReader; see the comment by @edmunch in the referenced thread. It works for me.
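Before copying a prepared file, you can check whether it would count as out of date. The sketch below is a hypothetical stdlib-only helper, not Zipline's loader logic; it assumes the CSV has a header row and ISO dates (YYYY-MM-DD) at the start of the first column.

```python
import csv
from datetime import date, datetime


def last_date_in_csv(path):
    """Return the latest date in the first column of a CSV, or None if empty.

    Assumes a header row and values starting with YYYY-MM-DD; both are
    assumptions about the file layout.
    """
    latest = None
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue
            current = datetime.strptime(row[0][:10], "%Y-%m-%d").date()
            if latest is None or current > latest:
                latest = current
    return latest


def is_stale(path, required_end):
    """True when the file has no data or ends before `required_end`."""
    latest = last_date_in_csv(path)
    return latest is None or latest < required_end
```

If `is_stale(path, date.today())` is true, the file would likely trigger a fresh download as described above.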

kerwinxu commented 7 years ago

This patch seems OK:

edmunch commented on 30 Jun

Solution for me using YAHOO... quick and dirty

install pandas_datareader
install fix_yahoo_finance from here: https://pypi.python.org/pypi/fix-yahoo-finance

patch Benchmarks.py with:

import pandas as pd

from six.moves.urllib_parse import urlencode

import pandas_datareader as pdr  # NEW
import fix_yahoo_finance as yf  # NEW
yf.pdr_override()  # NEW

def get_benchmark_returns(symbol, start_date, end_date):
    print('NEW')
    df = pdr.data.get_data_yahoo(symbol, start=start_date, end=end_date)
    df.to_csv('{}_D1.csv'.format(symbol))
    return pd.read_csv(
        '{}_D1.csv'.format(symbol),
        parse_dates=['Date'],
        index_col='Date',
        usecols=["Adj Close", "Date"],
        squeeze=True,  # squeeze tells pandas to make this a Series
                       # instead of a 1-column DataFrame
    ).sort_index().tz_localize('UTC').pct_change(1).iloc[1:]

zxweed commented 7 years ago

No, even if I put the correct SPY_benchmark.csv in place, the call to TradingAlgorithm overwrites it with the wrong version! Please reopen the issue...

ezfine commented 7 years ago

@zxweed Please see my comment above. Just correcting the SPY_benchmark.csv file doesn't work. You should patch the Yahoo download module of pandas DataReader.

zxweed commented 7 years ago

@ezfine I have not used the Yahoo download because Yahoo closed it a couple of months ago. I have used Quandl as a source.

ezfine commented 7 years ago

Yes, Yahoo changed its API several months ago, and that's why we need a patch for pandas DataReader. I didn't try Quandl data with zipline because it doesn't provide adjusted close data.

JoaoAparicio commented 7 years ago

@ezfine @zxweed The original issue at the top of this thread (to be clear, the one with the warning message "WARNING: Loader: Refusing to download new benchmark data because a download succeeded at 2017-09-20 02:19:52.057758+00:00.") has nothing to do with recent changes to the Google API. data/loader.py hard-codes a one-hour cooldown between downloads.

@kerwinxu Please read the previous paragraph. Perhaps an optional flag to force downloads despite the cooldown would be a good idea? Would you like me to open a PR for this?
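A rough sketch of what such a flag could look like, using only the standard library. The names `may_download`, `COOLDOWN`, and `force` are hypothetical and are not taken from data/loader.py; the one-hour value comes from the comment above.

```python
from datetime import datetime, timedelta, timezone

# One hour, per the comment above; the constant name is hypothetical.
COOLDOWN = timedelta(hours=1)


def may_download(last_success, now=None, force=False, cooldown=COOLDOWN):
    """Decide whether a new benchmark download should be attempted.

    `last_success` is the UTC time of the last successful download, or
    None if there has been none. `force=True` bypasses the cooldown.
    """
    if force or last_success is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_success >= cooldown
```

With the timestamps from the warning in this issue (last success at 02:19:52, retry at 02:40:15), the check refuses the download unless `force=True` is passed.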

freddiev4 commented 7 years ago

The reason for this is that Google now limits users to about 251 days' worth of data per request, so you can't run backtests longer than a year. A fix is currently being worked on.

There are duplicates of this issue so I'm just going to direct everyone to this issue: https://github.com/quantopian/zipline/issues/1965. I'll comment there when there is a fix on master

quantopian / zipline

KeyError: 'the label [2000-01-03 00:00:00+00:00] is not in the [index]' #1957
