quantopian / zipline

Zipline, a Pythonic Algorithmic Trading Library
https://www.zipline.io
Apache License 2.0
17.68k stars 4.72k forks

Can't connect to Yahoo - Errors: Loader: failed to cache the new benchmark returns #1776

Closed kevinyuan closed 7 years ago

kevinyuan commented 7 years ago

Dear Zipline Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

alembic==0.9.1 appdirs==1.4.3 bcolz==0.12.1 Bottleneck==1.3.0.dev0 click==6.7 contextlib2==0.5.5 cyordereddict==1.0.0 Cython==0.25.2 decorator==4.0.11 empyrical==0.2.2 intervaltree==2.1.0 Logbook==1.0.0 lru-dict==1.1.6 Mako==1.0.6 MarkupSafe==1.0 multipledispatch==0.4.9 networkx==1.11 numexpr==2.6.2 numpy==1.12.1 packaging==16.8 pandas==0.18.1 pandas-datareader==0.3.0.post0 patsy==0.4.1 pyparsing==2.2.0 python-dateutil==2.6.0 python-editor==1.0.3 pytz==2017.2 requests==2.13.0 requests-file==1.4.2 requests-ftp==0.3.1 scipy==0.19.0 setuptools-scm==1.15.5 six==1.10.0 sortedcontainers==1.5.7 SQLAlchemy==1.1.9 statsmodels==0.8.0 tables==3.4.2 toolz==0.8.2 zipline==1.1.0

Now that you know a little about me, let me tell you about the issue I am having:

Description of Issue

I just ran the buyapple.py example described on http://www.zipline.io/beginner-tutorial.html, but got some errors. Can you help check?

/home/kevinyuan/dev/zipline> bin/zipline ingest
Downloading Bundle: quantopian-quandl [####################################] 100%
Writing data to /home/kevinyuan/.zipline/data/quantopian-quandl/2017-05-01T16;36;49.048566.

/home/kevinyuan/dev/zipline> bin/zipline run -f examples/buyapple.py -s 2000-1-1 -e 2001-12-31
[2017-05-01 16:33:28.023652] INFO: Loader: Cache at /home/kevinyuan/.zipline/data/^GSPC_benchmark.csv does not have data from 1990-01-02 00:00:00+00:00 to 2017-04-27 00:00:00+00:00. Downloading benchmark data for '^GSPC'.
[2017-05-01 16:33:28.076307] ERROR: Loader: failed to cache the new benchmark returns
Traceback (most recent call last):
  File "/share/dev/tools/lib/python3.5/urllib/request.py", line 1240, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/share/dev/tools/lib/python3.5/http/client.py", line 1083, in request
    self._send_request(method, url, body, headers)
  File "/share/dev/tools/lib/python3.5/http/client.py", line 1128, in _send_request
    self.endheaders(body)
  File "/share/dev/tools/lib/python3.5/http/client.py", line 1079, in endheaders
    self._send_output(message_body)
  File "/share/dev/tools/lib/python3.5/http/client.py", line 911, in _send_output
    self.send(msg)
  File "/share/dev/tools/lib/python3.5/http/client.py", line 854, in send
    self.connect()
  File "/share/dev/tools/lib/python3.5/http/client.py", line 1237, in connect
    server_hostname=server_hostname)
  File "/share/dev/tools/lib/python3.5/ssl.py", line 376, in wrap_socket
    _context=self)
  File "/share/dev/tools/lib/python3.5/ssl.py", line 747, in __init__
    self.do_handshake()
  File "/share/dev/tools/lib/python3.5/ssl.py", line 983, in do_handshake
    self._sslobj.do_handshake()
  File "/share/dev/tools/lib/python3.5/ssl.py", line 628, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/zipline/data/loader.py", line 247, in ensure_benchmark_data last_date, File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/zipline/data/benchmarks.py", line 59, in get_benchmark_returns squeeze=True, # squeeze tells pandas to make this a Series File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/pandas/io/parsers.py", line 562, in parser_f return read(filepath_or_buffer, kwds) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/pandas/io/parsers.py", line 301, in read compression=kwds.get('compression', None)) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/pandas/io/common.py", line 308, in get_filepath_or_buffer req = urlopen(str(filepath_or_buffer)) File "/share/dev/tools/lib/python3.5/urllib/request.py", line 162, in urlopen return opener.open(url, data, timeout) File "/share/dev/tools/lib/python3.5/urllib/request.py", line 465, in open response = self.open(req, data) File "/share/dev/tools/lib/python3.5/urllib/request.py", line 483, in _open '_open', req) File "/share/dev/tools/lib/python3.5/urllib/request.py", line 443, in _callchain result = func(*args) File "/share/dev/tools/lib/python3.5/urllib/request.py", line 1283, in https_open context=self.context, check_hostname=self.check_hostname) File "/share/dev/tools/lib/python3.5/urllib/request.py", line 1242, in doopen raise URLError(err) urllib.error.URLError: <urlopen error [SSL: CERTIFICATEVERIFY_FAILED] certificate verify failed (_ssl.c:645)> Traceback (most recent call last): File "/share/dev/tools/lib/python3.5/urllib/request.py", line 1240, in doopen h.request(req.get_method(), req.selector, req.data, headers) File "/share/dev/tools/lib/python3.5/http/client.py", line 1083, in request self.send_request(method, url, body, headers) File "/share/dev/tools/lib/python3.5/http/client.py", line 1128, in send_request self.endheaders(body) File "/share/dev/tools/lib/python3.5/http/client.py", line 1079, in endheaders self._send_output(message_body) File "/share/dev/tools/lib/python3.5/http/client.py", line 911, in _sendoutput self.send(msg) File "/share/dev/tools/lib/python3.5/http/client.py", line 854, in send self.connect() File "/share/dev/tools/lib/python3.5/http/client.py", line 1237, in connect serverhostname=server_hostname) File "/share/dev/tools/lib/python3.5/ssl.py", line 376, in wrap_socket context=self) File "/share/dev/tools/lib/python3.5/ssl.py", line 747, in _init self.do_handshake() File "/share/dev/tools/lib/python3.5/ssl.py", line 983, in do_handshake self.sslobj.do_handshake() File "/share/dev/tools/lib/python3.5/ssl.py", line 628, in do_handshake self._sslobj.dohandshake() ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "bin/zipline", line 11, in sys.exit(main()) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/click/core.py", line 722, in call return self.main(args, *kwargs) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/click/core.py", line 697, in main rv = self.invoke(ctx) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/click/core.py", line 1066, in invoke return process_result(sub_ctx.command.invoke(sub_ctx)) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/click/core.py", line 895, in invoke return ctx.invoke(self.callback, ctx.params) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/click/core.py", line 535, in invoke return callback(args, kwargs) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/zipline/main.py", line 97, in _ return f(args, kwargs) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/click/decorators.py", line 17, in new_func return f(get_current_context(), args, kwargs) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/zipline/main.py", line 240, in run environ=os.environ, File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/zipline/utils/run_algo.py", line 132, in run env = TradingEnvironment(asset_db_path=connstr) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/zipline/finance/trading.py", line 101, in init self.bm_symbol, File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/zipline/data/loader.py", line 164, in load_market_data trading_day, File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/zipline/data/loader.py", line 247, in ensure_benchmark_data last_date, File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/zipline/data/benchmarks.py", line 59, in get_benchmark_returns squeeze=True, # squeeze tells pandas to make this a Series File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/pandas/io/parsers.py", line 562, in parser_f return read(filepath_or_buffer, kwds) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/pandas/io/parsers.py", line 301, in _read compression=kwds.get('compression', None)) File "/home-tahoe-n2/kevinyuan/dev/zipline/lib/python3.5/site-packages/pandas/io/common.py", line 308, in get_filepath_or_buffer req = _urlopen(str(filepath_or_buffer)) File "/share/dev/tools/lib/python3.5/urllib/request.py", line 162, in urlopen return opener.open(url, data, timeout) File "/share/dev/tools/lib/python3.5/urllib/request.py", line 465, in open response = self.open(req, data) File "/share/dev/tools/lib/python3.5/urllib/request.py", line 483, in open '_open', req) File "/share/dev/tools/lib/python3.5/urllib/request.py", line 443, in _callchain result = func(args) File "/share/dev/tools/lib/python3.5/urllib/request.py", line 1283, in https_open context=self.context, check_hostname=self.check_hostname) File "/share/dev/tools/lib/python3.5/urllib/request.py", line 1242, in doopen raise URLError(err) urllib.error.URLError: <urlopen error [SSL: CERTIFICATEVERIFY_FAILED] certificate verify failed (_ssl.c:645)>

freddiev4 commented 7 years ago

Hi @kevinyuan. I see you've tried installing with pip. Did you check to make sure you have all the necessary dependencies? (Check zipline.io)

Also, have you tried using a Python 3.4 conda/virtualenv? We don't support Python 3.5 yet.

pbharrin commented 7 years ago

I get a similar error when creating an empty TradingAlgorithm(). I traced it down to the fact that the Yahoo! URL for the benchmark returns no longer works.
The code generates the following URL: https://ichart.finance.yahoo.com/table.csv?a=11&s=%5EGSPC&b=29&e=15&d=4&g=d&f=2017&c=1989 which does not work.

This is generated from format_yahoo_index_url() in zipline/data/benchmarks.py.
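
For reference, here is a rough sketch of how a query string in that shape could be assembled. The parameter meanings are inferred from the example URL above (s=symbol, a/b/c=start month minus one/day/year, d/e/f=end month minus one/day/year, g=d for daily); this is illustrative only, not zipline's actual format_yahoo_index_url, and format_index_url is a made-up name.

from urllib.parse import urlencode

def format_index_url(symbol, first_date, last_date):
    # Build an ichart.finance.yahoo.com CSV query string (illustrative only).
    params = urlencode({
        's': symbol,
        'a': first_date.month - 1,  # start month, zero-based
        'b': first_date.day,
        'c': first_date.year,
        'd': last_date.month - 1,   # end month, zero-based
        'e': last_date.day,
        'f': last_date.year,
        'g': 'd',                   # daily bars
    })
    return 'https://ichart.finance.yahoo.com/table.csv?' + params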

freddiev4 commented 7 years ago

@pbharrin I've just been able to reproduce this issue today as well. Will look into how to fix this/provide an improved solution at some point.

ywang412 commented 7 years ago

Same here today. Not sure how to fix the problem.

freddiev4 commented 7 years ago

All right, now that I'm at my laptop ... it looks like this is an issue coming from Yahoo and their ichart/Finance API.

I believe we cache some downloaded benchmark data, and when the benchmarks we need are not found in that cache we make additional requests to Yahoo for them.

Because we use Yahoo to download our benchmark data, and there is an issue with their API, those requests fail. As for a more robust solution, I don't have any ideas at the moment; for now we'll just have to wait for the API issue to get resolved.

freddiev4 commented 7 years ago

We call our get_benchmark_returns function in zipline.data.loader.ensure_benchmark_data:

    """
    Ensure we have benchmark data for `symbol` from `first_date` to `last_date`

    Parameters
    ----------
    symbol : str
        The symbol for the benchmark to load.
    first_date : pd.Timestamp
        First required date for the cache.
    last_date : pd.Timestamp
        Last required date for the cache.
    now : pd.Timestamp
        The current time.  This is used to prevent repeated attempts to
        re-download data that isn't available due to scheduling quirks or other
        failures.
    trading_day : pd.CustomBusinessDay
        A trading day delta.  Used to find the day before first_date so we can
        get the close of the day prior to first_date.

    We attempt to download data unless we already have data stored at the data
    cache for `symbol` whose first entry is before or on `first_date` and whose
    last entry is on or after `last_date`.

    If we perform a download and the cache criteria are not satisfied, we wait
    at least one hour before attempting a redownload.  This is determined by
    comparing the current time to the result of os.path.getmtime on the cache
    path.
    """
ywang412 commented 7 years ago

Thank you, Freddie. I didn't include more dates in my backtesting, so I'm not sure why it requires a download again. I wonder if the cache is saved somewhere on my disk so I can read it like a CSV.

ywang412 commented 7 years ago

https://forums.yahoo.net/t5/Yahoo-Finance-help/Is-Yahoo-Finance-API-broken/td-p/250503/page/3 It is officially gone, per a response from the Yahoo Finance team. Yahoo discontinued the free service.

ywang412 commented 7 years ago

then how can we use zipline? :(

freddiev4 commented 7 years ago

Hi @ywang412. zipline is one of many friends affected by these changes, so while we're considering solutions, we'll also see what pandas-datareader et al decide to do. We're not sure on the timeframe, but discussion continues on what changes to make. If you have any opinions on the matter feel free to share them 😃

pbharrin commented 7 years ago

Here is a solution: Google finance through the Pandas datareader.

  1. no api key needed
  2. no shady scraping needed

What am I missing?

Calling this with "^GSPC" throws an exception; I'm not sure of the exact symbol needed for the S&P 500, but SPY works.

Update 1.0: Due to licensing agreements Google cannot provide data for ^GSPC. SPY is not 100% ^GSPC because SPY pays dividends. Some will actually argue that SPY is a better benchmark as it is something you can actually trade.
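
For anyone who wants to try this right away, here is a minimal sketch, assuming pandas-datareader's 'google' data source as it existed at the time; get_spy_returns is just a hypothetical name, not the actual patch.

import pandas_datareader.data as web

def get_spy_returns(start='2001-01-01', end='2017-06-01'):
    # Pull daily SPY bars from Google Finance via pandas-datareader.
    prices = web.DataReader('SPY', 'google', start, end)
    # Google only provides the unadjusted Close, so dividends are not included.
    returns = prices['Close'].pct_change().iloc[1:]
    # zipline expects a tz-aware (UTC) daily returns Series for the benchmark.
    return returns.tz_localize('UTC')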

tibkiss commented 7 years ago

@pbharrin : Could you please share your patch to replace yahoo loader with google one? Thanks!

quocble commented 7 years ago

Why not use https://www.quandl.com/product/WIKIP/documentation/documentation ?

freddiev4 commented 7 years ago

Hey Peter @pbharrin, thanks for your suggestion. I think we've mentioned this in our discussion at some point; there are some other things we'll need to talk about as well (e.g. SPY vs ^GSPC).

@quocble If I remember correctly, I looked through that WIKI dataset and there are no tickers like SPY, ^GSPC or other ETFs tracking the S&P

pbharrin commented 7 years ago

Here is my patch: https://github.com/zipline-live/zipline/blob/ph_live/zipline/data/benchmarks.py It doesn't require any new libraries or api-keys. This uses the SPY data from Google Finance.

It is using the Close of SPY which does not include dividends. For some reason Google Finance does not return Adj. Close.

This is now failing 4 tests from tests.test_examples.ExamplesTests: test_example_4_olmar, test_example_3_dual_ema_talib, test_example_1_momentum_pipeline, and test_example_0_buyapple.

This is because the benchmark is slightly different.

ywang412 commented 7 years ago

Thank you, pbharrin! Is https://github.com/zipline-live/ different from https://github.com/quantopian/zipline? Is one maintained by the community and the other by the company? The push histories are quite different.

pbharrin commented 7 years ago

Yes, https://github.com/zipline-live/ is a project we are working on to allow you to live-trade from Zipline. (Like Quantopian, but you can trade through any broker you want, with any security you want, provided someone writes the code to interact with that broker.) People have expressed interest in using Zipline-live for exchanges in other countries and even Bitcoin.

We forked from Zipline recently so the code base is very similar. If the guys at Quantopian want to use the fix I have proposed then I can make a formal pull request from a Zipline branch. Otherwise you can just overwrite the original benchmarks.py with my benchmarks.py in Zipline and it will work until they figure out a long-term solution.

edboyle01 commented 7 years ago

@pbharrin, thanks for the Google SPY patch. I had to add

import pandas_datareader.data as web

I got a benchmark warning, but the test Buy_AAPL script seemed to run OK:

Downloading benchmark data for '^GSPC'.
[2017-05-18 21:44:29.118015] WARNING: Loader: Still don't have expected data after redownload!
[2017-05-18 21:44:50.681792] INFO: Performance: Simulated 334 trading days out of 334.
[2017-05-18 21:44:50.681929] INFO: Performance: first open: 2016-01-04 14:31:00+00:00
[2017-05-18 21:44:50.682005] INFO: Performance: last close: 2017-05-01 20:00:00+00:00
AAPL  algo_volatility  algorithm_period_return \

quocble commented 7 years ago

My suspicion is that after Verizon acquired Yahoo (an SV-based company), they're not so inclined to give the data away for free. Google will likely keep providing their data for free.

tibkiss commented 7 years ago

@pbharrin : Thanks for the patch, Peter!

clizarralde commented 7 years ago

The patch works! Thanks!

garretthoffman commented 7 years ago

Just wanted to add a note that Google Finance seems to be missing some prices dating back to 2009 (8/11/2009 and 2/2/2012) so with this patch longer term backtests may still crash.

kennell commented 7 years ago

Until the Google-download fix is merged, can we have some option to skip benchmarking so that at least the algorithm is executed?

rosstripi commented 7 years ago

I would like to second @kennell's request for an option to skip benchmarking. Zipline is useless to me right now.

freddiev4 commented 7 years ago

Hi everyone. There currently is no option to skip benchmarking without making code changes to zipline. I think that trying to introduce the option to disable benchmarking would take just as much time as trying to swap out yahoo-related code with google; considering we've already made progress on the swap I'd like to suggest using @pbharrin's patch (linked above) until we've merged this fix, as it seems like that's working for people currently.

Once we've merged the yahoo-google swap, we will start working on a way to turn off benchmarking. We're rethinking how we deal with benchmarking in general, so as to avoid an issue like this in the future, such as if google were to discontinue their historical data API.

yiorgosn commented 7 years ago

Instead of spending the time to rewrite the rest of the code to remove the benchmark or make it optional, I would suggest giving the benchmarks.py piece a choice to either (a) import a benchmark CSV manually (very easy to obtain; see the sketch below) or (b) update automatically (via Quandl or some other provider). That would be a better and faster solution.
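
As a sketch of option (a), assuming a manually downloaded CSV with Date and Close columns (the file name, column names, and benchmark_returns_from_csv helper are assumptions about your setup, not existing zipline code):

import pandas as pd

def benchmark_returns_from_csv(path='benchmark.csv'):
    prices = pd.read_csv(
        path,
        parse_dates=['Date'],
        index_col='Date',
        usecols=['Date', 'Close'],
        squeeze=True,  # return a Series rather than a one-column DataFrame
    )
    # Convert prices to the tz-aware daily returns Series zipline's loader expects.
    return prices.sort_index().tz_localize('UTC').pct_change(1).iloc[1:]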

freddiev4 commented 7 years ago

I've just merged #1812, which swaps out Yahoo for Google.

freddiev4 commented 7 years ago

Is anyone running the latest zipline master still having benchmark/Yahoo-related issues with backtests?

quocble commented 7 years ago

I was getting this on latest master

11:40:28 worker.1   |     (key, self.obj._get_axis_name(axis)))
11:40:28 worker.1   | [2017-06-07 11:40:28,115: WARNING/PoolWorker-1] KeyError: 'the label [2017-06-06 00:00:00+00:00] is not in the [index]'
11:40:28 worker.1   | [2017-06-07 11:40:28,115: WARNING/PoolWorker-1] During handling of the above exception, another exception occurred:
11:40:28 worker.1   | [2017-06-07 11:40:28,116: WARNING/PoolWorker-1] Traceback (most recent call last):
11:40:28 worker.1   | [2017-06-07 11:40:28,116: WARNING/PoolWorker-1] File "/Users/quoc/Projects/alphaoracle/alpha_tasks/tasks/gen_factors.py", line 121, in run
11:40:28 worker.1   |     run_algorithm(start=pd.Timestamp(start, tz="UTC"), end=pd.Timestamp(end, tz="UTC"), capital_base=10000, initialize=make_initialize(f['name'], f['hash'], func, args), before_trading_start=before_trading_start, bundle=zipline_db)
11:40:28 worker.1   | [2017-06-07 11:40:28,116: WARNING/PoolWorker-1] File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/utils/run_algo.py", line 360, in run_algorithm
11:40:28 worker.1   |     environ=environ,
11:40:28 worker.1   | [2017-06-07 11:40:28,116: WARNING/PoolWorker-1] File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/utils/run_algo.py", line 179, in _run
11:40:28 worker.1   |     overwrite_sim_params=False,
11:40:28 worker.1   | [2017-06-07 11:40:28,116: WARNING/PoolWorker-1] File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/algorithm.py", line 700, in run
11:40:28 worker.1   |     for perf in self.get_generator():
11:40:28 worker.1   | [2017-06-07 11:40:28,116: WARNING/PoolWorker-1] File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/gens/tradesimulation.py", line 230, in transform
11:40:28 worker.1   |     handle_benchmark(normalize_date(dt))
11:40:28 worker.1   | [2017-06-07 11:40:28,117: WARNING/PoolWorker-1] File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/gens/tradesimulation.py", line 190, in handle_benchmark
11:40:28 worker.1   |     benchmark_source.get_value(date)
11:40:28 worker.1   | [2017-06-07 11:40:28,117: WARNING/PoolWorker-1] File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/sources/benchmark_source.py", line 73, in get_value
11:40:28 worker.1   |     return self._precalculated_series.loc[dt]
11:40:28 worker.1   | [2017-06-07 11:40:28,117: WARNING/PoolWorker-1] File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/pandas/core/indexing.py", line 1311, in __getitem__
11:40:28 worker.1   |     return self._getitem_axis(key, axis=0)
11:40:28 worker.1   | [2017-06-07 11:40:28,117: WARNING/PoolWorker-1] File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/pandas/core/indexing.py", line 1481, in _getitem_axis
11:40:28 worker.1   |     self._has_valid_type(key, axis)
11:40:28 worker.1   | [2017-06-07 11:40:28,117: WARNING/PoolWorker-1] File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/pandas/core/indexing.py", line 1418, in _has_valid_type
11:40:28 worker.1   |     error()
11:40:28 worker.1   | [2017-06-07 11:40:28,117: WARNING/PoolWorker-1] File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/pandas/core/indexing.py", line 1405, in error
freddiev4 commented 7 years ago

@quocble Could you tell me what command/code you ran, that resulted in that error?

quocble commented 7 years ago

My code basically runs one pipeline, e.g. RSI, and I use it to collect data on the factor. The error I encountered was in SESSION_END, so I suspect it's not related to my code itself but to something about the last tick.

tradesimulation.py (line 228):

                elif action == SESSION_END:
                    # End of the session.
                    if emission_rate == 'daily':
                        handle_benchmark(normalize_date(dt))
                    execute_order_cancellation_policy()
                    yield self._get_daily_message(dt, algo, algo.perf_tracker)

zipline_db = os.environ.get("ZIPLINE_DB", 'quantopian-quandl')

run_algorithm(start=pd.Timestamp(start, tz="UTC"), end=pd.Timestamp(end, tz="UTC"), capital_base=10000, initialize=make_initialize(f['name'], f['hash'], func, args), before_trading_start=before_trading_start, bundle=zipline_db)

def initialize(context):
    set_benchmark(None)
    context.algo_name = algo_name
    context.hash = hash
    attach_pipeline(make_pipeline(hash, f.factor(*args)), 'my_pipeline')

def make_pipeline(hash, func):
    # dollar_volume = AverageDollarVolume(window_length=10).top(1000)
    logger.info("make pipeline " + hash)
    return Pipeline(
        columns={
            hash: func,
        },
        # screen=dollar_volume
    )

def before_trading_start(context, data):
    try:
        context.pipeline_data = pipeline_output('my_pipeline')
    except:
        print ("Err occur in pipeline on %s" % context.get_datetime())
        return
freddiev4 commented 7 years ago

@quocble when you run your code, what is your end_date?

mellertson commented 7 years ago

The patch works, thanks!!!

reMAJ commented 7 years ago

@pbharrin @FreddieV4 @mellertson The following questions may sound stupid, but please bear with me.

(1a) If I replace the benchmarks.py file with your google function, do I have to recompile zipline or something? I see there is a benchmarks.pyc file too, and I am thinking zipline uses the .pyc?

(1b) If I have to recompile (or do anything else) could you please explain or give a link to that?

(2) Could I paste the get_benchmark_returns function into my Python script for the algo (instead of changing benchmarks.py) and get it to work?

thanks.

mellertson commented 7 years ago

If you just delete the benchmarks.pyc file, Python will automatically re-compile the *.pyc file.

And if you paste the get_benchmark_returns function into your algo, it won't work correctly. You need to copy the file into the zipline/data folder of the zipline package. The zipline package is installed into one of several possible directories, depending on your operating system and the way you installed zipline.

For myself, I'm running Linux and used the following command to install zipline: sudo pip2 install zipline, so it was installed into /usr/local/lib/python2.7/dist-packages/zipline. I copied the benchmarks.py file into the directory /usr/local/lib/python2.7/dist-packages/zipline/data. You'll need to figure out which directory to copy the benchmarks.py file into. I've included two commands you can run, depending on your operating system, to find where the benchmarks.py file is on your system.

Windows:

cd c:\
dir /s benchmarks.py

Linux or Mac OS X:

cd /
sudo find / -name benchmarks.py
ywang412 commented 7 years ago

The latest merged benchmarks.py doesn't have these two lines in it: if symbol == "^GSPC": symbol = "spy"

The two-line snippet was in pbharrin's patch.

Without the snippet, the code gives an "unable to read URL" error:

_utils.RemoteDataError: Unable to read URL: http://www.google.com/finance/historical?q=%5EGSPC&startdate=Dec+29%2C+1989&enddate=Jun+07%2C+2017&output=csv

quocble commented 7 years ago

@FreddieV4

Here's what you asked for. This time I re-ran the buyapple example; I ran this on 6/11.

(lexikon) quoc@MacBook-Pro-5:~/Projects/alphaoracle/alpha_tasks$ zipline run -f ../zipline/zipline/examples/buyapple.py -s 2017-01-01 -e 2017-06-09
[2017-06-12 00:19:44.974160] INFO: Loader: Cache at /Users/quoc/.zipline/data/SPY_benchmark.csv does not have data from 1990-01-02 00:00:00+00:00 to 2017-06-08 00:00:00+00:00.

[2017-06-12 00:19:44.974471] INFO: Loader: Downloading benchmark data for 'SPY' from 1989-12-29 00:00:00+00:00 to 2017-06-08 00:00:00+00:00
[2017-06-12 00:19:46.478470] WARNING: Loader: Still don't have expected data after redownload!
[2017-06-12 00:19:46.478771] INFO: Loader: Cache at /Users/quoc/.zipline/data/treasury_curves.csv does not have data from 1990-01-02 00:00:00+00:00 to 2017-06-08 00:00:00+00:00.

[2017-06-12 00:19:46.478927] INFO: Loader: Downloading treasury data for 'SPY'.
Traceback (most recent call last):
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/pandas/core/indexing.py", line 1395, in _has_valid_type
    error()
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/pandas/core/indexing.py", line 1390, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2017-06-09 00:00:00+00:00] is not in the [index]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/quoc/anaconda/envs/lexikon/bin/zipline", line 11, in <module>
    load_entry_point('zipline==1.1.0+201.g0840ea1', 'console_scripts', 'zipline')()
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/__main__.py", line 97, in _
    return f(*args, **kwargs)
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/__main__.py", line 240, in run
    environ=os.environ,
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/utils/run_algo.py", line 179, in _run
    overwrite_sim_params=False,
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/algorithm.py", line 709, in run
    for perf in self.get_generator():
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/gens/tradesimulation.py", line 230, in transform
    handle_benchmark(normalize_date(dt))
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/gens/tradesimulation.py", line 190, in handle_benchmark
    benchmark_source.get_value(date)
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/zipline/sources/benchmark_source.py", line 75, in get_value
    return self._precalculated_series.loc[dt]
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/pandas/core/indexing.py", line 1296, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/pandas/core/indexing.py", line 1466, in _getitem_axis
    self._has_valid_type(key, axis)
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/pandas/core/indexing.py", line 1403, in _has_valid_type
    error()
  File "/Users/quoc/anaconda/envs/lexikon/lib/python3.6/site-packages/pandas/core/indexing.py", line 1390, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2017-06-09 00:00:00+00:00] is not in the [index]'
freddiev4 commented 7 years ago

@ywang412 the default symbol that is passed in is SPY in the latest merge. If you've pip installed the latest master branch of zipline, or conda installed a recent build of zipline, then the requests should be going out to Google Finance now.

In the URL you posted, GSPC is the symbol being searched for. Did you clone zipline and then pip install that local directory?

freddiev4 commented 7 years ago

@quocble I believe that error is being raised because of this line in particular. Adding a better error message before it's caught in pandas might be a good solution for that.

yiorgosn commented 7 years ago

Benchmark missing yesterday's (T-1) data? Here's a quick band-aid solution. Open the file ....\zipline\gens\tradesimulation.py and, in handle_benchmark(), replace line 189 with this:

[new line] algo.perf_tracker.all_benchmark_returns[date] = 1

[replacing line:] algo.perf_tracker.all_benchmark_returns[date] = benchmark_source.get_value(date)

I found this solution here: https://stackoverflow.com/questions/41079890/zipline-bundle-yesterdays-data

pbharrin commented 7 years ago

@FreddieV4 I see that pyfolio is still pulling the Yahoo data via pandas_datareader; any idea when this will be updated?

freddiev4 commented 7 years ago

Hey Peter @pbharrin, it looks like https://github.com/quantopian/pyfolio/pull/386 is currently open so if the author continues working on that then hopefully that'll get merged. If the author doesn't have the time to continue work on it, I can open a PR/build on theirs.

lacabra commented 7 years ago

Hi there, I just installed zipline using the latest github repo, and while trying to run the buyapple.py example, I am running into a similar issue as @quocble but complaining at the beginning of the dataset instead of at the end.

Does anyone have any suggestions for me?

$ zipline run -f algo.py --start 2000-1-1 --end 2014-1-1
[2017-06-19 18:11:48.740235] WARNING: Loader: Refusing to download new benchmark data because a download succeeded at 2017-06-19 17:13:54+00:00.
Traceback (most recent call last):
  File "/env-p2/bin/zipline", line 9, in <module>
    load_entry_point('zipline==1.1.0+203.g7dea3889', 'console_scripts', 'zipline')()
  File "/env-p2/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/env-p2/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/env-p2/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/env-p2/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/env-p2/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/env-p2/lib/python2.7/site-packages/zipline/__main__.py", line 97, in _
    return f(*args, **kwargs)
  File "/env-p2/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/env-p2/lib/python2.7/site-packages/zipline/__main__.py", line 240, in run
    environ=os.environ,
  File "/env-p2/lib/python2.7/site-packages/zipline/utils/run_algo.py", line 179, in _run
    overwrite_sim_params=False,
  File "/env-p2/lib/python2.7/site-packages/zipline/algorithm.py", line 709, in run
    for perf in self.get_generator():
  File "/env-p2/lib/python2.7/site-packages/zipline/gens/tradesimulation.py", line 230, in transform
    handle_benchmark(normalize_date(dt))
  File "/env-p2/lib/python2.7/site-packages/zipline/gens/tradesimulation.py", line 190, in handle_benchmark
    benchmark_source.get_value(date)
  File "/env-p2/lib/python2.7/site-packages/zipline/sources/benchmark_source.py", line 75, in get_value
    return self._precalculated_series.loc[dt]
  File "/env-p2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1296, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/env-p2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1466, in _getitem_axis
    self._has_valid_type(key, axis)
  File "/env-p2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1403, in _has_valid_type
    error()
  File "/env-p2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1390, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2000-01-03 00:00:00+00:00] is not in the [index]'

Environment: Mac OS X 10.12.5, Python 2.7.10

freddiev4 commented 7 years ago

Hi @lacabra, the reason for that is that we only have benchmark data from Google going back to 2001, so running a backtest that starts before then causes that error.

EDIT: I misspoke on that. I'll need to re-investigate as I remember seeing that error before as well. There's a different answer other than the one I just stated above. What happens if you change your start date to 2002-02-01?

lacabra commented 7 years ago

@FreddieV4 you're absolutely right, setting a later start date did indeed fix the error, thank you very much!

lionelyoung commented 7 years ago

How did you guys get the Google data? Edit extensions.py?

mellertson commented 7 years ago

You can use pandas_datareader.data; here are the docs. They have a pretty good example.

http://pandas-datareader.readthedocs.io/en/latest/remote_data.html

Best regards,

Mike Ellertson


edmunch commented 7 years ago

Solution for me using Yahoo... quick and dirty:

Install pandas_datareader, and install fix_yahoo_finance from here: https://pypi.python.org/pypi/fix-yahoo-finance

Patch benchmarks.py with:

import pandas as pd

from six.moves.urllib_parse import urlencode

import pandas_datareader as pdr #NEW
import fix_yahoo_finance as yf #NEW
yf.pdr_override()#NEW

def get_benchmark_returns(symbol, start_date, end_date):
    print('NEW')
    df = pdr.data.get_data_yahoo(symbol, start=start_date, end=end_date)
    df.to_csv('{}_D1.csv'.format(symbol))
    return pd.read_csv('{}_D1.csv'.format(symbol),
        parse_dates=['Date'],
        index_col='Date',
        usecols=["Adj Close", "Date"],
        squeeze=True,  # squeeze tells pandas to make this a Series
                       # instead of a 1-column DataFrame
    ).sort_index().tz_localize('UTC').pct_change(1).iloc[1:]    

my Setup: Winx64 / Anaconda / Python 3.5 alabaster (0.7.10) alembic (0.9.2) astroid (1.4.9) audioread (2.1.5) Babel (2.4.0) bcolz (0.12.1) beautifulsoup4 (4.6.0) bleach (1.5.0) boto (2.47.0) Bottleneck (1.2.1) bs4 (0.0.1) bz2file (0.98) certifi (2017.4.17) chardet (3.0.4) click (6.7) colorama (0.3.9) contextlib2 (0.5.5) cycler (0.10.0) cyordereddict (1.0.0) Cython (0.25.2) decorator (4.0.11) docutils (0.13.1) empyrical (0.2.2) entrypoints (0.2.2) fix-yahoo-finance (0.0.18) gensim (2.1.0) h5py (2.7.0) html5lib (0.999) idna (2.5) imagesize (0.7.1) intervaltree (2.1.0) ipykernel (4.6.1) ipython (6.1.0) ipython-genutils (0.2.0) ipywidgets (6.0.0) isort (4.2.14) jedi (0.10.2) Jinja2 (2.9.6) joblib (0.11) jsonschema (2.6.0) jupyter-client (5.0.1) jupyter-core (4.3.0) Keras (2.0.5) lazy-object-proxy (1.2.2) librosa (0.5.1) Logbook (1.0.0) lru-dict (1.1.6) Mako (1.0.6) MarkupSafe (1.0) matplotlib (2.0.2) mistune (0.7.4) multipledispatch (0.4.9) multitasking (0.0.4) nbconvert (5.2.1) nbformat (4.3.0) networkx (1.11) nltk (3.2.4) notebook (5.0.0) numexpr (2.6.2) numpy (1.13.0+mkl) numpydoc (0.6.0) olefile (0.44) pandas (0.18.1) pandas-datareader (0.4.0) pandocfilters (1.4.1) path.py (10.3.1) patsy (0.4.1) pep8 (1.7.0) pickleshare (0.7.4) Pillow (4.1.1) pip (9.0.1) prompt-toolkit (1.0.14) protobuf (3.3.0) psutil (5.2.2) pyflakes (1.5.0) Pygments (2.2.0) pylint (1.6.4) pyparsing (2.1.4) PyQt4 (4.11.4) python-dateutil (2.6.0) python-editor (1.0.3) pytz (2017.2) PyYAML (3.12) pyzmq (16.0.2) QtAwesome (0.4.4) qtconsole (4.3.0) QtPy (1.2.1) requests (2.18.1) requests-file (1.4.2) requests-ftp (0.3.1) resampy (0.1.5) rope-py3k (0.9.4.post1) scikit-learn (0.18.1) scipy (0.19.1) seaborn (0.7.1) setuptools (36.0.1) simplegeneric (0.8.1) six (1.10.0) smart-open (1.5.3) snowballstemmer (1.2.1) sortedcontainers (1.5.7) sparsesvd (0.2.2) sphinx (1.6.2) sphinxcontrib-websupport (1.0.1) spyder (3.1.4) SQLAlchemy (1.1.11) statsmodels (0.8.0) tables (3.4.2) tensorflow-gpu (1.1.0) testpath (0.3) tflearn (0.3.1) Theano (0.9.0) toolz (0.8.2) tornado (4.5.1) traitlets (4.3.2) urllib3 (1.21.1) wcwidth (0.1.7) Werkzeug (0.12.2) wheel (0.29.0) widgetsnbextension (2.0.0) win-unicode-console (0.5) wrapt (1.10.10) zipline (1.1.0)

ghost commented 7 years ago

@edmunch great! Now it works!

Before I tried your approach, I had the following error, just in case anybody has the same issue:

Downloading benchmark data for '^GSPC'.
ERROR: Loader: failed to cache the new benchmark returns
Traceback....
.....
python36\lib\urllib\request.py, line 1320, in do_open raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>
freddiev4 commented 7 years ago

Hi there. This should be fixed in the latest release :). Going to close this. Feel free to reopen if you see this issue again