quantopian / zipline

Zipline, a Pythonic Algorithmic Trading Library
https://www.zipline.io
Apache License 2.0

Error: No bundle registered with the name <whatever (bundle_name).py> #2688

Open hemzyt opened 4 years ago

hemzyt commented 4 years ago

Dear Zipline Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

* Operating System: Ubuntu
* Python Version: `Python 3.6.9`
* Python Bitness: 64 (`7fffffffffffffff True`)
* How did you install Zipline: `pip install zipline`
* Python packages: absl-py==0.9.0 alembic==1.4.2 alpaca-trade-api==0.46 alpha-vantage==2.1.3 astor==0.8.1 asyncio-nats-client==0.10.0 attrs==19.3.0 autoenv==1.0.0 backcall==0.1.0 bcolz==0.12.1 bleach==3.1.3 Bottleneck==1.3.2 cachetools==4.1.0 certifi==2019.11.28 chardet==3.0.4 click==7.1.1 contextlib2==0.6.0.post1 cycler==0.10.0 cyordereddict==1.0.0 Cython==0.29.15 decorator==4.4.2 defusedxml==0.6.0 Deprecated==1.2.7 empyrical==0.5.3 entrypoints==0.3 finnhub==0.1.1 gast==0.2.2 gevent==1.4.0 google-auth==1.13.1 google-auth-oauthlib==0.4.1 google-pasta==0.2.0 graphviz==0.13.2 greenlet==0.4.15 grpcio==1.28.1 h5py==2.10.0 idna==2.9 iexfinance==0.4.3 importlib-metadata==1.5.0 inflection==0.3.1 intervaltree==3.0.2 ipykernel==5.1.4 ipython==7.13.0 ipython-genutils==0.2.0 iso8601==0.1.12 jedi==0.16.0 Jinja2==2.11.1 joblib==0.14.1 johansen==0.0.4 json5==0.9.3 jsonschema==3.2.0 jupyter-client==6.1.0 jupyter-core==4.6.3 jupyterlab==2.0.1 jupyterlab-server==1.0.7 Keras==2.3.1 Keras-Applications==1.0.8 Keras-Preprocessing==1.1.0 kiwisolver==1.1.0 Logbook==1.5.3 lru-dict==1.1.6 lxml==4.5.0 Mako==1.1.2 Markdown==3.2.1 MarkupSafe==1.1.1 matplotlib==3.2.1 mistune==0.8.4 mock==4.0.2 more-itertools==8.2.0 mpl-finance==0.10.1 msgpack-python==0.5.6 multipledispatch==0.6.0 multitasking==0.0.9 nbconvert==5.6.1 nbformat==5.0.4 networkx==1.11 nodejs==0.1.1 notebook==6.0.3 numexpr==2.7.1 numpy==1.18.2 oauthlib==3.1.0 opt-einsum==3.2.0 optional-django==0.1.0 packaging==20.3 pandas==0.22.0 pandas-datareader==0.8.1 pandocfilters==1.4.2 parso==0.6.2 patsy==0.5.1 pexpect==4.8.0 pickleshare==0.7.5 Pillow==7.1.1 pkg-resources==0.0.0 pluggy==0.13.1 polygon-api-client==0.1.2 prometheus-client==0.7.1 prompt-toolkit==3.0.4 protobuf==3.11.3 ptyprocess==0.6.0 py==1.8.1 pyasn1==0.4.8 pyasn1-modules==0.2.8 pydot==1.4.1 pyfolio==0.9.2 Pygments==2.6.1 pymarketstore==0.17 pyparsing==2.4.6 pyrsistent==0.15.7 pytest==5.4.1 python-dateutil==2.8.1 python-editor==1.0.4 pytz==2019.3 PyYAML==5.3.1 pyzmq==19.0.0 Quandl==3.5.0 requests==2.23.0 requests-file==1.4.3 requests-oauthlib==1.3.0 rsa==4.0 scikit-learn==0.22.2.post1 scipy==1.4.1 seaborn==0.10.0 Send2Trash==1.5.0 six==1.14.0 sortedcontainers==2.1.0 SQLAlchemy==1.3.15 statsmodels==0.11.1 tables==3.6.1 tensorboard==1.13.1 tensorflow==1.13.1 tensorflow-estimator==1.13.0 termcolor==1.1.0 terminado==0.8.3 testpath==0.4.4 toolz==0.10.0 tornado==6.0.4 tqdm==4.45.0 trading-calendars==1.11.5 traitlets==4.3.3 ujson==2.0.3 urllib3==1.24.3 wcwidth==0.1.8 webencodings==0.5.1 websocket==0.2.1 websocket-client==0.56.0 websockets==8.0.2 Werkzeug==1.0.1 wrapt==1.12.1 yfinance==0.1.54 zipline==1.3.0 zipp==3.1.0
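
The "Python Bitness" field asks whether the interpreter is a 32- or 64-bit build. The `7fffffffffffffff True` value above looks like the hex form of `sys.maxsize` together with a `sys.maxsize > 2**32` check. A minimal standard-library sketch that prints values of that shape; `python_bitness` is a hypothetical helper name, not part of zipline:

```python
import struct
import sys


def python_bitness():
    """Return the interpreter's word size in bits (32 or 64)."""
    # sys.maxsize is 2**63 - 1 on 64-bit builds and 2**31 - 1 on 32-bit builds,
    # so its bit length is one less than the word size
    return sys.maxsize.bit_length() + 1


# On a 64-bit build the first value prints as 7fffffffffffffff
print('%x' % sys.maxsize, sys.maxsize > 2**32, python_bitness())
# Cross-check: size of a C pointer in bytes, times 8 bits
print(struct.calcsize("P") * 8)
```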

Now that you know a little about me, let me tell you about the issue I am having:

Description of Issue

Here is how you can reproduce this issue on your machine:

Reproduction Steps

  1. Install the yfinance package: `$ pip install yfinance`
  2. Download the data and reformat it (in directory `/home/user/downloads/customdata`): `$ cat > importdata.py`

```python
from pandas_datareader import data as pdr
import yfinance as yf

yf.pdr_override()  # <== that's all it takes :-)

# Download each DataFrame and convert it to a proper .csv for zipline
# (date,open,high,low,close,volume,dividend,split)
stockList = ['SPY', 'TLT', 'IEF', 'GLD', 'DBC']
for stock in stockList:
    data = pdr.get_data_yahoo(stock, period='max', auto_adjust=True)  # get data
    data = data.reset_index()  # turn the Date index into a column
    data = data.rename(columns={'Date': 'date', 'Open': 'open', 'High': 'high',
                                'Low': 'low', 'Close': 'close',
                                'Volume': 'volume'})  # rearrange and rename
    data.to_csv('{}.csv'.format(stock), index=False)
```

`$ python importdata.py`

Output (`$ ls ~/downloads/customdata`): `DBC.csv GLD.csv IEF.csv SPY.csv TLT.csv import_data.py`
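
Before building a bundle, it can help to confirm that every CSV has the columns the ingest code reads. A minimal standard-library sketch; `check_csv_dir` and `REQUIRED` are hypothetical names, and the required set assumes the lowercase column names produced by the rename above (`dividend` and `split` are treated as optional because this script does not write them):

```python
import csv
import os

# Columns written by the rename step above; dividend/split are optional
REQUIRED = {"date", "open", "high", "low", "close", "volume"}


def check_csv_dir(directory):
    """Map each .csv file in `directory` to the set of required columns it lacks."""
    problems = {}
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".csv"):
            continue  # skip importdata.py and any other non-CSV file
        with open(os.path.join(directory, name), newline="") as fh:
            header = next(csv.reader(fh), [])  # first row only
        missing = REQUIRED - set(header)
        if missing:
            problems[name] = missing
    return problems
```

Usage: `check_csv_dir(os.path.expanduser('~/downloads/customdata'))` returns an empty dict when every file looks usable.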

3. Create the bundle ingest file
Source: **Trading Evolved: Anyone can Build Killer Trading Strategies in Python**
`$ cat > customdata.py`
```python
import os

import pandas as pd

# Data path: change this to the directory that holds your CSV files
path = '/home/user/downloads/customdata/'

"""
The ingest function needs to have this exact signature;
zipline passes every one of these arguments.
"""
def customdata(environ,
    asset_db_writer,
    minute_bar_writer,
    daily_bar_writer,
    adjustment_writer,
    calendar,
    start_session,
    end_session,
    cache,
    show_progress,
    output_dir):

    # Get the list of CSV files in path, slicing off the 4-character '.csv' suffix
    symbols = [f[:-4] for f in os.listdir(path) if f.endswith('.csv')]
    if not symbols:
        raise ValueError("No symbols found in folder.")

    # Prep catalyst: dividends
    divs = pd.DataFrame(columns=['sid',
                                 'amount',
                                 'ex_date',
                                 'record_date',
                                 'declared_date',
                                 'pay_date']
    )

    # Prep catalyst: splits
    splits = pd.DataFrame(columns=['sid',
                                   'ratio',
                                   'effective_date']
    )

    # Prep catalyst: metadata
    metadata = pd.DataFrame(columns=('start_date',
                                     'end_date',
                                     'auto_close_date',
                                     'symbol',
                                     'exchange'
                                     )
                                 )

    # Check valid trading dates, according to the selected exchange calendar
    sessions = calendar.sessions_in_range(start_session, end_session)

    # Process: fetch, align dates, process dividends and metadata
    ## Get data for all stocks, write to zipline
    div_frames = []  # process_stocks appends each stock's dividends here
    daily_bar_writer.write(
        process_stocks(symbols, sessions, metadata, div_frames)
        )
    # write() has consumed the generator, so div_frames is complete now
    if div_frames:
        divs = pd.concat([divs] + div_frames, ignore_index=True)

    ## Write metadata
    asset_db_writer.write(equities=metadata)
    ## Write splits/dividends
    adjustment_writer.write(splits=splits,
                            dividends=divs)

"""
Generator function: iterate over the stocks, building historical data,
metadata and dividend data
"""
def process_stocks(symbols, sessions, metadata, div_frames):
    for sid, symbol in enumerate(symbols):  # loop over stocks, setting the SID
        print('Loading {}...'.format(symbol))
        # Read the CSV data
        df = pd.read_csv(os.path.join(path, '{}.csv'.format(symbol)),
                         index_col=[0], parse_dates=[0])
        start_date = df.index[0]  # check first/last date
        end_date = df.index[-1]
        # Sync to the official exchange calendar
        df = df.reindex(sessions.tz_localize(None))[start_date:end_date]
        df.fillna(method='ffill', inplace=True)  # fill missing
        df.dropna(inplace=True)  # drop remaining NaN
        # The auto_close date is the day after the last trade.
        ac_date = end_date + pd.Timedelta(days=1)
        # Add a row to the metadata DataFrame. Don't forget to add an exchange field.
        metadata.loc[sid] = start_date, end_date, ac_date, symbol, "NYSE"
        # If there's dividend data, add it to the list of dividend frames
        if 'dividend' in df.columns:
            # Slice off the days with dividends
            tmp = df[df['dividend'] != 0.0]['dividend']
            div = pd.DataFrame(data=tmp.index.tolist(), columns=['ex_date'])
            # Provide empty columns as we don't have this data for now
            div['record_date'] = pd.NaT
            div['declared_date'] = pd.NaT
            div['pay_date'] = pd.NaT
            # Store the dividends and set the Security ID
            div['amount'] = tmp.tolist()
            div['sid'] = sid
            # Mutate the caller's list: rebinding a local DataFrame here
            # (divs = divs.append(div)) would never reach customdata()
            div_frames.append(div)
        yield sid, df
```

`(jupyter_env)$ mv customdata.py /virtualenv/jupyter_env/lib/python3.6/site-packages/zipline/data/bundles`
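
The calendar-alignment step in `process_stocks` (reindex onto exchange sessions, slice to the stock's first and last date, forward-fill, drop what is still missing) can be pictured without pandas. A minimal sketch, assuming a hypothetical list of session dates and a dict of daily closes; `align_to_sessions` is a hypothetical helper, not a zipline or pandas API:

```python
from datetime import date, timedelta


def align_to_sessions(closes, sessions):
    """Reindex {date: close} onto `sessions`, forward-filling gaps.

    Mirrors df.reindex(sessions)[start:end].fillna(method='ffill').dropna().
    """
    start, end = min(closes), max(closes)
    aligned, last = [], None
    for session in sessions:
        if session < start or session > end:
            continue  # same effect as slicing [start_date:end_date]
        last = closes.get(session, last)  # forward-fill a missing day
        if last is not None:  # dropna: skip rows that are still empty
            aligned.append((session, last))
    auto_close = end + timedelta(days=1)  # "the day after the last trade"
    return aligned, auto_close


sessions = [date(2016, 1, 4), date(2016, 1, 5), date(2016, 1, 6), date(2016, 1, 7)]
closes = {date(2016, 1, 4): 10.0, date(2016, 1, 6): 11.0}  # 2016-01-05 missing
print(align_to_sessions(closes, sessions))
```

Here the missing 2016-01-05 row takes the previous close, and 2016-01-07 is dropped because it falls after the last date in the data.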
4. Edit `.zipline/extension.py` to register the bundle: `$ nano extension.py`

```python
# Edit .zipline/extension.py to register the .py bundle.
from zipline.data.bundles import register, customdata

register('customdata.py', customdata.customdata, calendar_name='NYSE')
```

...

## What steps have you taken to resolve this already?
1. Follow the csvdir example
Because each data provider formats its pricing data differently, I tried a different extension.py taken directly from the zipline documentation's csvdir example, adding the path to my CSV files.
```python
import pandas as pd

from zipline.data.bundles import register
from zipline.data.bundles.csvdir import csvdir_equities

start_session = pd.Timestamp('2016-1-1', tz='utc')
end_session = pd.Timestamp('2018-1-1', tz='utc')

register(
    'custom-csvdir-bundle',
    csvdir_equities(
        ['daily'],
        '/home/<user>/downloads/customdata',
    ),
    calendar_name='NYSE', # US equities
    start_session=start_session,
    end_session=end_session
)
```

`$ zipline ingest -b custom-csvdir-bundle`

Output:

```
/home/hemzy/virtualenv/jupyter_env/lib/python3.6/site-packages/zipline/__main__.py:60: UserWarning: Failed to load extension: 'extension.py.bkp'
No module named 'extension'
  os.environ,
Traceback (most recent call last):
  File "/home/hemzy/virtualenv/jupyter_env/bin/zipline", line 8, in <module>
    sys.exit(main())
  File "/home/hemzy/virtualenv/jupyter_env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/hemzy/virtualenv/jupyter_env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/hemzy/virtualenv/jupyter_env/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/hemzy/virtualenv/jupyter_env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/hemzy/virtualenv/jupyter_env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/hemzy/virtualenv/jupyter_env/lib/python3.6/site-packages/zipline/__main__.py", line 348, in ingest
    show_progress,
  File "/home/hemzy/virtualenv/jupyter_env/lib/python3.6/site-packages/zipline/data/bundles/core.py", line 400, in ingest
    end_session,
  File "/home/hemzy/virtualenv/jupyter_env/lib/python3.6/site-packages/zipline/data/us_equity_pricing.py", line 201, in __init__
    "Start session %s is invalid!" % start_session
ValueError: Start session 2016-01-01 00:00:00+00:00 is invalid!
```
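
This `ValueError` is consistent with the chosen dates rather than with the CSV files: 1 January 2016 was a Friday but also New Year's Day, when the NYSE is closed, so it is not a session of the `NYSE` calendar (the end date, 1 January 2018, is likewise a holiday). A minimal sketch of snapping a date to the nearest session using only the standard library; the holiday set is a hand-written assumption covering just these two dates, whereas real code would ask the exchange calendar itself:

```python
from datetime import date, timedelta

# Hypothetical, hand-written subset of NYSE full-day closures; not a real calendar
HOLIDAYS = {date(2016, 1, 1), date(2018, 1, 1)}


def is_session(day):
    """A weekday that is not in the (assumed) holiday set."""
    return day.weekday() < 5 and day not in HOLIDAYS


def snap_to_session(day, direction="next"):
    """Move `day` forward ("next") or backward ("previous") to a session."""
    step = timedelta(days=1 if direction == "next" else -1)
    while not is_session(day):
        day += step
    return day


start = snap_to_session(date(2016, 1, 1), "next")
end = snap_to_session(date(2018, 1, 1), "previous")
print(start, end)  # 2016-01-04 2017-12-29
```

Snapping the start forward and the end backward keeps the requested range inside the original dates.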

...

-Raul

rsesteves commented 4 years ago

I am the same person as the original poster. I have figured out the issue: the path passed to listdir() was wrong. It should be /home/user/<path to csv files>. A second error may appear in the bundle Python file: on line 75, `formati` must be changed to `format`. It appears to be a typo in the Trading Evolved book, so this should help others using the book.
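
Both fixes can be folded into a small helper. It builds an absolute path under the home directory (on Ubuntu, home directories live under `/home/<user>`, not `/user/home`), fails early with a clear message if the directory is missing, and keeps only `.csv` files so that a script such as `import_data.py` in the same folder is not mistaken for a symbol. A minimal sketch; `list_symbols` is a hypothetical helper, since the book's code does this inline with `listdir`:

```python
import os


def list_symbols(csv_dir):
    """Return ticker symbols for every .csv file in csv_dir ('SPY.csv' -> 'SPY')."""
    path = os.path.abspath(os.path.expanduser(csv_dir))  # handles '~/...'
    if not os.path.isdir(path):
        raise ValueError("CSV directory not found: {}".format(path))
    symbols = sorted(os.path.splitext(f)[0] for f in os.listdir(path)
                     if f.lower().endswith(".csv"))
    if not symbols:
        raise ValueError("No symbols found in folder: {}".format(path))
    return symbols
```

With the directory listing from step 2, `list_symbols('~/downloads/customdata')` would return `['DBC', 'GLD', 'IEF', 'SPY', 'TLT']`.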