quantopian / zipline

Zipline, a Pythonic Algorithmic Trading Library
https://www.zipline.io
Apache License 2.0

Ingest custom data as CSV fails with "KeyError: "['volume' 'open' 'high' 'close' 'low'] not in index" #2568

Open bhushanjawle opened 5 years ago

bhushanjawle commented 5 years ago

Dear Zipline Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

macOS

Description of Issue

While ingesting custom data from CSV files in the required format using `zipline ingest -b nse_stocks`, the command fails with the following traceback:

    Traceback (most recent call last):
      File "/Applications/anaconda3/envs/env_zipline/bin/zipline", line 11, in <module>
        load_entry_point('zipline==1.3.0', 'console_scripts', 'zipline')()
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/click/core.py", line 722, in __call__
        return self.main(*args, **kwargs)
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/click/core.py", line 697, in main
        rv = self.invoke(ctx)
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/click/core.py", line 895, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/click/core.py", line 535, in invoke
        return callback(*args, **kwargs)
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/__main__.py", line 348, in ingest
        show_progress,
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/bundles/core.py", line 451, in ingest
        pth.data_path([name, timestr], environ=environ),
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/bundles/csvdir.py", line 94, in ingest
        self.csvdir)
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/bundles/csvdir.py", line 156, in csvdir_bundle
        show_progress=show_progress)
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/us_equity_pricing.py", line 257, in write
        return self._write_internal(it, assets)
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/us_equity_pricing.py", line 319, in _write_internal
        for asset_id, table in iterator:
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/click/_termui_impl.py", line 259, in __next__
        rv = next(self.iter)
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/us_equity_pricing.py", line 249, in <genexpr>
        for sid, df in data
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/us_equity_pricing.py", line 414, in to_ctable
        # we already have a ctable so do nothing
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/us_equity_pricing.py", line 417, in to_ctable
        winsorise_uint32(raw_data, invalid_data_behavior, 'volume', *OHLC)
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/us_equity_pricing.py", line 121, in winsorise_uint32
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/us_equity_pricing.py", line 137, in winsorise_uint32
        mask = df[columns] > UINT32_MAX
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/pandas/core/frame.py", line 2133, in __getitem__
        return self._getitem_array(key)
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/pandas/core/frame.py", line 2177, in _getitem_array
        indexer = self.loc._convert_to_indexer(key, axis=1)
      File "/Applications/anaconda3/envs/env_zipline/lib/python3.5/site-packages/pandas/core/indexing.py", line 1269, in _convert_to_indexer
        .format(mask=objarr[mask]))
    KeyError: "['volume' 'low' 'open' 'high' 'close'] not in index"

Here is how you can reproduce this issue on your machine:

## Reproduction Steps

Repeat the above command

## What steps have you taken to resolve this already?

- Checked similar [issues raised](https://github.com/quantopian/zipline/issues/2162) 

## Anything else?

CSV sample:

    date,open,high,low,close,volume,dividend,split
    2018-04-02,19400.2500,20199.0000,19400.2500,19973.9000,2563,0.0,1.0
    2018-04-03,20182.0500,20300.1500,19921.0000,20111.3000,1118,0.0,1.0
    2018-04-04,20111.3000,21000.0500,19851.0000,20105.0000,3744,0.0,1.0
    2018-04-05,20459.0000,20720.0000,20340.0000,20638.7000,874,0.0,1.0
    2018-04-06,20672.0500,20812.0000,20600.0000,20715.8000,772,0.0,1.0
    2018-04-09,20999.0000,21000.0000,20286.0500,20484.5000,3109,0.0,1.0
    2018-04-10,20310.0000,20750.0000,20310.0000,20663.4500,571,0.0,1.0
    2018-04-11,20614.5500,20850.0000,20175.0000,20356.2500,2503,0.0,1.0
    2018-04-12,20595.0000,20800.0000,20300.0000,20720.2500,1199,0.0,1.0

Excerpt from ~/.zipline/extension.py as follows:

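    # Imports this excerpt relies on
    import pandas as pd
    from zipline.data.bundles import register
    from zipline.data.bundles.csvdir import csvdir_equities

    start_session = pd.Timestamp('2018-4-2', tz='utc')
    end_session = pd.Timestamp('2019-10-17', tz='utc')

    register(
        'nse_stocks',
        csvdir_equities(
            ['daily'],
            'path-to-csv-files-directory',
        ),
        calendar_name='XBOM',  # IN equities
        start_session=start_session,
        end_session=end_session)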



Sincerely,
`$ whoami`

ghost commented 5 years ago

Hi Bhushan, everything looks okay. I was able to load your data after adding a '\n' character at the end:

    date,open,high,low,close,volume,dividend,split
    2018-04-02,19400.2500,20199.0000,19400.2500,19973.9000,2563,0.0,1.0
    2018-04-03,20182.0500,20300.1500,19921.0000,20111.3000,1118,0.0,1.0
    2018-04-04,20111.3000,21000.0500,19851.0000,20105.0000,3744,0.0,1.0
    2018-04-05,20459.0000,20720.0000,20340.0000,20638.7000,874,0.0,1.0
    2018-04-06,20672.0500,20812.0000,20600.0000,20715.8000,772,0.0,1.0
    2018-04-09,20999.0000,21000.0000,20286.0500,20484.5000,3109,0.0,1.0
    2018-04-10,20310.0000,20750.0000,20310.0000,20663.4500,571,0.0,1.0
    2018-04-11,20614.5500,20850.0000,20175.0000,20356.2500,2503,0.0,1.0
    2018-04-12,20595.0000,20800.0000,20300.0000,20720.2500,1199,0.0,1.0
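For anyone hitting the same KeyError, it may help to check each CSV before running the ingest. The traceback shows pandas failing to find the `open`, `high`, `low`, `close` and `volume` columns, and the comment above reports that the data loaded once a '\n' was added at the end. The following is only a minimal sketch of such a check using the standard library. The helper name `check_csv_text` and the wording of its messages are hypothetical and not part of zipline, and it does not claim to identify the actual cause of this issue.

```python
import csv
import io

# Columns that the traceback shows being selected from each frame.
REQUIRED_COLUMNS = ("open", "high", "low", "close", "volume")


def check_csv_text(text):
    """Return a list of problems found in the text of one CSV file.

    Hypothetical helper for illustration; it is not part of zipline.
    """
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    # Column lookup is case-sensitive, so compare the names exactly.
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    # Flag headers such as "Open" or " open" that only match after cleanup.
    cleaned = [name.strip().lower() for name in header]
    fixable = [col for col in missing if col in cleaned]
    if fixable:
        problems.append(
            "present only with different case or spacing: " + ", ".join(fixable)
        )
    if len(rows) < 2:
        problems.append("no data rows after the header")
    if not text.endswith("\n"):
        problems.append("no trailing newline")
    return problems


sample = (
    "date,open,high,low,close,volume,dividend,split\n"
    "2018-04-02,19400.2500,20199.0000,19400.2500,19973.9000,2563,0.0,1.0\n"
)
print(check_csv_text(sample))  # → []
```

To check real files, the text of each CSV in the bundle directory can be read and passed to the helper one file at a time.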

bhushanjawle commented 4 years ago

Thank you @sivasunku. I am in the process of cleaning up the data downloaded from a new source and will update the issue once that is done.

taliarhodes commented 4 years ago

@bhushanjawle Were you able to get this working with the cleaned data or do you still need assistance?

matiasthunder commented 4 years ago

I installed Conda, Python 3.5 and Zipline. I have a folder ready with a list of equities (date, OHLCV), as @samatix put it. In that same folder there is a benchmark CSV file in the same format (date, OHLCV) with the symbol "CNX100". I can't seem to run the code below. I need to work with a custom calendar; for now I could work with "XBOM" and modify it along the way to add custom holidays. I tried ingesting these CSVs into a bundle but couldn't, so I created a panel instead.

I have been searching for a week now before troubling you here. Please let me know if you need any further information to solve this. I'll be more than happy to help; it's a mess here.

Getting this error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

    import os
    from collections import OrderedDict
    from datetime import datetime

    import matplotlib.pyplot as plt
    import pandas as pd
    import pytz
    import zipline
    from zipline.api import (order, record, symbol, set_benchmark,
                             order_target_percent, get_open_orders)

    # Raw strings keep the Windows backslashes from being read as escapes.
    df = pd.read_csv(r'C:\Users\user\Code\StockList - test.csv')
    path = r'C:\Users\user\Code\StockData\daily'

    # Load one OHLCV frame per symbol listed in the stock list.
    data = OrderedDict()
    for i in range(len(df)):
        sym = df.loc[i, "Symbol"]
        data[sym] = pd.read_csv(os.path.join(path, sym + '.csv'),
                                index_col=0, parse_dates=['date'])
        data[sym] = data[sym][["open", "high", "low", "close", "volume"]]

    panel = pd.Panel(data)
    panel.minor_axis = ["open", "high", "low", "close", "volume"]
    panel.major_axis = panel.major_axis.tz_localize(pytz.utc)
    print(panel)


    def initialize(context):
        set_benchmark(symbol("CNX100"))


    def handle_data(context, data):
        order(symbol("ACC"), 10)
        record(ACC=data.current(symbol('ACC'), 'price'))


    perf = zipline.run_algorithm(start=datetime(2017, 1, 5, 0, 0, 0, 0, pytz.utc),
                                 end=datetime(2018, 3, 1, 0, 0, 0, 0, pytz.utc),
                                 initialize=initialize,
                                 capital_base=10000,
                                 handle_data=handle_data,
                                 data=panel)
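A side note on the Windows paths in the snippet above: in a normal (non-raw) Python 3 string literal, `\U` starts a Unicode escape and a trailing `\'` escapes the closing quote, so a literal like `'C:\Users\...\daily\'` fails before any data is read. This is separate from the JSONDecodeError reported above. A minimal sketch of building the per-symbol path with `pathlib` instead follows; the helper name `csv_path_for` is hypothetical, and the directory simply mirrors the one in the snippet.

```python
from pathlib import PureWindowsPath

# Directory taken from the snippet above; a raw string keeps backslashes literal.
base = PureWindowsPath(r"C:\Users\user\Code\StockData\daily")


def csv_path_for(symbol):
    """Return the CSV path for one symbol (hypothetical helper)."""
    return base / (symbol + ".csv")


print(csv_path_for("ACC"))  # → C:\Users\user\Code\StockData\daily\ACC.csv
```

`PureWindowsPath` is used here so the example behaves the same on any operating system; on a Windows machine `pathlib.Path` would be the usual choice for paths that are actually opened.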

bhushanjawle commented 4 years ago

@taliarhodes I was unable to resolve this, so I went ahead with a different backtesting engine, as the issue had been open for a long time.

matiasthunder commented 4 years ago

Hi @bhushanjawle, could you please let me know how you set it up? Maybe it would serve my purpose here too.

bhushanjawle commented 4 years ago

Just saw this. I was unable to resolve the issue, so I used an alternative library to backtest.


shaktisd commented 3 months ago

As of now it is easy to use zipline; a simple conda install works out of the box. Do you mind sharing which alternative you used?