Open veeenu opened 6 years ago
Hi @veeenu - apologies for the confusion. I looked into this, and the description in #1778 is misleading. Looks like we may have started with a different intention, but in the change we actually merged, the daily bar writer expects no gaps in the data (i.e. they won't be filled).
If you expect gaps, you probably just want to reindex against the expected trading sessions to fill with nans. You should be able to do something like:
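import pandas as pd
from zipline.utils.calendars import get_calendar
# Ensure the df is indexed by UTC timestamps
# (Index.to_datetime() is deprecated in newer pandas; pd.to_datetime is the equivalent)
df.index = pd.to_datetime(df.index).tz_localize('UTC')
# Get all expected trading sessions in this range and reindex.
# start_date and end_date are the first and last dates you want to cover.
sessions = get_calendar('NYSE').sessions_in_range(start_date, end_date)
df = df.reindex(sessions)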
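To see what the reindex does without a zipline install, here is a minimal sketch using plain pandas. The `sessions` index is a hypothetical stand-in for what `sessions_in_range` returns (five weekdays in UTC), and the price data is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for get_calendar('NYSE').sessions_in_range(...):
# Monday 2018-01-08 through Friday 2018-01-12, in UTC.
sessions = pd.date_range("2018-01-08", "2018-01-12", freq="B", tz="UTC")

# Toy daily data with one gap: Wednesday 2018-01-10 is absent.
raw = pd.DataFrame(
    {"close": [10.0, 10.5, 11.0, 11.2]},
    index=pd.to_datetime(
        ["2018-01-08", "2018-01-09", "2018-01-11", "2018-01-12"]
    ).tz_localize("UTC"),
)

# Sessions the calendar expects but the data does not contain.
missing = sessions.difference(raw.index)

# Reindexing inserts a NaN row for every missing session.
filled = raw.reindex(sessions)
```

After this, `filled` has one row per expected session and the row for the missing day holds NaN, which is the shape the comment above suggests handing to the daily bar writer.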
No problem! I will try reindexing the dataframe. I suggest adding this bit of information to the documentation as I believe time series with gaps are a frequent use case, at least in the context of custom bundles.
Thanks for your patience, and keep up the great work! :)
After fixing the above, I ran into another issue that I can't solve. The data now seems to be ingested correctly, but I get an error when executing the algorithm. This is my new ingest function:
def ingest(environ, asset_db_writer, minute_bar_writer, daily_bar_writer, adjustment_writer, calendar, start_session, end_session, cache, show_progress, output_dir):
    differences = dict()
    symbols = sorted([a['ticker'] for a in assets])
    dtype = [('start_date', 'datetime64[ns]'),
             ('end_date', 'datetime64[ns]'),
             ('auto_close_date', 'datetime64[ns]'),
             ('symbol', 'object')]
    metadata = pd.DataFrame(np.empty(len(symbols), dtype=dtype))
    def write_fn():
        for idx, asset in enumerate(assets):
            aid, tkr = asset['id'], asset['ticker']
            ts = requests.get(API('asset/{:s}/prices/eod'.format(aid))).json()
            df = pd.DataFrame(ts).set_index('time')[['open', 'high', 'low', 'close', 'volume']]
            df.index = pd.to_datetime(df.index).tz_localize('UTC')
            start_date = df.index[0]
            end_date = df.index[-1]
            metadata.iloc[idx] = start_date.tz_convert(None), end_date.tz_convert(None), (end_date + pd.Timedelta(days=1)).tz_convert(None), tkr
            sess = calendar.sessions_in_range(start_date, end_date)
            dif = sess.difference(df.index)
            if len(dif) > 0:
                differences[tkr] = dif
            df = df.reindex(sess)
            yield idx, df
    daily_bar_writer.write(write_fn(), show_progress=True)
    asset_db_writer.write(equities=metadata)
    adjustment_writer.write(
        dividends=pd.DataFrame(columns=['sid', 'amount', 'ex_date', 'record_date', 'declared_date', 'pay_date']),
        splits=pd.DataFrame(columns=['sid', 'ratio', 'effective_date']))
    metadata['exchange'] = 'REINDEER'
    for k, v in differences.items():
        print(k, ' -> ', v)  # list gaps
I then wrote a dummy algorithm, which works as intended with Quandl data:
from zipline.api import order, record, symbol
def initialize(context):
    print(context)
def handle_data(context, data):
    print(data)
But, as soon as I switch to my bundle, I get:
$ zipline run -b spx-reindeer -f algo.py -s 2018-01-01 -e 2018-02-01
[2018-05-30 08:51:18.524235] WARNING: Loader: Refusing to download new benchmark data because a download succeeded at 2018-05-30 07:59:42.414161+00:00.
[2018-05-30 08:51:18.549750] WARNING: Loader: Refusing to download new treasury data because a download succeeded at 2018-05-30 07:59:47.815400+00:00.
Traceback (most recent call last):
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\Scripts\zipline-script.py", line 11, in <module>
    load_entry_point('zipline==1.2.0', 'console_scripts', 'zipline')()
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\zipline\__main__.py", line 98, in _
    return f(*args, **kwargs)
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\zipline\__main__.py", line 259, in run
    environ=os.environ,
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\zipline\utils\run_algo.py", line 208, in _run
    overwrite_sim_params=False,
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\zipline\algorithm.py", line 642, in run
    self.trading_environment.asset_finder.sids
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\zipline\assets\assets.py", line 494, in retrieve_all
    update_hits(self.retrieve_equities(type_to_assets.pop('equity', ())))
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\zipline\assets\assets.py", line 528, in retrieve_equities
    return self._retrieve_assets(sids, self.equities, Equity)
  File "C:\Users\avenuta\AppData\Local\Continuum\Anaconda3\envs\zipline\lib\site-packages\zipline\assets\assets.py", line 681, in _retrieve_assets
    asset = asset_type(**filter_kwargs(row))
  File "zipline\assets\_assets.pyx", line 59, in zipline.assets._assets.Asset.__init__ (zipline/assets\_assets.c:1857)
TypeError: __init__() takes at least 2 positional arguments (1 given)
I'm not sure how to debug this, as the error is raised deep in the code and seems related to the way the bundle is constructed. I also tried adding empty adjustment dataframes, but at this point I can find no significant difference between the calls in my ingest function and those in the csvdir bundle (which I used as a guideline for my function). Do you have any suggestions?
Thanks!
It looks like you're setting metadata['exchange'] after passing metadata into asset_db_writer.write(), so my guess is that the data being written is missing an exchange column. I'd try setting that beforehand.
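The ordering problem can be illustrated with a small sketch. `SnapshotWriter` is a hypothetical stand-in, not zipline's asset writer; it only assumes that the writer captures the frame's contents at the moment `write()` is called, so columns added afterwards never reach storage.

```python
import pandas as pd


class SnapshotWriter:
    """Hypothetical stand-in for asset_db_writer: stores a copy at write time."""

    def __init__(self):
        self.written = None

    def write(self, equities):
        # Assumption: whatever is in the frame now is what gets persisted.
        self.written = equities.copy()


# Order used in the ingest function: the column is added after the write.
late_metadata = pd.DataFrame({"symbol": ["AAA", "BBB"]})
late = SnapshotWriter()
late.write(equities=late_metadata)
late_metadata["exchange"] = "REINDEER"  # too late, already written

# Suggested order: set the column first, then write.
early_metadata = pd.DataFrame({"symbol": ["AAA", "BBB"]})
early_metadata["exchange"] = "REINDEER"
early = SnapshotWriter()
early.write(equities=early_metadata)
```

In the first case the written data has no exchange column; in the second it does. In the ingest function this corresponds to moving the metadata['exchange'] assignment above the asset_db_writer.write() call.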
For the initial report, it sounds like the only issue is some missing details in the documentation, so I'm going to update the title of this issue to reflect that. Feel free to open another issue if you run into anything else!
How would you handle extra sessions? I have a similar problem, but I get two errors: one reports a missing date and the other reports extra sessions. How can I ingest the data while ignoring these extra sessions, so that all the available data is still ingested?
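One possible approach, sketched with plain pandas and not taken from the thread: reindexing against the expected sessions handles both cases at once, because rows on dates outside the session index are dropped and sessions with no data become NaN rows. The helper name `align_to_sessions` and the sample data are hypothetical, and `sessions` again stands in for what the bundle's calendar would return.

```python
import pandas as pd


def align_to_sessions(df, sessions):
    """Return (aligned, missing, extra) for df against the expected sessions."""
    missing = sessions.difference(df.index)  # sessions with no data row
    extra = df.index.difference(sessions)    # data rows on non-session dates
    # reindex keeps only the labels in `sessions`: extra rows are dropped
    # and missing sessions become all-NaN rows.
    aligned = df.reindex(sessions)
    return aligned, missing, extra


# Hypothetical expected sessions: Monday 2018-01-08 to Friday 2018-01-12, UTC.
sessions = pd.date_range("2018-01-08", "2018-01-12", freq="B", tz="UTC")

# Toy data: no row for Wednesday 2018-01-10, plus a row on Saturday 2018-01-13.
idx = pd.to_datetime(
    ["2018-01-08", "2018-01-09", "2018-01-11", "2018-01-12", "2018-01-13"]
).tz_localize("UTC")
raw = pd.DataFrame({"close": [10.0, 10.5, 11.0, 11.2, 11.3]}, index=idx)

aligned, missing, extra = align_to_sessions(raw, sessions)
```

Returning `missing` and `extra` alongside the aligned frame makes it easy to log what was filled or discarded before yielding the frame to the writer. Whether silently dropping extra rows is acceptable depends on why the data source has them in the first place.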
Dear Zipline Maintainers,
Before I tell you about my issue, let me describe my environment:
Environment
Now that you know a little about me, let me tell you about the issue I am having:
Description of Issue
zipline ingest -b mybundle
I receive this error:
and the process halts. The assertion is correct, but the data (missing that session) is also correct. I see it should also be handled properly as of this pull request: https://github.com/quantopian/zipline/pull/1778
Here is how you can reproduce this issue on your machine:
Reproduction Steps
This is the ingest function I built:
and this is the register() call:
What steps have you taken to resolve this already?
I tried looking into Zipline's source code and through the issues/pull requests to find out whether I made a mistake in my implementation but couldn't find anything. Thanks for your help, let me know if you need further information.
Sincerely, Andrea Venuta