quantopian / zipline

Zipline, a Pythonic Algorithmic Trading Library
https://www.zipline.io
Apache License 2.0

history() for opening price #374

Closed heyuhere closed 8 years ago

heyuhere commented 9 years ago

First, thank you for open-sourcing this nice library. I have a question about history(). It does not seem to support 'open_price' as a field: when it is passed, it throws a KeyError. For 'open', I get just the last value for a given period, with the rest left as NaN.

I wonder if this behavior is expected. If so, do you have a plan to support open_price like the other fields?

heyuhere commented 9 years ago

This is the stack dump for 'open_price'.

Traceback (most recent call last):
  File "scripts/run_algo.py", line 24, in <module>
    run_pipeline(print_algo=True, **parsed)
  File "/Users/juwlee/anaconda/lib/python2.7/site-packages/zipline-0.6.1-py2.7.egg/zipline/utils/cli.py", line 192, in run_pipeline
    perf = algo.run(source)
  File "/Users/juwlee/anaconda/lib/python2.7/site-packages/zipline-0.6.1-py2.7.egg/zipline/algorithm.py", line 425, in run
    for perf in self.gen:
  File "/Users/juwlee/anaconda/lib/python2.7/site-packages/zipline-0.6.1-py2.7.egg/zipline/gens/tradesimulation.py", line 155, in transform
    self.algo.handle_data(self.current_data)
  File "/Users/juwlee/anaconda/lib/python2.7/site-packages/zipline-0.6.1-py2.7.egg/zipline/algorithm.py", line 226, in handle_data
    self.history_container.update(data, self.datetime)
  File "/Users/juwlee/anaconda/lib/python2.7/site-packages/zipline-0.6.1-py2.7.egg/zipline/history/history_container.py", line 313, in update
    for sid, bar in data.iteritems()
  File "/Users/juwlee/anaconda/lib/python2.7/site-packages/zipline-0.6.1-py2.7.egg/zipline/history/history_container.py", line 320, in <dictcomp>
    sid in self.buffer_panel.minor_axis)})
  File "/Users/juwlee/anaconda/lib/python2.7/site-packages/zipline-0.6.1-py2.7.egg/zipline/history/history_container.py", line 312, in <dictcomp>
    {sid: {field: bar[field] for field in fields}
  File "/Users/juwlee/anaconda/lib/python2.7/site-packages/zipline-0.6.1-py2.7.egg/zipline/protocol.py", line 135, in __getitem__
    return self.__dict__[name]
KeyError: 'open_price'

ssanderson commented 9 years ago

Hi @heyuhere. Thanks for using zipline!

The valid fields for history are 'open', 'close', 'high', 'low', 'volume', 'price', and 'close_price'. 'price' and 'close_price' histories contain the same information, except that specifying close_price disables forward-filling.
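To make the forward-filling distinction concrete, here is a minimal sketch in plain Python. It is not zipline's implementation: the `forward_fill` helper and the sample values are hypothetical, and NaN is assumed to mark a day on which the security did not trade.

```python
import math


def forward_fill(values):
    """Replace each NaN with the most recent non-NaN value before it.

    Hypothetical helper for illustration; leading NaNs stay NaN because
    there is no earlier value to carry forward.
    """
    filled = []
    last_seen = float("nan")
    for value in values:
        if not math.isnan(value):
            last_seen = value
        filled.append(last_seen)
    return filled


# Made-up daily closes; NaN marks a day without a trade.
close_price = [10.0, float("nan"), 10.5, float("nan"), float("nan")]

# 'close_price' keeps the gaps, while 'price' carries the last close forward.
price = forward_fill(close_price)
```

Under these assumptions, `close_price` still contains the NaN gaps and `price` holds the last known close on every day after the first trade.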

Specifying 'open' should return the first traded value of each security during the specified period. Can you share an example of input where you're seeing NaNs in your open history?
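As a rough illustration of "first traded value during the period", the sketch below collapses one period's trades into a single bar. The `daily_bar` function and the sample trades are assumptions made for this example only; they do not come from zipline.

```python
def daily_bar(trades):
    """Collapse one period's trades into a single OHLCV bar.

    `trades` is a non-empty list of (price, volume) tuples in
    chronological order. Hypothetical helper, not part of zipline.
    """
    prices = [price for price, _ in trades]
    return {
        "open": prices[0],    # first traded value in the period
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],  # last traded value in the period
        "volume": sum(volume for _, volume in trades),
    }


# Made-up intraday trades for one security on one day.
bar = daily_bar([(99.10, 200), (99.65, 100), (98.90, 300), (99.33, 150)])
```

With this definition, every fully observed period has an 'open' value, so a column of NaNs in the open history would not be expected once the window has filled.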

ssanderson commented 9 years ago

It's also worth noting that you'll always see NaNs in history in the opening bar of your simulation. For example, if you have a 10 day history and you've only simulated one day, you haven't seen data for the previous 9 days, and hence we can't populate the history container with entries for those dates.
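The 10-day example can be sketched with a fixed-length buffer. This is only an illustration of the idea, assuming a window that starts out empty (all NaN) and receives one value per simulated day; `make_window` is a hypothetical helper and not zipline's history container.

```python
import math
from collections import deque


def make_window(bar_count):
    """Return a fixed-length window pre-filled with NaN (hypothetical helper)."""
    return deque([float("nan")] * bar_count, maxlen=bar_count)


window = make_window(10)

# Simulate a single day: the newest value enters on the right and the
# oldest slot falls off the left.
window.append(99.34)

# Count the slots that have not yet been populated.
missing = sum(1 for value in window if math.isnan(value))
```

After one simulated day, nine of the ten slots are still NaN, and the window only becomes fully populated once ten days have been seen.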

heyuhere commented 9 years ago

@ssanderson thanks for the quick response. Yes, NaNs are expected in the first few iterations. However, history(field='open') always contains only the last value, while the other fields (e.g., 'high') are fully filled.

Test code:

from zipline.api import history, add_history

trend_period = 5

def initialize(context):
    context.count = 0
    # Pre-declare every history spec that handle_data requests below.
    add_history(bar_count=trend_period, frequency='1d', field='open')
    add_history(bar_count=trend_period, frequency='1d', field='high')
    add_history(bar_count=trend_period, frequency='1d', field='low')
    add_history(bar_count=trend_period, frequency='1d', field='price')

def handle_data(context, data):
    context.count += 1

    o = _get_history(field='open')
    h = _get_history(field='high')
    l = _get_history(field='low')
    c = _get_history(field='price')
    # The 'low' columns keep their bare names (dia, qqq): they collide with
    # nothing when joined, and the final join only suffixes the 'price' columns.
    ohlc = o.join(h, lsuffix='_o', rsuffix='_h').join(l, rsuffix='_l').join(c, rsuffix='_p')
    # Print only after the history windows have long been filled.
    if context.count > 100:
        print context.count, ohlc

def _get_history(field='price'):
    return history(bar_count=trend_period, frequency='1d', field=field)

Sample output (please note the *_o columns):

208
                               dia_o      qqq_o      dia_h      qqq_h  \
2006-10-23 00:00:00+00:00        NaN        NaN  99.553687  39.900561
2006-10-24 00:00:00+00:00        NaN        NaN  99.652875  39.780505
2006-10-25 00:00:00+00:00        NaN        NaN  99.778219  39.844337
2006-10-26 00:00:00+00:00        NaN        NaN  99.898625  40.213695
2006-10-27 00:00:00+00:00  99.650525  40.027861  99.782023  40.121559

                                 dia        qqq  dia_p  qqq_p
2006-10-23 00:00:00+00:00  98.378407  39.188386  99.34  39.76
2006-10-24 00:00:00+00:00  99.241935  39.283835  99.62  39.49
2006-10-25 00:00:00+00:00  99.194673  39.403912  99.77  39.76
2006-10-26 00:00:00+00:00  99.315095  39.557833  99.80  40.12
2006-10-27 00:00:00+00:00  99.140972  39.428192  99.33  39.55

ssanderson commented 9 years ago

Are you just using Yahoo finance data for this algo, or are you using your own data source? History isn't currently officially supported when running with daily-frequency data, but we recently committed some changes to unofficially support that use case. It's possible there's a bug there that's only affecting open.

heyuhere commented 9 years ago

I am downloading data from Yahoo finance. Just to summarize:

BarData({'qqq': SIDData({'high': 38.798322440087148, 'open': 38.012817719680463, 'price': 38.63, 'volume': 109350400, 'low': 37.554606632776569, 'sid': 'qqq', 'source_id': 'DataPanelSource-aaf24937f63422adedab5312c712d878', 'close': 38.630000000000003, 'dt': Timestamp('2006-01-03 00:00:00+0000', tz='UTC'), 'type': 4}), 'dia': SIDData({'high': 87.745902335456478, 'open': 86.840638788885812, 'price': 87.56, 'volume': 9761900, 'low': 86.274849072279139, 'sid': 'dia', 'source_id': 'DataPanelSource-aaf24937f63422adedab5312c712d878', 'close': 87.560000000000002, 'dt': Timestamp('2006-01-03 00:00:00+0000', tz='UTC'), 'type': 4})})

I am not sure which one is right, though: open or open_price. zipline seems to use 'open' in some places and 'open_price' in others. Some comments in the history code also use 'open', although the documentation suggests 'open_price'. It would be nice if they were consolidated.

heyuhere commented 9 years ago

The first issue was resolved by replacing 'open' with 'open_price' in here and here. So it looks like field name consolidation would help.

However, I feel the field name should be decided first, as mentioned above. Plus, 'open_price' is already documented on Quantopian as well as on GitHub, while 'open' is widely used in zipline code. I will leave the decision up to the code owners, but I hope this gives enough information to fix the problem.

anddam commented 9 years ago

@ssanderson about this comment: why not provide past history if it's available in the data source?

Is there any plan to bring this on par with Quantopian, where the past data is available even on the first handle_data execution? I.e., is it implemented this way in zipline on purpose, or has it just not been worked on yet?

This would make it easier to port an algorithm from zipline to Quantopian, since there would be no need to prefill data.

DMTSource commented 9 years ago

Is there a work around for this? I am working locally with zipline to optimize variables for my trading algos but I cannot get the open_price, or close_price of the SPY symbol when working with history(). I need the entire bar to recreate my work from Quantopian.

llllllllll commented 9 years ago

@DMTSource, there is currently no workaround other than warming up your algorithm: set the start date to the beginning of your history window and skip one history window's worth of bars before executing your algorithm. Just a note: this will only work if you pre-declare your history spec with an add_history call.

@anddam, this is something that we agree would be nice, and you are correct that this data should be available to backfill your history container; however, we have not had time to add this feature.
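The warm-up workaround could look roughly like the sketch below. It is deliberately free of zipline imports so it runs on its own: `Context`, `BAR_COUNT`, `is_warmed_up`, and the "trade" placeholder are all hypothetical stand-ins, and in a real algorithm `initialize` would also contain the add_history calls.

```python
BAR_COUNT = 5  # hypothetical history window length


class Context(object):
    """Stand-in for the context object the backtester passes in."""


def initialize(context):
    # In a real algorithm the add_history(...) declarations would go here.
    context.bars_seen = 0


def is_warmed_up(context):
    """Count the current bar and report whether the warm-up period is over."""
    context.bars_seen += 1
    return context.bars_seen > BAR_COUNT


def handle_data(context, data):
    if not is_warmed_up(context):
        return None  # still inside the warm-up window: skip trading logic
    return "trade"  # placeholder for the actual algorithm body


# Drive the sketch with seven empty bars.
context = Context()
initialize(context)
results = [handle_data(context, {}) for _ in range(7)]
```

Here the first BAR_COUNT bars are skipped and the algorithm body runs from the next bar onward, so the backtest start date would need to be moved earlier by one history window to compensate.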

llllllllll commented 8 years ago

We are tracking the backfill feature in #956