Open twiecki opened 9 years ago
@twiecki can you post an example of an algo that triggers this?
https://groups.google.com/forum/#!topic/zipline/1RiEgZEXyI0, the third post.
Posting slightly modified script here (still requires the csv files):
import datetime
import pytz
import numpy as np
import pandas as pd
#import sklearn
#import scikits
import matplotlib.pyplot as plt
import statsmodels.api as sm
import pandas.io.data
from scipy import stats
import zipline as zp
from zipline import TradingAlgorithm
from zipline.api import *
from zipline.finance.slippage import FixedSlippage
from zipline.transforms import batch_transform
from zipline.api import order_target, record, symbol, history, add_history
import math
from pytz import timezone
from zipline.utils import tradingcalendar as calendar
df1 = pd.read_csv("GBPUSD1440.csv", names=['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume'],
                  index_col='Date_Time', parse_dates=[[0, 1]])
df2 = pd.read_csv("EURUSD1440.csv", names=['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume'],
                  index_col='Date_Time', parse_dates=[[0, 1]])
df1['open'] = df1['Open']
df1['high'] = df1['High']
df1['low'] = df1['Low']
df1['close'] = df1['Close']
df1['volume'] = df1['Volume']
df1['price'] = df1['Close']
df1 = df1.dropna()
df1 = df1.drop('Open', 1)
df1 = df1.drop('High', 1)
df1 = df1.drop('Close', 1)
df1 = df1.drop('Volume', 1)
df1 = df1.drop('Low', 1)
df1['open1'] = df2['Open']
df1['high1'] = df2['High']
df1['low1'] = df2['Low']
df1['close1'] = df2['Close']
df1['volume1'] = df2['Volume']
df1['price1'] = df2['Close']
df1 = df1.dropna()
df2['open'] = df1['open1']
df2['high'] = df1['high1']
df2['low'] = df1['low1']
df2['close'] = df1['close1']
df2['volume'] = df1['volume1']
df2['price'] = df1['price1']
df2 = df2.dropna()
df2 = df2.drop('Open', 1)
df2 = df2.drop('High', 1)
df2 = df2.drop('Close', 1)
df2 = df2.drop('Volume', 1)
df2 = df2.drop('Low', 1)
df1 = df1.drop('open1', 1)
df1 = df1.drop('high1', 1)
df1 = df1.drop('low1', 1)
df1 = df1.drop('close1', 1)
df1 = df1.drop('volume1', 1)
df1 = df1.drop('price1', 1)
df1 = df1.tz_localize('UTC')
#df1 = df1.tz_convert('US/Eastern')
df2 = df2.tz_localize('UTC')
#df2 = df2.tz_convert('US/Eastern')
data = pd.Panel({'GBP' : df1, 'EUR' : df2})
data = data.dropna()
data
#plt.gcf().set_size_inches(16, 12)
#context.gld = symbol('GBP')
#context.iau = symbol('EUR')
#add_history(50, '1d', 'price')
def initialize(context):
    context.sid1 = symbol('GBP')  #Chevron
    context.sid2 = symbol('EUR')  #Exxon Mobil
    context.lookbackPeriod = 30
    context.channelWidth = 2.0
    context.bet_size = 200
    add_history(30, '1d', 'price')
    context.i = 0
# Will be called on every trade event for the securities you specify.
def handle_data(context, data):
    context.i += 1
    if context.i < 40:
        return
    price_history = history(bar_count=30, frequency='1d', field='price')
    ratio = price_history[context.sid1] / price_history[context.sid2]
    ratioSTD = ratio.std()
    ratioMean = ratio.mean()
    upper = ratioMean + context.channelWidth * ratioSTD
    lower = ratioMean - context.channelWidth * ratioSTD
    ratioToday = data[context.sid1].price/data[context.sid2].price
    record(upper=upper, middle=ratioMean, lower=lower, ratio=ratioToday)
    if ratioToday > upper:
        x = price_history[context.sid1]
        y = price_history[context.sid2]
        theta = sm.OLS(y, x).fit().params[context.sid1]
        long_bet = data[context.sid2].price * context.bet_size
        short_bet = -1*theta * data[context.sid1].price * context.bet_size
        # long sid2
        order_target_value(context.sid2, long_bet)
        # short sid1
        order_target_value(context.sid1, short_bet)
    elif ratioToday < lower:
        x = price_history[context.sid2]
        y = price_history[context.sid1]
        theta = sm.OLS(y, x).fit().params[context.sid2]
        long_bet = data[context.sid1].price * context.bet_size
        short_bet = -1*theta * data[context.sid2].price * context.bet_size
        # long sid1
        order_target_value(context.sid1, long_bet)
        # short sid2
        order_target_value(context.sid2, short_bet)
        context.inPosition = True
algo_obj = TradingAlgorithm(initialize=initialize,
                            handle_data=handle_data)
# Run algorithm
perf_manual = algo_obj.run(data)
#ax1 = plt.subplot(311)
#perf_manual.portfolio_value.plot(ax=ax1)
#ax1.set_ylabel('portfolio value in $')
I ran this locally and have only done some light poking but wanted to post my initial thoughts. I think this is an issue where we are making assumptions that there will not be missing days in daily mode, and instead these will be rows full of nans. The data source that is given has a couple of days missing here and there. More investigation needs to go into the interaction between the dataframe source and history to be certain.
The algorithm is the classic dual_moving_avg with a twist:
0) Copy these files to ~/.zipline/cache/: https://drive.google.com/file/d/0B-dfTbup1rFdTWM2cHo1YmFqS00/view?usp=sharing
1) Load this data:
start = datetime.datetime(2011, 9, 9, 9, 30, 0, 0, pytz.utc)
end = datetime.datetime(2011, 11, 21, 15, 59, 0, 0, pytz.utc)
2) Replace all occurrences of '1d' with '1m'.
3) Initialize the algorithm with minute frequency:
algo_obj = zipline.algorithm.TradingAlgorithm(data_frequency='minute', initialize=initialize, handle_data=handle_data)
4) Run it:
perf = algo_obj.run(data)
I tried running zipline yesterday and I ran into the same error when I added a moving average transform to my minute data. Wasn't sure if it was a problem with my setup since it was my first time running it and looking at the code.
http://nbviewer.ipython.org/gist/dalejung/1ab100b08cbb2dfe0877 is relatively self contained
@dalejung: using transform is being phased out, but I think your test case can help debug this issue. What I did is buffer more history: where the documented moving average algo says "context.i += 1; if context.i < 300: return", I changed the 300 to 400. The downside is that I'm losing minutes of trade time that way. Also, if the data is intraday you're fine, but if your data spans more than a day you need to reset the context every day. It's sad, I know.
Transform was recently refactored to use history so it's not being deprecated.
For the minute stuff, this happens when the datasource has data outside the trading window. The cur_window_starts can only be market minutes, so when get_history tries to add buffer data it grabs an empty frame since earliest_minute is after algo_dt.
Not sure what the expected behavior should be. Would perhaps make sense to put a guard to block trade data events that are out of the environment's market hours.
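One way to picture such a guard, applied to the input data before it reaches the algorithm, is the pandas-only sketch below. It is not zipline code: the helper name keep_market_minutes and the 09:31 to 16:00 US/Eastern session window are assumptions made for illustration, and weekends or holidays are not handled.

```python
import pandas as pd


def keep_market_minutes(frame, tz="US/Eastern", open_time="09:31", close_time="16:00"):
    """Drop rows whose timestamp falls outside the assumed session window.

    `frame` must have a tz-aware DatetimeIndex (e.g. UTC). The default
    window is an assumption, not something read from zipline.
    """
    # Express the timestamps in exchange-local time for the comparison.
    local = frame.tz_convert(tz)
    # between_time keeps both endpoints by default.
    in_session = local.between_time(open_time, close_time)
    # Hand the surviving rows back in the original timezone.
    return in_session.tz_convert(frame.index.tz)


# Four minute bars in UTC; September 2011 is EDT (UTC-4).
idx = pd.to_datetime(
    ["2011-09-09 13:00", "2011-09-09 13:31", "2011-09-09 20:00", "2011-09-09 20:01"]
).tz_localize("UTC")
frame = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0]}, index=idx)
filtered = keep_market_minutes(frame)
```

Here the 13:00 and 20:01 UTC bars (09:00 and 16:01 Eastern) are dropped, so history would never see a bar that starts outside market minutes.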
I think this is an issue where we are making assumptions that there will not be missing days in daily mode, and instead these will be rows full of nans.
@llllllllll if this is the case then the issue should be fixed by pre-computing the expected index (which, conveniently, has already been done in TradingEnvironment) and then doing a reindex on the input data, right?
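A minimal sketch of that reindex idea, in plain pandas. The expected index here is just weekdays from pd.bdate_range, which is a stand-in assumption for whatever TradingEnvironment actually holds, and reindex_to_expected_days is a hypothetical helper name.

```python
import pandas as pd


def reindex_to_expected_days(frame, expected_days):
    """Align `frame` to `expected_days`.

    Days missing from the source become all-NaN rows; rows on days that
    are not in `expected_days` are dropped.
    """
    return frame.reindex(expected_days)


# Stand-in calendar: plain weekdays (an assumption; a real trading
# calendar would also exclude exchange holidays).
expected = pd.bdate_range("2014-01-06", "2014-01-10", tz="UTC")

# Source data with Wednesday 2014-01-08 missing.
prices = pd.DataFrame(
    {"price": [10.0, 11.0, 12.0, 13.0]},
    index=pd.to_datetime(
        ["2014-01-06", "2014-01-07", "2014-01-09", "2014-01-10"]
    ).tz_localize("UTC"),
)

aligned = reindex_to_expected_days(prices, expected)
```

After this, aligned has one row per expected day and the missing Wednesday shows up as a NaN row, which is the shape the daily-mode assumption described above expects.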
Is someone actively working on this issue? I am trading on the Swedish market and run into this issue for most of the stocks that I trade with. Are there any good alternatives to using the history function as a work around?
I investigated some more and the code below should show the problem:
from zipline.api import order_target, history, add_history
from zipline.utils.factory import load_from_yahoo
from zipline.algorithm import TradingAlgorithm
def initialize(context):
    add_history(10, '1d', 'price')
    context.i = 0

def handle_data(context, data):
    context.i += 1
    # this function will crash on some
    prices = history(10, '1d', 'price')
    order_target(context.security, 1000)

if __name__ == '__main__':
    # run the algorithm on these securities one by one
    securities = [
        'SKF-B.ST',   # swedish company that does not work
        'VOLV-A.ST',  # swedish company that does not work
        'AZN.ST',     # swedish company that does not work
        'AAPL',       # US company works
        'TSLA',       # US company works
    ]
    for security in securities:
        # get data from yahoo
        data = load_from_yahoo(stocks=[security], indexes={}, start='20140101', end='20140501')
        # create and run algorithm
        algo = TradingAlgorithm(
            initialize=initialize,
            handle_data=handle_data)
        algo.security = security
        try:
            results = algo.run(data)
            print('OK running algorithm on security.: {}'.format(security))
        except IndexError:
            print('Could not run algorithm on security: {}'.format(security))
Same issue on the German market (CET Timezone). Any ideas?
My advice is lame, sorry. Either dump zipline or debug yourself. I took the second path which turned to a long path joining the first.....
and what did you end up with?
Hey Thomas - I think I have the cause of this issue identified (or at least one time when this is being hit).
Whenever I have a data file which includes data on a day which zipline considers to not be a trading day, this error is flagged when it hits the ffill_buffer_from_prior_values() function.
To reproduce: just generate a csv file from a yahoo retrieval, and add a line in the datafile for MLK day or some other holiday. Make sure the added line falls after the point at which the history would have warmed up (else you won't see the issue). That should generate the error.
This also correlates to those who are having trouble with futures data inputs like me and foreign data inputs (sebnil & tobsch, noted above).
Knowing the cause, I'm hopeful you will know where to resolve, and/or identify a workaround. Perhaps we will need to change the trading calendar to fix this (or have it ignore the calendar).
Best, Ken
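To check a data file for this before running a backtest, something like the following sketch could flag the offending rows. It uses pandas' USFederalHolidayCalendar as a rough stand-in for zipline's trading calendar, which is an assumption: federal holidays and exchange holidays do not match exactly. The helper name split_off_calendar_rows is made up, and the index is assumed to be tz-naive daily dates.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay


def split_off_calendar_rows(frame):
    """Return (on_calendar, off_calendar) rows of a daily-indexed frame.

    "On calendar" means a weekday that is not a US federal holiday; this
    only approximates a real exchange calendar.
    """
    us_bday = CustomBusinessDay(calendar=USFederalHolidayCalendar())
    start = frame.index.min().normalize()
    end = frame.index.max().normalize()
    sessions = pd.date_range(start, end, freq=us_bday)
    is_session = frame.index.normalize().isin(sessions)
    return frame[is_session], frame[~is_session]


# Daily data that wrongly contains MLK day (Monday 2014-01-20).
daily = pd.DataFrame(
    {"price": [100.0, 101.0, 102.0]},
    index=pd.to_datetime(["2014-01-17", "2014-01-20", "2014-01-21"]),
)
on_calendar, off_calendar = split_off_calendar_rows(daily)
```

Dropping off_calendar before building the Panel avoids feeding a non-trading day to history, at the cost of discarding real data if the stand-in calendar is wrong for your market.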
Hi,
that sounds reasonable. Why don't you integrate a library like this one and let the user define the region?
https://github.com/novapost/workalendar
Best,
Tobias
Interesting package - I wonder if "working day" == "trading day"... likely. But, then we need to be able to define a superset of the calendars if you combine for example different country data in the same portfolio. (Then the ffills would fix the data.)
Thus, my hope is that there is a mechanism that Thomas can readily identify for the user being able to redefine the current trading calendar as the index of our pandas Panel (portfolio), which will be a superset of all the items (securities). If so, that would be proper operation.
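The "superset index plus ffill" part can at least be sketched on the data side with plain pandas. This does not touch zipline's trading calendar, and align_to_union is a hypothetical helper name; each frame is assumed to have a sorted DatetimeIndex.

```python
import pandas as pd


def align_to_union(frames):
    """Reindex every frame to the union of all indexes, forward-filling gaps.

    `frames` maps a name to a DataFrame with a DatetimeIndex.
    """
    union = None
    for frame in frames.values():
        union = frame.index if union is None else union.union(frame.index)
    # Rows before a frame's first observation stay NaN after ffill.
    return {name: frame.reindex(union).ffill() for name, frame in frames.items()}


# 2014-01-06 is a holiday in Sweden but a normal US trading day.
se = pd.DataFrame(
    {"price": [50.0, 51.0, 52.0]},
    index=pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-07"]),
)
us = pd.DataFrame(
    {"price": [10.0, 11.0, 12.0, 13.0]},
    index=pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-06", "2014-01-07"]),
)
aligned = align_to_union({"SE": se, "US": us})
```

Forward-filling is reasonable for a price column; for volume it would overstate activity, so a real version would probably fill that column with zeros instead.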
A simulator with one identical method, handle_data. Except for that it is totally different.
I solved it by just not using the history function at all. Why does zipline have so much complexity for getting historical data when it can be done using pandas anyway? (not a criticism, but a genuine question)
I did something like this:
historical_data = better_history.get_history(context)
And then a new file with a simple function:
import logging

def get_history(context):
    try:
        # everything up to and including the current simulation time
        return context.data[:context.datetime]
    except AttributeError:
        logging.error('context.data is not set. Make sure to include it in context variable.')
        raise
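For anyone wanting a fixed-size window rather than everything up to now, the same slicing idea can be written as a small standalone pandas sketch. trailing_window is a hypothetical name, and the frame is assumed to have a sorted DatetimeIndex; nothing here is part of zipline.

```python
import pandas as pd


def trailing_window(frame, now, bar_count):
    """Return up to `bar_count` most recent rows with timestamp <= `now`.

    Works even when `now` itself is missing from the index, as long as
    the index is sorted.
    """
    return frame.loc[:now].tail(bar_count)


# Daily prices with 2014-01-08 missing from the data.
prices = pd.DataFrame(
    {"price": [10.0, 11.0, 12.0, 13.0]},
    index=pd.to_datetime(["2014-01-06", "2014-01-07", "2014-01-09", "2014-01-10"]),
)
window = trailing_window(prices, pd.Timestamp("2014-01-08"), 2)
```

Because it slices by label instead of by a precomputed calendar position, a missing day just yields the rows that do exist before it.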
Not sure what happened with the crazy re-assignments.
Anyone who could take a look at this? @brianpfink maybe? CC @ehebert @ssanderson @jfkirk
One work around is to not trade days that the trading calendar thinks are non-trading days.
i.e.
def handle_data(algo, data):
    if not algo.trading_environment.is_trading_day(algo.get_datetime().date()):
        return
    # Rest of your algo....
The other option seems to be a custom trading calendar. If someone has an easy way to derive a trading calendar from a panel I would prefer that, because the workaround I gave clearly skips real trading days that are in the history data; this is even more true for people not using US trading days.
There are a couple of emails on the zipline mailing list that are reporting problems with history. I am able to reproduce on master and get this: