Closed ghost closed 8 years ago
Data update is done.
The back adjustment script is on my to do list already (at #2 if you're interested). It's not a quick thing to write unfortunately so it could be a while before I get round to it.
Great, thanks for the data update!
I'll patiently look out for the back-adjustment script.
Thanks
+1 for back-adjustment. Does anyone have a panama adjustment script (no matter how rough) to get me started? Would be awesome have something to use as a base for coding up a back-adjustment. Ideally i'd like to be able to create the provided data sets from script.
This is what I'm using, I store my prices in my SQL Database.
A couple of proviso's: 1) Im still learning python so this may be pretty poor quality!!! 2) Because I store my prices in SQL, I have a table (RollDate) which has a list of contract, roll dates, switch from contract and switch to contract. I get these dates from Rob's csv files. The long SQL query basically gives me a table of data with roll dates and the price adjustment factor. 3) At the moment I'm still building up my stitched price series, so I'm going one by one incase I find a bug, the last one I worked on was S (SoyBeans)
`import pyodbc import pandas from datetime import datetime import matplotlib.pyplot as plt import numpy
from GlobalVariables import MyGlobalVariables
def CreateStitchdPriceStream():
plt.interactive(False)
print("Importing Libraries")
cnxn = pyodbc.connect(MyGlobalVariables.ConnStr)
sql = "select Symbol from FuturesSecurityMaster where Trading = 1 and Symbol = 'S'"
TradingFutures = pandas.read_sql(sql, cnxn)
print("Getting Live Symbols")
for index, row in TradingFutures.iterrows():
sql = "select symbol, Date, price from StitchedFuturesPrices where symbol = ?"
print(row['Symbol'])
symbol = row['Symbol']
StitchedPrices = pandas.read_sql(sql, cnxn, params=[row['Symbol']])
print("There are %s stitched prices" % len(StitchedPrices))
sql = """select Symbol,
case when StitchingDate >= MaxStoredDate then StitchingDate else MaxStoredDate end as StitchingDate, SwitchToContract, SwitchFromContract, Differential, NextStitchingDate, LastRow
from
(select Symbol, StitchingDate,
IsNull((select max(Date) from StitchedFuturesPrices SFP where SFP.Symbol = RD.Symbol),DATEFROMPARTS(1800,1,1)) as MaxStoredDate,
SwitchToContract, SwitchFromContract, FPTo._Close - FPFrom._Close as Differential,
IsNull(LEAD(StitchingDate) OVER (ORDER BY StitchingDate),
(select max(date) from FuturesPrices FPM where FPM.ContractCode = SwitchToContract )) NextStitchingDate,
case ISNull(LEAD(FPTo._Close) OVER (ORDER BY StitchingDate),1) when 1 then 1
else 0 end as LastRow
from RollDate RD
Inner Join FuturesPrices FPTo on FPTo.Date = RD.StitchingDate and FPTo.ContractCode = RD.SwitchToContract
Inner Join FuturesPrices FPFrom on FPFrom.Date = RD.StitchingDate and FPFrom.ContractCode = RD.SwitchFromContract
where symbol = ?) as source1 where (NextStitchingDate >
IsNull((select max(Date) from StitchedFuturesPrices SFP where SFP.Symbol = source1.Symbol),DATEFROMPARTS(1800,1,1)))
"""
StitchDates = pandas.read_sql(sql, cnxn, params=[symbol])
print(StitchDates)
StitchDates.sort_values(by='StitchingDate', ascending=True, inplace=True)
row_iterator = StitchDates.iterrows()
for i, StitchDateRow in row_iterator:
StitchDate = StitchDateRow['StitchingDate']
previouscontract = StitchDateRow['SwitchFromContract']
nextdate = StitchDateRow['NextStitchingDate']
CurrentContract = StitchDateRow['SwitchToContract']
differential = StitchDateRow['Differential']
islastrow = StitchDateRow['LastRow']
if i == 0 and len(StitchedPrices) == 0:
sql = """select Date, _Close as price from FuturesPrices
where ContractCode = ? and DATE < ?
order by Date """
StitchedPrices = pandas.read_sql(sql, cnxn, params=[previouscontract, StitchDate])
print("%s stitched prices now" % len(StitchedPrices))
sql = """select Date, _Close as price from FuturesPrices
where ContractCode = ? and DATE >= ? and Date < ?
order by Date """
PricesToAdd = pandas.read_sql(sql, cnxn, params=[CurrentContract, StitchDate, nextdate])
print("%s prices to add" % len(PricesToAdd))
print("Adjusting stitched prices by %s" % differential)
StitchedPrices['price'] = StitchedPrices['price'] + differential
StitchedPrices = pandas.concat([StitchedPrices, PricesToAdd])
print("%s stitched prices now" % len(StitchedPrices))
else:
# if its the end of the data I need to include the latest day
if(islastrow == 1):
sql = """select Date, _Close as price from FuturesPrices
where ContractCode = ? and Date >= ? and Date <= ?
order by Date """
else:
sql = """select Date, _Close as price from FuturesPrices
where ContractCode = ? and DATE >= ? and Date < ?
order by Date """
if(islastrow != 1 and len(StitchDates) != 1):
print("Adding %s to previous prices" % (StitchDateRow['Differential']))
StitchedPrices['price'] = StitchedPrices['price'] + differential
print("Getting prices for %s between %s and %s" % (CurrentContract, StitchDate, nextdate))
PricesToAdd = pandas.read_sql(sql, cnxn, params=[CurrentContract, StitchDate, nextdate])
print("%s prices to add" % len(PricesToAdd))
if (islastrow == 1 and len(StitchDates) == 1):
print("Removing stitched prices as we're just adding a new prices, no need to update the differential")
StitchedPrices = PricesToAdd
# Drop first row
StitchedPrices.drop(StitchedPrices.index[:1], inplace=True)
else:
StitchedPrices = pandas.concat([StitchedPrices, PricesToAdd])
print("%s stitched prices now" % len(StitchedPrices))
print(StitchedPrices.head())
print(StitchedPrices.tail())
print("Plotting")
StitchedPrices['Date'] = pandas.to_datetime(StitchedPrices['Date'])
plt.plot(numpy.arange(len(StitchedPrices)), StitchedPrices.price)
plt.show()
print("Storing stitched prices ...")
StoreStitchedPrices(StitchedPrices, symbol)
def StoreStitchedPrices(StitchedPrices, Symbol): cnxn = pyodbc.connect(MyGlobalVariables.ConnStr) deletecursor = cnxn.cursor() deletecursor.execute("delete from StitchedFuturesPrices where Symbol = ?", Symbol) print('Deleted {} old records products'.format(deletecursor.rowcount)) cnxn.commit()
cursor = cnxn.cursor()
pricesdownloaded = 0
for index, subrow in StitchedPrices.iterrows():
cursor.execute("""insert into StitchedFuturesPrices(Symbol, Date, Price) values (?, ?, ?)""",
Symbol, subrow['Date'], subrow['price'])
pricesdownloaded += 1
DataToStore = True
if DataToStore:
print("Committing %s prices!" % pricesdownloaded)
cnxn.commit()
`
Don't know if this is a good spot to share code. But anyway i profit a lot from sharing. So pick from this here as well if it is useful to you.
This is a more general, object oriented approach.
Have fun. It is quite new, so I am thankful if you let me know any bugs you come across.
` import datetime import pandas as pd from pandas.tseries.offsets import Day, BDay, DateOffset import numpy as np import os import pickle
class rollingFuturesContract:
def __init__(self, maturityMonths, rollMonthsBeforeMat, day2roll, adjBDay=1):
'''
Class for building rolling futures contracts
Class obbject carries all instructions for constructing a rolling contract. Actual construction for a specific symbol is done in get method.
Rolling always takes place on early morning of the roll date.
: maturityMonths : list of futures maturing moths -> [3,6,9,12]
: rollMonthsBeforeMat : how many months before a contract matures the rolling should take place -> 2
: day2roll : on which calender day of a month we should roll -> 15
: adjBDay : if day2roll is on weekend, move it to next Monday (adjBDay>0 or adjBDay=True), or to previous Friday (adjBDay<0)
: returns : new rollingFuturesContract object
'''
self.path = os.path.abspath(__file__)
self.dir_path = os.path.dirname(self.path)
self.params = self.getParameters()
self.stichDayMaxOffset = self.params['stichDayMaxOffset'] # max number of days before the roll date, on which presence of price data will be checked for stitching contracts together
self.rollingTSPath = self.params['rollingTSPath']
self.day2roll = day2roll
self.adjBDay = adjBDay
self.maturityMonths = maturityMonths
self.maturityMonths.sort()
self.rollMonthsList = [self.monthDiff(y, -rollMonthsBeforeMat) for y in self.maturityMonths] # list with months in which rolling takes place
self.maturityMonthsReordered = self.maturityMonths[1:] + self.maturityMonths[:1]
self.rollMonths = pd.DataFrame({'from' : self.maturityMonths, 'into' : self.maturityMonthsReordered}, index=self.rollMonthsList) # df with details of rolling cycle (general info, independent of actual start and end)
self.rollMonths.index.name='month'
self.rollMatrix =[] # df with actual roll details (roll dates, stitch dates, prices, price adjustment etc.)
self.symbol = None
self.datasource = None
self.start = None
self.end = None
self.method = None
self.ts = []
def get(self, symbol, datasource, start, end, method='panama_bk', **kwargs):
'''
Construct actual rolling futures contract for specific symbol
: symbol : instrument symbol -> 'T'
: datasource : source of price data. Currently only 'quandl'. Additional source dependent parameters can be put in **kwargs
: start : start date of time series -> '2000-01-01'
: end : end date of time series -> '2016-12-31'
: method : how rolling contract should be constructed. Currently only 'panama_bk'
: **kwargs : additional parameters required by datasource or method applied (f.e. database name if using datasource='quandl)
'''
self.symbol = symbol
self.datasource = datasource
self.start = start
self.end = end
self.method = method
self.kwargs = kwargs
if method == 'panama_bk':
self.construct_panamabk(symbol, datasource, start, end, **kwargs)
else:
print('could not understand method for constructing the rolling contract: ' + method + '. At the moment you can choose from \'panama_bk\'')
return
return self.ts.copy()
def construct_panamabk(self, symbol, datasource, start, end, **kwargs):
self.datasource = datasource
rm = self.doRollMatrix(start, end)
out=[]
self.rollMatrix = pd.DataFrame(index=np.arange(0, len(rm)), columns=('from', 'to', 'curContract', 'prevContract', 'nextContract', 'stitchDate', 'unadjCloseCurC', 'adjCloseNextC', 'priceAdj', 'priceAdjCum') )
# treat one contract after the other, coming from the latest to process
p_diff_cum = 0
i=len(rm)-1
while i >= 0:
getDateFrom = rm.ix[i,'from'] - self.stichDayMaxOffset * BDay() # load a little more data so we are sure we can do the stiching
getDateTo = rm.ix[i,'to'] + self.stichDayMaxOffset * BDay()
print('Processing maturity ' + rm.ix[i,'curContract']+': getting raw prices from '+getDateFrom.strftime('%Y-%m-%d')+' to '+getDateTo.strftime('%Y-%m-%d'))
cur = self.getRawPrices(symbol, rm.ix[i,'curContract'], datasource, getDateFrom, getDateTo, **kwargs)
# sticht the contracts together (do stitching only from the second round on)
if i<len(rm)-1:
stitchDate = rm.ix[i+1, 'from'] - DateOffset(days=1) # contract switches in the early morning, price offset is determined from close before rollDate
# if there was not data for the day before the roll date, try the previous stichDayMaxOffset consecutive days (maybe there was a weekend?)
found=False
earliestAllowedStitchDate = stitchDate - self.stichDayMaxOffset*Day()
while stitchDate >= earliestAllowedStitchDate:
try:
p_cur = cur.ix[stitchDate.strftime('%Y-%m-%d'), 'Close']
p_next = next.ix[stitchDate.strftime('%Y-%m-%d'), 'Close']
found=True
#print('Found a stitch date: '+stitchDate.strftime('%Y-%m-%d'))
break
except KeyError:
pass
stitchDate=stitchDate - Day()
if found is False:
raise ValueError('While trying to stich futures contracts: Could not get any overlapping data within allowed perios. Tried backwards until ' + str(self.stichDayMaxOffset) + ' days before roll date')
p_diff = p_next - p_cur # if positive number -> previous series needs to be elevated
p_diff_cum = p_diff_cum + p_diff
#print('Stitch info: stitch date: ' + stitchDate.strftime('%Y-%m-%d') + ', p_cur: ' + str(p_cur) + ', p_next: ' + str(p_next) + ', p_diff: '+ str(p_diff) + ', p_diff_cum: '+ str(p_diff_cum))
cur['Open'] = cur['Open'].apply(lambda x: x + p_diff)
cur['High'] = cur['High'].apply(lambda x: x + p_diff)
cur['Low'] = cur['Low'].apply(lambda x: x + p_diff)
cur['Close'] = cur['Close'].apply(lambda x: x + p_diff)
self.rollMatrix.loc[i] = [rm.ix[i,'from'].date(), rm.ix[i,'to'].date(), rm.ix[i,'curContract'], rm.ix[i,'prevContract'], rm.ix[i,'nextContract'], stitchDate.strftime('%Y-%m-%d'), p_cur, p_next, p_diff, p_diff_cum]
else:
# in first round set only 'None' to detail matrix
self.rollMatrix.iloc[i] = [rm.ix[i,'from'].date(), rm.ix[i,'to'].date(), rm.ix[i,'curContract'], rm.ix[i,'prevContract'], rm.ix[i,'nextContract'], None, None, None, 0, 0]
cur['curContract'] = rm.ix[i,'curContract']
cur['priceAdj'] = p_diff_cum
out.append(cur[rm.ix[i,'from'].strftime('%Y-%m-%d'):rm.ix[i,'to'].strftime('%Y-%m-%d')])
next=cur
i=i-1
out=list(reversed(out))
self.ts = pd.concat(out)
def doRollMatrix(self, start, end): # create df with roll periods (from, to, curContract, prevContract...)
'''
constructs a df with roll info (current previous and contract, from, to)
'''
rd = pd.to_datetime(start)
l=[rd]
e=pd.to_datetime(end)
while l[-1]<e:
l.append(self.nextRollDate(pd.to_datetime(l[-1])))
del l[-1]
df=pd.DataFrame({'from': l})
#add other columns
df['to']=df['from'].map(lambda x : self.nextRollDate(x) - Day()) # take previous day, not prevBusDay, since some contracts might be trading on the weekend
df['curContract']=df['from'].map(lambda x : self.currentContract(x))
df['prevContract']=df['from'].map(lambda x : self.previousContract(x))
df['nextContract']=df['from'].map(lambda x : self.nextContract(x))
#restrict end of last period to end date (set 'to' to end date)
if df.ix[len(df)-1,'to']>e: df.ix[len(df)-1,'to'] = e
return df
def getRawPrices(self, symbol, maturity, datasource, start, end, **kwargs):
if datasource=='quandl':
from pygruebi.datastore import dsQuandl
q=dsQuandl.dsQuandl(database=kwargs['database'], symbol=symbol, maturity=maturity)
df=q.get(start.strftime('%Y-%m-%d'), end.strftime('%Y-%m-%d'), **kwargs)
return df
else:
print('could not understand the datasource wanted for constructing the rolling contract: ' + datasource + '. At the moment you can choose from: \'quandl\'')
def monthDiff(self, month, diff): #calculates an offset of diff months to a given month number
o=month+diff
while o<=0: o+=12
return o
def rollDate_adjBDay(self, date):
'''
moves potential roll day of the particular year and month (given in 'date') out of a weekend
'''
d = pd.datetime(date.year, date.month, self.day2roll)
if self.adjBDay>0:
d = d - Day() + BDay() #move weekend to Monday
elif self.adjBDay<0:
d = d + Day() - BDay() # move weekend to Friday
return d
def previousRollMonth(self, date):
if date.month in self.rollMonths.index:
if date<self.rollDate_adjBDay(date):
return self.rollMonths.index[self.rollMonths.index.get_loc(date.month)-1]
else:
return date.month
else:
#print('date.month: '+str(date.month))
if date.month>=self.rollMonths.index[0]:
return self.rollMonths.index[self.rollMonths.index.get_loc(date.month, method='pad')]
else:
return self.rollMonths.index[-1]
def nextRollMonth(self, date):
if date.month in self.rollMonths.index:
if date>=self.rollDate_adjBDay(date):
try:
return self.rollMonths.index[self.rollMonths.index.get_loc(date.month)+1]
except IndexError:
return self.rollMonths.index[0]
else:
return date.month
else:
try:
return self.rollMonths.index[self.rollMonths.index.get_loc(date.month, method='bfill')]
except KeyError:
return self.rollMonths.index[0]
def previousRollDate(self, date):
w = self.previousRollMonth(date)
if w > date.month:
d = datetime.date(date.year-1, w, self.day2roll)
return self.rollDate_adjBDay(d)
else:
d = datetime.date(date.year, w, self.day2roll)
return self.rollDate_adjBDay(d)
def nextRollDate(self, date):
w = self.nextRollMonth(date)
if w < date.month:
d = datetime.date(date.year+1, w, self.day2roll)
return self.rollDate_adjBDay(d)
else:
d = datetime.date(date.year, w, self.day2roll)
return self.rollDate_adjBDay(d)
def currentContract(self, date):
w = self.previousRollMonth(date)
c = self.rollMonths.loc[w, 'into']
if c < date.month:
return '%04d%02d' % (date.year+1, c)
else:
return '%04d%02d' % (date.year, c)
def previousContract(self, date):
w = self.previousRollMonth(date)
c = self.rollMonths.loc[w, 'from']
if c < date.month:
return '%04d%02d' % (date.year+1, c)
else:
return '%04d%02d' % (date.year, c)
def nextContract(self, date):
w = self.nextRollMonth(date)
c = self.rollMonths.loc[w, 'into']
if c < date.month:
return '%04d%02d' % (date.year+1, c)
else:
return '%04d%02d' % (date.year, c)
def save(self):
filepath = self.rollingTSPath + self.symbol + '_' + self.datasource + '.cc'
self.ts.to_csv(filepath,sep=';', index=True, date_format='%Y-%m-%dT%H:%M:%S')
def maturitiesPerDay(self, start, end):
'''
returns df with business days, giving for each day the current contract
'''
f=lambda x: self.currentContract(x)
r=pd.date_range(pd.to_datetime(start), pd.to_datetime(end))
r=r[r.dayofweek<5]
df=pd.DataFrame(r, index=r, columns=['maturity']).applymap(f)
return df
def getParameters(self):
with open(self.dir_path+'\\futures_parameters.pickle', 'rb') as f:
return pickle.load(f)
'''
import pickle
d ={'stichDayMaxOffset' : 3
,'rollingTSPath' : 'Z:/Sync/StartMe/Data/_Rolling/'
}
with open('Z:/Sync/StartMe/PythonPackages/pygruebi/futures_parameters.pickle', 'wb') as f: pickle.dump(d, f) ''' `
@js190
def stitch_prices(dfs, price_col, dates):
"""Stitches together a list of contract prices. dfs should contain a
list of dataframe objects, price_col is the column name to be
combined, and dates is a list of stitch dates. The dataframes must
be date indexed, and the order of dates must match the order of
the dataframes. The stitching method is called the Panama method -
more details can be found at
http://qoppac.blogspot.de/2015/05/systems-building-futures-rolling.html
"""
res = []
datesr = list(reversed(dates))
dfsr = list(reversed(dfs))
dfsr_pair = shift(dfsr,pd.DataFrame())
for i,v in enumerate(datesr):
tmp1=float(dfsr[i].ix[v,price_col])
tmp2=float(dfsr_pair[i].ix[v,price_col])
dfsr_pair[i].loc[:,price_col] = dfsr_pair[i][price_col] + tmp1-tmp2
dates.insert(0,'1900-01-01')
dates_end = shift(dates,'2200-01-01')
for i,v in enumerate(dates):
tmp = dfs[i][(dfs[i].index > dates[i]) & (dfs[i].index <= dates_end[i])]
res.append(tmp.Settle)
return pd.concat(res)
gist.github.com is a good place to share code like this.
How about tidying them up opening a pull request against the repo.? either in master or a WIP branch
thanks for posting these! super useful!
Ok, tried plugging the code to gist.gihub.com as suggested in case it might be useful for you Rob or anyone else (I am kind of new to github so thanks for bearing with me. A pull request would need more adjustments): https://gist.github.com/gruebi/ebb7c699972216cfa25240e999561fb9.js Comments welcome!
https://gist.github.com/gruebi/ebb7c699972216cfa25240e999561fb9 is the correct link, sorry.
Would it be possible to update the docs/introduction.md with the results from the latest data. (I would run through and update it myself however before my most recent pull I was having trouble getting the same numbers from the get_capped_forecast step onwards, and I would like to try again with the new data). Thanks.
closing this as no longer an issue
@drSeeS I tried to have a look at your gist but the neither link works?
Hi Rob,
Would you be so kind as to update the CSV data? I would like to simulate performance over Brexit.
Additionally, are you able to provide and script for back-adjusting the futures contracts? I have attempted to create one so I can update the data myself. It is based on the traded price for the new future and subtracting the difference between the traded price of the new future from the price of the old future at the time of trade across the historical data series, but this is yielding some strange results for me. I assume you have something similar you must used when rolling contracts and back-adjusting the price series?
Thanks