robcarver17 / pysystemtrade

Systematic Trading in python
GNU General Public License v3.0
2.66k stars 835 forks source link

Data Update plus script for backadjusting futures #25

Closed ghost closed 8 years ago

ghost commented 8 years ago

Hi Rob,

Would you be so kind as to update the CSV data? I would like to simulate performance over Brexit.

Additionally, are you able to provide and script for back-adjusting the futures contracts? I have attempted to create one so I can update the data myself. It is based on the traded price for the new future and subtracting the difference between the traded price of the new future from the price of the old future at the time of trade across the historical data series, but this is yielding some strange results for me. I assume you have something similar you must used when rolling contracts and back-adjusting the price series?

Thanks

robcarver17 commented 8 years ago

Data update is done.

The back adjustment script is on my to do list already (at #2 if you're interested). It's not a quick thing to write unfortunately so it could be a while before I get round to it.

ghost commented 8 years ago

Great, thanks for the data update!

I'll patiently look out for the back-adjustment script.

Thanks

js190 commented 8 years ago

+1 for back-adjustment. Does anyone have a panama adjustment script (no matter how rough) to get me started? Would be awesome have something to use as a base for coding up a back-adjustment. Ideally i'd like to be able to create the provided data sets from script.

ChrisAllisonMalta commented 8 years ago

This is what I'm using, I store my prices in my SQL Database.

A couple of proviso's: 1) Im still learning python so this may be pretty poor quality!!! 2) Because I store my prices in SQL, I have a table (RollDate) which has a list of contract, roll dates, switch from contract and switch to contract. I get these dates from Rob's csv files. The long SQL query basically gives me a table of data with roll dates and the price adjustment factor. 3) At the moment I'm still building up my stitched price series, so I'm going one by one incase I find a bug, the last one I worked on was S (SoyBeans)

`import pyodbc import pandas from datetime import datetime import matplotlib.pyplot as plt import numpy

from GlobalVariables import MyGlobalVariables

def CreateStitchdPriceStream():

plt.interactive(False)
print("Importing Libraries")
cnxn = pyodbc.connect(MyGlobalVariables.ConnStr)

sql = "select Symbol from FuturesSecurityMaster where Trading = 1 and Symbol = 'S'"
TradingFutures = pandas.read_sql(sql, cnxn)
print("Getting Live Symbols")
for index, row in TradingFutures.iterrows():
    sql = "select symbol, Date, price from StitchedFuturesPrices where symbol = ?"
    print(row['Symbol'])
    symbol = row['Symbol']
    StitchedPrices = pandas.read_sql(sql, cnxn, params=[row['Symbol']])
    print("There are %s stitched prices" % len(StitchedPrices))

    sql = """select Symbol,
              case when StitchingDate >= MaxStoredDate then StitchingDate else  MaxStoredDate end as StitchingDate, SwitchToContract, SwitchFromContract, Differential, NextStitchingDate, LastRow
              from
              (select Symbol, StitchingDate,
              IsNull((select max(Date) from StitchedFuturesPrices SFP where SFP.Symbol = RD.Symbol),DATEFROMPARTS(1800,1,1)) as MaxStoredDate,
              SwitchToContract, SwitchFromContract, FPTo._Close - FPFrom._Close as Differential,
              IsNull(LEAD(StitchingDate) OVER (ORDER BY StitchingDate),
             (select max(date) from FuturesPrices FPM where FPM.ContractCode = SwitchToContract )) NextStitchingDate,
             case ISNull(LEAD(FPTo._Close) OVER (ORDER BY StitchingDate),1) when 1 then 1
             else 0 end as LastRow
             from RollDate RD
             Inner Join FuturesPrices FPTo on FPTo.Date = RD.StitchingDate and FPTo.ContractCode = RD.SwitchToContract
             Inner Join FuturesPrices FPFrom on FPFrom.Date = RD.StitchingDate and FPFrom.ContractCode = RD.SwitchFromContract
             where symbol = ?) as source1 where (NextStitchingDate >
             IsNull((select max(Date) from StitchedFuturesPrices SFP where SFP.Symbol = source1.Symbol),DATEFROMPARTS(1800,1,1)))
          """

    StitchDates = pandas.read_sql(sql, cnxn, params=[symbol])
    print(StitchDates)
    StitchDates.sort_values(by='StitchingDate', ascending=True, inplace=True)
    row_iterator = StitchDates.iterrows()
    for i, StitchDateRow in row_iterator:
        StitchDate = StitchDateRow['StitchingDate']
        previouscontract = StitchDateRow['SwitchFromContract']
        nextdate = StitchDateRow['NextStitchingDate']
        CurrentContract = StitchDateRow['SwitchToContract']
        differential = StitchDateRow['Differential']
        islastrow = StitchDateRow['LastRow']
        if i == 0 and len(StitchedPrices) == 0:
            sql = """select Date, _Close as price from FuturesPrices
                     where ContractCode = ? and DATE < ?
                     order by Date """
            StitchedPrices = pandas.read_sql(sql, cnxn, params=[previouscontract, StitchDate])
            print("%s stitched prices now" % len(StitchedPrices))
            sql = """select Date, _Close as price from FuturesPrices
                     where ContractCode = ? and DATE >= ? and Date < ?
                     order by Date """
            PricesToAdd = pandas.read_sql(sql, cnxn, params=[CurrentContract, StitchDate, nextdate])
            print("%s prices to add" % len(PricesToAdd))
            print("Adjusting stitched prices by %s" % differential)
            StitchedPrices['price'] = StitchedPrices['price'] + differential
            StitchedPrices = pandas.concat([StitchedPrices, PricesToAdd])
            print("%s stitched prices now" % len(StitchedPrices))
        else:
            # if its the end of the data I need to include the latest day
            if(islastrow == 1):
                sql = """select Date, _Close as price from FuturesPrices
                         where ContractCode = ? and Date >= ? and Date <= ?
                         order by Date """
            else:
                sql = """select Date, _Close as price from FuturesPrices
                        where ContractCode = ? and DATE >= ? and Date < ?
                        order by Date """
            if(islastrow != 1 and len(StitchDates) != 1):
                print("Adding %s to previous prices" % (StitchDateRow['Differential']))
                StitchedPrices['price'] = StitchedPrices['price'] + differential
            print("Getting prices for %s between %s and %s" % (CurrentContract, StitchDate, nextdate))
            PricesToAdd = pandas.read_sql(sql, cnxn, params=[CurrentContract, StitchDate, nextdate])
            print("%s prices to add" % len(PricesToAdd))
            if (islastrow == 1 and len(StitchDates) == 1):
                print("Removing stitched prices as we're just adding a new prices, no need to update the differential")
                StitchedPrices = PricesToAdd
                # Drop first row
                StitchedPrices.drop(StitchedPrices.index[:1], inplace=True)
            else:
                StitchedPrices = pandas.concat([StitchedPrices, PricesToAdd])
            print("%s stitched prices now" % len(StitchedPrices))
    print(StitchedPrices.head())
    print(StitchedPrices.tail())
    print("Plotting")
    StitchedPrices['Date'] = pandas.to_datetime(StitchedPrices['Date'])
    plt.plot(numpy.arange(len(StitchedPrices)), StitchedPrices.price)
    plt.show()
    print("Storing stitched prices ...")
    StoreStitchedPrices(StitchedPrices, symbol)

def StoreStitchedPrices(StitchedPrices, Symbol): cnxn = pyodbc.connect(MyGlobalVariables.ConnStr) deletecursor = cnxn.cursor() deletecursor.execute("delete from StitchedFuturesPrices where Symbol = ?", Symbol) print('Deleted {} old records products'.format(deletecursor.rowcount)) cnxn.commit()

cursor = cnxn.cursor()
pricesdownloaded = 0
for index, subrow in StitchedPrices.iterrows():
    cursor.execute("""insert into StitchedFuturesPrices(Symbol, Date, Price) values (?, ?, ?)""",
                   Symbol, subrow['Date'], subrow['price'])
    pricesdownloaded += 1
    DataToStore = True
if DataToStore:
    print("Committing %s prices!" % pricesdownloaded)
    cnxn.commit()

`

drSeeS commented 8 years ago

Don't know if this is a good spot to share code. But anyway i profit a lot from sharing. So pick from this here as well if it is useful to you.

This is a more general, object oriented approach.

Have fun. It is quite new, so I am thankful if you let me know any bugs you come across.

` import datetime import pandas as pd from pandas.tseries.offsets import Day, BDay, DateOffset import numpy as np import os import pickle

class rollingFuturesContract:

def __init__(self,  maturityMonths, rollMonthsBeforeMat, day2roll, adjBDay=1):
    '''
    Class for building rolling futures contracts

    Class obbject carries all instructions for constructing a rolling contract. Actual construction for a specific symbol is done in get method.

    Rolling always takes place on early morning of the roll date. 

    : maturityMonths : list of futures maturing moths -> [3,6,9,12]
    : rollMonthsBeforeMat : how many months before a contract matures the rolling should take place -> 2
    : day2roll : on which calender day of a month we should roll -> 15
    : adjBDay : if day2roll is on weekend, move it to next Monday (adjBDay>0 or adjBDay=True), or to previous Friday (adjBDay<0)

    : returns : new rollingFuturesContract object
    '''
    self.path = os.path.abspath(__file__)
    self.dir_path = os.path.dirname(self.path)
    self.params = self.getParameters()
    self.stichDayMaxOffset = self.params['stichDayMaxOffset'] # max number of days before the roll date, on which presence of price data will be checked for stitching contracts together
    self.rollingTSPath = self.params['rollingTSPath']

    self.day2roll = day2roll
    self.adjBDay = adjBDay 
    self.maturityMonths = maturityMonths

    self.maturityMonths.sort()
    self.rollMonthsList = [self.monthDiff(y, -rollMonthsBeforeMat) for y in self.maturityMonths] # list with months in which rolling takes place 
    self.maturityMonthsReordered = self.maturityMonths[1:] + self.maturityMonths[:1]
    self.rollMonths = pd.DataFrame({'from' : self.maturityMonths, 'into' : self.maturityMonthsReordered}, index=self.rollMonthsList) # df with details of rolling cycle (general info, independent of actual start and end)
    self.rollMonths.index.name='month'

    self.rollMatrix =[] # df with actual roll details (roll dates, stitch dates, prices, price adjustment etc.)
    self.symbol = None
    self.datasource = None
    self.start = None
    self.end = None
    self.method = None
    self.ts = []

def get(self, symbol, datasource, start, end, method='panama_bk', **kwargs):
    '''
    Construct actual rolling futures contract for specific symbol

    : symbol : instrument symbol -> 'T'
    : datasource : source of price data. Currently only 'quandl'. Additional source dependent parameters can be put in **kwargs
    : start : start date of time series -> '2000-01-01'
    : end : end date of time series -> '2016-12-31'
    : method : how rolling contract should be constructed. Currently only 'panama_bk'
    : **kwargs : additional parameters required by datasource or method applied (f.e. database name if using datasource='quandl)
    '''

    self.symbol = symbol
    self.datasource = datasource
    self.start = start
    self.end = end
    self.method = method
    self.kwargs = kwargs

    if method == 'panama_bk':
        self.construct_panamabk(symbol, datasource, start, end, **kwargs)
    else:
        print('could not understand method for constructing the rolling contract: ' + method + '. At the moment you can choose from \'panama_bk\'')
        return
    return self.ts.copy()

def construct_panamabk(self, symbol, datasource, start, end, **kwargs):
    self.datasource = datasource
    rm = self.doRollMatrix(start, end)
    out=[]
    self.rollMatrix = pd.DataFrame(index=np.arange(0, len(rm)), columns=('from', 'to', 'curContract', 'prevContract', 'nextContract', 'stitchDate', 'unadjCloseCurC', 'adjCloseNextC', 'priceAdj', 'priceAdjCum') )

    # treat one contract after the other, coming from the latest to process
    p_diff_cum = 0
    i=len(rm)-1

    while i >= 0:
        getDateFrom = rm.ix[i,'from'] - self.stichDayMaxOffset * BDay() # load a little more data so we are sure we can do the stiching 
        getDateTo = rm.ix[i,'to'] + self.stichDayMaxOffset * BDay()

        print('Processing maturity ' + rm.ix[i,'curContract']+': getting raw prices from '+getDateFrom.strftime('%Y-%m-%d')+' to '+getDateTo.strftime('%Y-%m-%d'))
        cur = self.getRawPrices(symbol, rm.ix[i,'curContract'], datasource, getDateFrom, getDateTo, **kwargs)

        # sticht the contracts together (do stitching only from the second round on)
        if i<len(rm)-1: 
            stitchDate = rm.ix[i+1, 'from'] - DateOffset(days=1) # contract switches in the early morning, price offset is determined from close before rollDate

            # if there was not data for the day before the roll date, try the previous stichDayMaxOffset consecutive days (maybe there was a weekend?)
            found=False
            earliestAllowedStitchDate = stitchDate - self.stichDayMaxOffset*Day()

            while stitchDate >= earliestAllowedStitchDate:
                try:
                    p_cur = cur.ix[stitchDate.strftime('%Y-%m-%d'), 'Close']
                    p_next = next.ix[stitchDate.strftime('%Y-%m-%d'), 'Close']
                    found=True
                    #print('Found a stitch date: '+stitchDate.strftime('%Y-%m-%d'))
                    break
                except KeyError:
                    pass
                stitchDate=stitchDate - Day()  

            if found is False:
                raise ValueError('While trying to stich futures contracts: Could not get any overlapping data within allowed perios. Tried backwards until ' + str(self.stichDayMaxOffset) + ' days before roll date')

            p_diff = p_next - p_cur # if positive number -> previous series needs to be elevated
            p_diff_cum = p_diff_cum + p_diff
            #print('Stitch info: stitch date: ' + stitchDate.strftime('%Y-%m-%d') + ', p_cur: ' + str(p_cur) + ', p_next: ' + str(p_next) + ', p_diff: '+ str(p_diff) + ', p_diff_cum: '+ str(p_diff_cum))
            cur['Open'] = cur['Open'].apply(lambda x: x + p_diff)
            cur['High'] = cur['High'].apply(lambda x: x + p_diff)
            cur['Low'] = cur['Low'].apply(lambda x: x + p_diff)
            cur['Close'] = cur['Close'].apply(lambda x: x + p_diff)

            self.rollMatrix.loc[i] = [rm.ix[i,'from'].date(), rm.ix[i,'to'].date(), rm.ix[i,'curContract'], rm.ix[i,'prevContract'], rm.ix[i,'nextContract'], stitchDate.strftime('%Y-%m-%d'), p_cur, p_next, p_diff, p_diff_cum]                
        else:
            # in first round set only 'None' to detail matrix
            self.rollMatrix.iloc[i] = [rm.ix[i,'from'].date(), rm.ix[i,'to'].date(), rm.ix[i,'curContract'], rm.ix[i,'prevContract'], rm.ix[i,'nextContract'], None, None, None, 0, 0] 

        cur['curContract'] = rm.ix[i,'curContract']
        cur['priceAdj'] = p_diff_cum
        out.append(cur[rm.ix[i,'from'].strftime('%Y-%m-%d'):rm.ix[i,'to'].strftime('%Y-%m-%d')])

        next=cur        
        i=i-1

    out=list(reversed(out))
    self.ts = pd.concat(out)

def doRollMatrix(self, start, end): # create df with roll periods (from, to, curContract, prevContract...)
    '''
    constructs a df with roll info (current previous and contract, from, to)
    '''
    rd = pd.to_datetime(start)
    l=[rd]
    e=pd.to_datetime(end)
    while l[-1]<e:
        l.append(self.nextRollDate(pd.to_datetime(l[-1])))
    del l[-1]    
    df=pd.DataFrame({'from': l})

    #add other columns
    df['to']=df['from'].map(lambda x : self.nextRollDate(x) - Day()) # take previous day, not prevBusDay, since some contracts might be trading on the weekend
    df['curContract']=df['from'].map(lambda x : self.currentContract(x))
    df['prevContract']=df['from'].map(lambda x : self.previousContract(x))
    df['nextContract']=df['from'].map(lambda x : self.nextContract(x))

    #restrict end of last period to end date (set 'to' to end date)
    if df.ix[len(df)-1,'to']>e: df.ix[len(df)-1,'to'] = e
    return df        

def getRawPrices(self, symbol, maturity, datasource, start, end, **kwargs):
    if datasource=='quandl':
        from pygruebi.datastore import dsQuandl
        q=dsQuandl.dsQuandl(database=kwargs['database'], symbol=symbol, maturity=maturity)
        df=q.get(start.strftime('%Y-%m-%d'), end.strftime('%Y-%m-%d'), **kwargs)
        return df
    else:
        print('could not understand the datasource wanted for constructing the rolling contract: ' + datasource + '. At the moment you can choose from: \'quandl\'')    

def monthDiff(self, month, diff): #calculates an offset of diff months to a given month number
    o=month+diff
    while o<=0: o+=12
    return o   

def rollDate_adjBDay(self, date):
    '''
    moves potential roll day of the particular year and month (given in 'date') out of a weekend
    '''
    d = pd.datetime(date.year, date.month, self.day2roll)
    if self.adjBDay>0:
        d = d - Day() + BDay() #move weekend to Monday
    elif self.adjBDay<0:
        d = d + Day() - BDay() # move weekend to Friday
    return d   

def previousRollMonth(self, date):
    if date.month in self.rollMonths.index:
        if date<self.rollDate_adjBDay(date): 
            return self.rollMonths.index[self.rollMonths.index.get_loc(date.month)-1]
        else:
            return date.month
    else:   
        #print('date.month: '+str(date.month))
        if date.month>=self.rollMonths.index[0]:
            return self.rollMonths.index[self.rollMonths.index.get_loc(date.month, method='pad')]
        else:
            return self.rollMonths.index[-1]

def nextRollMonth(self, date):
    if date.month in self.rollMonths.index:
        if date>=self.rollDate_adjBDay(date): 
            try:
                return self.rollMonths.index[self.rollMonths.index.get_loc(date.month)+1]
            except IndexError:
                return self.rollMonths.index[0]
        else:
            return date.month
    else:
        try:
            return self.rollMonths.index[self.rollMonths.index.get_loc(date.month, method='bfill')]
        except KeyError:
            return self.rollMonths.index[0]

def previousRollDate(self, date):
    w = self.previousRollMonth(date)
    if w > date.month:
        d = datetime.date(date.year-1, w, self.day2roll)
        return self.rollDate_adjBDay(d)
    else:
        d = datetime.date(date.year, w, self.day2roll)
        return self.rollDate_adjBDay(d)

def nextRollDate(self, date):
    w = self.nextRollMonth(date)
    if w < date.month:
        d = datetime.date(date.year+1, w, self.day2roll)
        return self.rollDate_adjBDay(d)
    else:
        d = datetime.date(date.year, w, self.day2roll)
        return self.rollDate_adjBDay(d)

def currentContract(self, date):
    w = self.previousRollMonth(date)
    c = self.rollMonths.loc[w, 'into']
    if c < date.month:
        return '%04d%02d' % (date.year+1, c)
    else:
        return '%04d%02d' % (date.year, c)

def previousContract(self, date):
    w = self.previousRollMonth(date)
    c = self.rollMonths.loc[w, 'from']
    if c < date.month:
        return '%04d%02d' % (date.year+1, c)
    else:
        return '%04d%02d' % (date.year, c)          

def nextContract(self, date):
    w = self.nextRollMonth(date)
    c = self.rollMonths.loc[w, 'into']
    if c < date.month:
        return '%04d%02d' % (date.year+1, c)
    else:
        return '%04d%02d' % (date.year, c)      

def save(self):
    filepath = self.rollingTSPath + self.symbol + '_' + self.datasource + '.cc'
    self.ts.to_csv(filepath,sep=';', index=True, date_format='%Y-%m-%dT%H:%M:%S')    

def maturitiesPerDay(self, start, end):
    '''
    returns df with business days, giving for each day the current contract
    '''
    f=lambda x: self.currentContract(x)
    r=pd.date_range(pd.to_datetime(start), pd.to_datetime(end))
    r=r[r.dayofweek<5]
    df=pd.DataFrame(r, index=r, columns=['maturity']).applymap(f)
    return df

def getParameters(self):
    with open(self.dir_path+'\\futures_parameters.pickle', 'rb') as  f:
        return pickle.load(f)        

''' import pickle
d ={'stichDayMaxOffset' : 3 ,'rollingTSPath' : 'Z:/Sync/StartMe/Data/_Rolling/' }

with open('Z:/Sync/StartMe/PythonPackages/pygruebi/futures_parameters.pickle', 'wb') as f: pickle.dump(d, f) ''' `

burakbayramli commented 8 years ago

@js190

def stitch_prices(dfs, price_col, dates):
    """Stitches together a list of contract prices. dfs should contain a
    list of dataframe objects, price_col is the column name to be
    combined, and dates is a list of stitch dates. The dataframes must
    be date indexed, and the order of dates must match the order of
    the dataframes. The stitching method is called the Panama method -
    more details can be found at
    http://qoppac.blogspot.de/2015/05/systems-building-futures-rolling.html
    """

    res = []
    datesr = list(reversed(dates))
    dfsr = list(reversed(dfs))    
    dfsr_pair = shift(dfsr,pd.DataFrame())

    for i,v in enumerate(datesr):
        tmp1=float(dfsr[i].ix[v,price_col])
        tmp2=float(dfsr_pair[i].ix[v,price_col])
        dfsr_pair[i].loc[:,price_col] = dfsr_pair[i][price_col] + tmp1-tmp2

    dates.insert(0,'1900-01-01')
    dates_end = shift(dates,'2200-01-01')

    for i,v in enumerate(dates):
        tmp = dfs[i][(dfs[i].index > dates[i]) & (dfs[i].index <= dates_end[i])]
        res.append(tmp.Settle)
    return pd.concat(res)
ehiggs commented 8 years ago

gist.github.com is a good place to share code like this.

cmorgan commented 8 years ago

How about tidying them up opening a pull request against the repo.? either in master or a WIP branch

js190 commented 8 years ago

thanks for posting these! super useful!

drSeeS commented 8 years ago

Ok, tried plugging the code to gist.gihub.com as suggested in case it might be useful for you Rob or anyone else (I am kind of new to github so thanks for bearing with me. A pull request would need more adjustments): https://gist.github.com/gruebi/ebb7c699972216cfa25240e999561fb9.js Comments welcome!

drSeeS commented 8 years ago

https://gist.github.com/gruebi/ebb7c699972216cfa25240e999561fb9 is the correct link, sorry.

JMW100 commented 8 years ago

Would it be possible to update the docs/introduction.md with the results from the latest data. (I would run through and update it myself however before my most recent pull I was having trouble getting the same numbers from the get_capped_forecast step onwards, and I would like to try again with the new data). Thanks.

robcarver17 commented 8 years ago

closing this as no longer an issue

ChrisAllisonMalta commented 8 years ago

@drSeeS I tried to have a look at your gist but the neither link works?

drSeeS commented 8 years ago

@ChrisAllisonMalta should still be working (even including correction of a small bug): gist. Have fun.