kernc / backtesting.py

:mag_right: :chart_with_upwards_trend: :snake: :moneybag: Backtest trading strategies in Python.
https://kernc.github.io/backtesting.py/
GNU Affero General Public License v3.0

Invalid daily aggregation of OHLC data with timezone/datetime offset #1002

Open h0wXD opened 1 year ago

h0wXD commented 1 year ago

Expected Behavior

resample('D') should take into account the correct trading day when the dates carry a timezone offset (an issue with date parsing?).

Actual Behavior

resample('D') of hourly candles puts the equity sample on the weekend instead of Friday. The position entry was clearly on Friday, so the equity balance should also fall on Friday rather than Saturday.

Steps to Reproduce

I added log lines (see below) and ran the sample strategy on AAPL 1H data exported from TradingView. The only way to get candles and entries plotted correctly in my timezone (+8) is to include the timezone offset in the export. With it, both the plot and the entries/exits match 100% (see the trades table), except that in backtesting.py the last position entry uses the final candle.close instead of the final candle.open.

In my C# code, the equity at day end on Friday 2022-08-26 is 10077.1. In backtesting.py that value is moved to Saturday, which leads to incorrect results on lower timeframes. I have compared a daily ('D') backtest in both my program and backtesting.py and the results are equal, so I think backtesting.py does not take the datetime offset into account for candles with a lower interval.

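    # Excerpt from backtesting.py's statistics code; the two to_csv() calls are the added log lines.
    day_returns = np.array(np.nan)
    annual_trading_days = np.nan
    if isinstance(index, pd.DatetimeIndex):
        day_returns = equity_df['Equity'].resample('D').last().dropna().pct_change()
        # Added: dump the raw and the daily-resampled equity curves for comparison
        equity_df['Equity'].to_csv("Equity.csv")
        equity_df['Equity'].resample('D').last().dropna().to_csv("EquityD.csv")

# Sample strategy used for the reproduction
from backtesting import Backtest, Strategy
from backtesting.lib import crossover
from backtesting.test import SMA

class SmaCross(Strategy):
    n1 = 50
    n2 = 100

    def init(self):
        close = self.data.Close
        self.sma1 = self.I(SMA, close, self.n1)
        self.sma2 = self.I(SMA, close, self.n2)

    def next(self):
        # Go long when the fast SMA crosses above the slow SMA, short on the opposite cross
        if crossover(self.sma1, self.sma2):
            self.buy()
        elif crossover(self.sma2, self.sma1):
            self.sell()

# AAPL1H is assumed to be the DataFrame loaded from the attached AAPL1H.csv
bt = Backtest(AAPL1H, SmaCross,
              cash=10000, commission=.00,
              exclusive_orders=True)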

Additional info

Attachments: AAPL1H.csv, Equity.csv, EquityD.csv, (image)

Some C# logic I wrote shows the first change in portfolio balance on Friday 2022-08-26: (image)

The backtesting.py logic shows the first change in portfolio balance on Saturday 2022-08-27: (image)

kernc commented 1 year ago

In Equity.csv, the first change occurs:

2022-08-27 01:30:00+08:00,10000.0
2022-08-27 02:30:00+08:00,10028.392  <--
2022-08-27 03:30:00+08:00,10077.1
2022-08-29 21:30:00+08:00,10173.56

In EquityD.csv, this shows as:

2022-08-27 00:00:00+08:00,10077.1

which I guess is reasonable since the two dates match.
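A minimal sketch of that aggregation, using only the four equity rows quoted above (the `equity` and `daily` names are illustrative, not taken from the library). It assumes that pandas bins `resample('D')` on a timezone-aware index by calendar day in the index's own timezone, which is consistent with the EquityD.csv row shown:

```python
import pandas as pd

# The four rows quoted from Equity.csv, with their +08:00 offset intact
index = pd.to_datetime([
    "2022-08-27 01:30:00+08:00",
    "2022-08-27 02:30:00+08:00",
    "2022-08-27 03:30:00+08:00",
    "2022-08-29 21:30:00+08:00",
])
equity = pd.Series([10000.0, 10028.392, 10077.1, 10173.56], index=index)

# Same chain as in the issue: last equity value per day, empty days dropped
daily = equity.resample("D").last().dropna()
print(daily)
print(daily.index[0].day_name())
```

The first three rows share the local date 2022-08-27, so they end up in a single daily bucket labelled with that date.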

Can you use:

df.index = df.index.tz_convert(None)

before passing df to Backtest()?
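As a rough illustration of what this workaround does, the sketch below applies it to the four Equity.csv rows quoted earlier (the `equity` series is a stand-in for the real data, not library code). `tz_convert(None)` converts the timestamps to UTC and drops the timezone information, so the same instants are then grouped by UTC calendar day:

```python
import pandas as pd

# Stand-in for the tz-aware data: the four rows quoted from Equity.csv
index = pd.to_datetime([
    "2022-08-27 01:30:00+08:00",
    "2022-08-27 02:30:00+08:00",
    "2022-08-27 03:30:00+08:00",
    "2022-08-29 21:30:00+08:00",
])
equity = pd.Series([10000.0, 10028.392, 10077.1, 10173.56], index=index)

# Convert to UTC and drop the tz info, leaving naive UTC timestamps
equity.index = equity.index.tz_convert(None)

# 01:30-03:30 at +08:00 on the 27th is 17:30-19:30 UTC on the 26th
daily_utc = equity.resample("D").last().dropna()
print(daily_utc)
print(daily_utc.index[0].day_name())
```

With only these four rows, the first daily bucket moves from 2022-08-27 to 2022-08-26 while its value stays 10077.1.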

h0wXD commented 1 year ago

@kernc that works perfectly, thanks for the quick response

After doing the following before passing the data to Backtest():

AAPL1H.index = AAPL1H.index.tz_convert(None)

the equity results are now correct compared to my previously shared C# sample:

2022-08-23,10000.0
2022-08-24,10000.0
2022-08-25,10000.0
2022-08-26,10077.1
2022-08-29,10214.5
2022-08-30,10362.1
2022-08-31,10466.5
2022-09-01,10418.5
2022-09-02,10545.7
2022-09-06,10624.3
2022-09-07,10541.5
2022-09-08,10628.5

Attachments: EquityD.csv, Equity.csv

Do you reckon this should be built-in to backtesting.py?

kernc commented 1 year ago

> Do you reckon this should be built-in to backtesting.py?

I'm not too certain. If the user prefers timezone-aware timestamps, why override that? In any case, the user should know what they are doing. And it's a simple enough workaround.

h0wXD commented 1 year ago

I still think the library should handle this when users pass datetime formats it wasn't designed for: a dataset whose timestamps carry an offset causes invalid backtest results, which makes this a date handling issue. The datasets used above are default TradingView exports with only the CSV headers changed to ,Open,High,Low,Close,Volume,VolMa. With the TradingView chart set to UTC, exported dates have the format "2022-08-03T16:30:00Z"; exported from your local timezone, they have the format "2022-08-04 00:30:00+08:00". If this is not supported or leads to invalid backtest results, it would be nice to at least show a warning message. Thanks for the quick response and the time spent building this amazing library!
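To make the two export formats concrete, here is a small sketch that parses both example strings with pandas (variable names are illustrative). They denote the same instant, but their calendar dates differ, which is what matters for a per-day aggregation:

```python
import pandas as pd

# The two example timestamps quoted above, one per export format
utc_export = pd.Timestamp("2022-08-03T16:30:00Z")
local_export = pd.Timestamp("2022-08-04 00:30:00+08:00")

# Both are tz-aware and refer to the same moment in time
same_instant = utc_export == local_export

# Their local calendar dates are not the same
utc_date = utc_export.date()
local_date = local_export.date()
print(same_instant, utc_date, local_date)
```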

kernc commented 1 year ago

> using date time dataset with offset causes invalid backtest results

Those results are not invalid! In the data's own timezone (UTC+8), it was simply already Saturday when the trade closed!

I feel this change would force a behavior which then couldn't be reverted. Maybe we can indeed issue a warning if a timezone offset is present, somewhere around here: https://github.com/kernc/backtesting.py/blob/0ce24d80b1bcb8120d95d31dc3bb351b1052a27d/backtesting/backtesting.py#L1123-L1126
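One possible shape for such a warning, as a sketch only: `warn_if_tz_aware` is a hypothetical helper that does not exist in backtesting.py, and the message wording is an assumption. It only checks whether the data index is a timezone-aware `DatetimeIndex` and leaves the data untouched:

```python
import warnings

import pandas as pd


def warn_if_tz_aware(data: pd.DataFrame) -> bool:
    """Hypothetical helper (not part of backtesting.py).

    Warn when the index is a tz-aware DatetimeIndex and return True,
    otherwise return False. The data itself is never modified.
    """
    index = data.index
    if isinstance(index, pd.DatetimeIndex) and index.tz is not None:
        warnings.warn(
            "Data index is timezone-aware; daily statistics are aggregated "
            "by calendar day in that timezone. Use "
            "`df.index = df.index.tz_convert(None)` to aggregate by UTC day.",
            stacklevel=2,
        )
        return True
    return False
```

A warning of this kind keeps the current behavior available to users who want timezone-aware aggregation, while pointing others to the workaround.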