Issues with updating class variables while walk-forward optimising

patricna commented 1 year ago

I'm having trouble updating MLTrainOnceStrategy class variables during walk-forward optimization, as is done with self.clf in MLWalkForwardStrategy in the tutorial notebook "Trading with Machine Learning Models":

class MLWalkForwardStrategy(MLTrainOnceStrategy):
    def next(self):
        # Skip the cold start period with too few values available
        if len(self.data) < N_TRAIN:
            return

        # Re-train the model only every 20 iterations.
        # Since 20 << N_TRAIN, we don't lose much in terms of
        # "recent training examples", but the speed-up is significant!
        if len(self.data) % 20:
            return super().next()

        # Retrain on last N_TRAIN values
        df = self.data.df[-N_TRAIN:]
        X, y = get_clean_Xy(df)
        self.clf.fit(X, y)

        # Now that the model is fitted, 
        # proceed the same as in MLTrainOnceStrategy
        super().next()

bt = Backtest(data, MLWalkForwardStrategy, commission=.0002, margin=.05)
bt.run()

When I add print statements to the code above, I can see that the model is training, but the self.clf is still the same as defined in def init(self) in the main strategy class MLTrainOnceStrategy.

This issue suddenly appeared last week. This approach worked before. Is anyone experiencing the same?

A way to see that it doesn't work is to look at the equity curves for both MLTrainOnceStrategy and MLWalkForwardStrategy. They will look exactly the same.

MLTrainOnceStrategy in the notebook:

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

from backtesting import Backtest, Strategy

N_TRAIN = 400

class MLTrainOnceStrategy(Strategy):
    price_delta = .004  # 0.4%

    def init(self):        
        # Init our model, a kNN classifier
        self.clf = KNeighborsClassifier(7)

        # Train the classifier in advance on the first N_TRAIN examples
        df = self.data.df.iloc[:N_TRAIN]
        X, y = get_clean_Xy(df)
        self.clf.fit(X, y)

        # Plot y for inspection
        self.I(get_y, self.data.df, name='y_true')

        # Prepare empty, all-NaN forecast indicator
        self.forecasts = self.I(lambda: np.repeat(np.nan, len(self.data)), name='forecast')

    def next(self):
        # Skip the training, in-sample data
        if len(self.data) < N_TRAIN:
            return

        # Proceed only with out-of-sample data. Prepare some variables
        high, low, close = self.data.High, self.data.Low, self.data.Close
        current_time = self.data.index[-1]

        # Forecast the next movement
        X = get_X(self.data.df.iloc[-1:])
        forecast = self.clf.predict(X)[0]

        # Update the plotted "forecast" indicator
        self.forecasts[-1] = forecast

        # If our forecast is upwards and we don't already hold a long position
        # place a long order for 20% of available account equity. Vice versa for short.
        # Also set target take-profit and stop-loss prices to be one price_delta
        # away from the current closing price.
        upper, lower = close[-1] * (1 + np.r_[1, -1]*self.price_delta)

        if forecast == 1 and not self.position.is_long:
            self.buy(size=.2, tp=upper, sl=lower)
        elif forecast == -1 and not self.position.is_short:
            self.sell(size=.2, tp=lower, sl=upper)

        # Additionally, set aggressive stop-loss on trades that have been open 
        # for more than two days
        for trade in self.trades:
            if current_time - trade.entry_time > pd.Timedelta('2 days'):
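                if trade.is_long:
                    # Compare against the latest bar's low, not the whole array
                    trade.sl = max(trade.sl, low[-1])
                else:
                    # Compare against the latest bar's high, not the whole array
                    trade.sl = min(trade.sl, high[-1])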

patricna commented 1 year ago

Simplified example:

A simple SMA strategy. As a test, increase the moving averages' lengths by 10 every 500 bars (just to check whether the indicators change).

Initial model:

from backtesting import Backtest, Strategy
from backtesting.lib import crossover

from backtesting.test import SMA, GOOG

class SmaCross(Strategy):
    n1 = 10
    n2 = 20

    size = 1

    def init(self):
        close = self.data.Close
        self.sma1 = self.I(SMA, close, self.n1)
        self.sma2 = self.I(SMA, close, self.n2)

    def next(self):
        if crossover(self.sma1, self.sma2):
            self.buy(size=self.size)
        elif crossover(self.sma2, self.sma1):
            self.sell(size=self.size)

bt = Backtest(GOOG, SmaCross,
              cash=10000, commission=.002,
              exclusive_orders=True)

output = bt.run()

Walk-forward model:

class WalkForwardSmaCross(SmaCross):
    def next(self):
        if len(self.data) % 500:
            return super().next()
        # Increase the moving average lengths by 10
        self.n1 += 10
        self.n2 += 10

        super().next()

bt_wf = Backtest(GOOG, WalkForwardSmaCross,
                 cash=10000, commission=.002,
                 exclusive_orders=True)

output_wf = bt_wf.run()

Then compare equity curves:

import matplotlib.pyplot as plt

ax = output["_equity_curve"].Equity.plot(label="SmaCross", figsize=(10,5), alpha=0.5, linestyle=":", lw=2)
output_wf["_equity_curve"].Equity.plot(label="WalkForwardSmaCross", alpha=0.5, lw=2)
ax.set_title("Equity curves")
ax.legend()
plt.show()

Output:

As you can see, the equity curves are identical. The issue seems to be specific to indicators, because if I do the same with self.size instead of self.n1 or self.n2, it works:

class WalkForwardSmaCross(SmaCross):
    def next(self):
        if len(self.data) % 500:
            return super().next()
        # Increase the position size by 1
        self.size += 1
        super().next()

bt_wf = Backtest(GOOG, WalkForwardSmaCross,
                 cash=10000, commission=.002,
                 exclusive_orders=True)

output_wf = bt_wf.run()

Plot output:

kernc commented 1 year ago

In your simplified example, changing n1 and n2 in next() does not (and is not supposed to) affect your SMA indicators precomputed in init() ...
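To illustrate the point, here is a minimal sketch in plain Python with no backtesting.py involved. `ToyStrategy` and `sma` are hypothetical stand-ins, not library code. A series computed once in init() from n1 keeps whatever n1 was at that moment, and changing n1 later leaves the stored series untouched until it is explicitly recomputed.

```python
def sma(values, n):
    """Simple moving average; None until n values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < n:
            out.append(None)
        else:
            window = values[i + 1 - n:i + 1]
            out.append(sum(window) / n)
    return out


class ToyStrategy:
    """Hypothetical stand-in for a strategy; not the backtesting.py class."""
    n1 = 2

    def __init__(self, close):
        self.close = close

    def init(self):
        # Computed once, using the value of n1 at this moment
        self.sma1 = sma(self.close, self.n1)

    def next(self):
        # Changing the parameter does not touch the stored series
        self.n1 += 1


strategy = ToyStrategy([1.0, 2.0, 3.0, 4.0, 5.0])
strategy.init()
before = list(strategy.sma1)

strategy.next()
after = strategy.sma1  # still the series computed with n1 == 2

# Only an explicit recomputation reflects the new window length
recomputed = sma(strategy.close, strategy.n1)
```

Here `before` and `after` are equal even though `strategy.n1` has changed, while `recomputed` differs.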

patricna commented 1 year ago

OK, thank you. Is there a way to update indicators throughout the backtest? Or do you have any suggestions on how to mimic this?

kernc commented 1 year ago

> is there a way to update indicators throughout the backtest?

Working with the simplified example, the SMAs simply need to be recomputed after n1 changes, i.e.:

        self.n1 += 10
        self.sma1 = self.I(SMA, close, self.n1)
        ...
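Another possible way to mimic a changing window is to skip precomputed indicators for the varying part and compute the averages on the fly from the latest closes. This sketch assumes that self.data.Close inside next() exposes only the bars seen so far. The helpers `sma_last` and `crossed_above` are hypothetical names for illustration, not part of backtesting.py:

```python
def sma_last(values, n, shift=0):
    """Mean of the n values ending `shift` bars before the latest one.

    Returns None when fewer than n + shift values are available.
    """
    end = len(values) - shift
    if n <= 0 or end < n:
        return None
    window = values[end - n:end]
    return sum(window) / n


def crossed_above(values, n_fast, n_slow):
    """True if the fast SMA was strictly below the slow SMA on the
    previous bar and is strictly above it on the latest bar."""
    prev_fast = sma_last(values, n_fast, shift=1)
    prev_slow = sma_last(values, n_slow, shift=1)
    fast = sma_last(values, n_fast)
    slow = sma_last(values, n_slow)
    if None in (prev_fast, prev_slow, fast, slow):
        return False  # not enough history yet
    return prev_fast < prev_slow and fast > slow


# Toy closing prices: a decline followed by a sharp rebound
closes = [5.0, 4.0, 3.0, 2.0, 3.0, 6.0]
signal = crossed_above(closes, 2, 3)       # evaluated on the latest bar
earlier = crossed_above(closes[:5], 2, 3)  # evaluated one bar earlier
```

Inside next(), this might be called as crossed_above(self.data.Close, self.n1, self.n2) in place of crossover(self.sma1, self.sma2), so a change to n1 or n2 takes effect on the very next bar. Averages computed this way are not declared through self.I, so they would not appear as indicators on the plot.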

> but the self.clf is still the same as defined in def init(self)

Fitting doesn't change the model object (after fitting, self.clf is still the same object, so identity and equality checks against the old reference still hold), but it should change the model's internal parameters. Can you confirm?
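One way to check this is to compare predictions on a fixed probe input before and after refitting, rather than comparing or printing the object itself. The sketch below uses a hypothetical toy 1-nearest-neighbour class (`ToyNearestNeighbour`) as a stand-in for a real estimator. It is not scikit-learn code, although it mimics the convention that fit() updates the object in place and returns it:

```python
class ToyNearestNeighbour:
    """Hypothetical 1-nearest-neighbour classifier on scalar inputs."""

    def fit(self, X, y):
        # Store the training data on the existing object and return it
        self._X = list(X)
        self._y = list(y)
        return self

    def predict(self, x):
        # Label of the closest training point
        distances = [abs(x - xi) for xi in self._X]
        return self._y[distances.index(min(distances))]


clf = ToyNearestNeighbour()
clf.fit([0.0, 10.0], [-1, 1])
object_before = clf
prediction_before = clf.predict(4.0)

# Refit the same object on different training data
clf.fit([0.0, 3.0], [-1, 1])
same_object = clf is object_before    # identity is unchanged by fitting
prediction_after = clf.predict(4.0)   # but the learned state is not
```

In the real strategy, the equivalent check would be to keep a fixed feature row aside and compare self.clf.predict on it just before and just after the self.clf.fit(X, y) call in next().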

> A way to see that it doesn't work is to look at the equity curves for both MLTrainOnceStrategy and MLWalkForwardStrategy. They will look exactly the same.

This is certainly not sufficient evidence that the repeated fitting doesn't work. Your code looks OK at a glance.

kernc / backtesting.py

Issues with updating class variables while walk-forward optimising #1007