yhilpisch / py4at

Jupyter Notebooks and code for the book Python for Algorithmic Trading (O'Reilly) by Yves Hilpisch.
http://home.tpq.io/books/py4at

dropna causes incorrect strategy return #1

Closed by mtucker502 3 years ago

mtucker502 commented 3 years ago

https://github.com/yhilpisch/py4at/blob/4b773413c27dc9b084f3b4df8d903e8f8545e940/ch04/SMAVectorBacktester.py#L83

This dropna() causes the first row in strategy to be NaN, which affects the creturn.

yhilpisch commented 3 years ago

Why exactly should it become incorrect?

The idea is to use the same data rows to calculate both the benchmark and strategy returns.

After the positions are derived, a second .dropna() is used to align the data (again).

data.dropna(inplace=True)
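
As a minimal sketch of the alignment described above (not the repository's actual code: the synthetic prices, the window lengths 3 and 10, and the column names SMA1, SMA2 and position are assumptions for illustration), the two dropna() calls could work like this:

```python
import numpy as np
import pandas as pd

# Synthetic prices, for illustration only.
rng = np.random.default_rng(0)
price = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 50))))

raw = pd.DataFrame({"price": price})
raw["return"] = np.log(raw["price"] / raw["price"].shift(1))
raw["SMA1"] = raw["price"].rolling(3).mean()   # short window (assumed)
raw["SMA2"] = raw["price"].rolling(10).mean()  # long window (assumed)

# First dropna(): discard the warm-up rows where SMA2 is still NaN.
data = raw.dropna().copy()
data["position"] = np.where(data["SMA1"] > data["SMA2"], 1, -1)

# shift(1) applies yesterday's position to today's return, which leaves
# the first strategy value as NaN.
data["strategy"] = data["position"].shift(1) * data["return"]

# Second dropna(): remove that row so both columns cover the same rows.
data.dropna(inplace=True)

print(len(raw), len(data))
print(data[["return", "strategy"]].isna().sum())
```

After the second call, return and strategy contain no NaN values and span exactly the same rows, so their cumulative sums refer to the same period.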

mtucker502 commented 3 years ago

Dropping the NaN rows to align the returns to the same timeframe assumes that the benchmark (the underlying symbol) has no returns while we wait for enough periods to populate the SMA. But the market does have returns during that time, sometimes very large ones, and they are dropped because of the NaN values in the SMA column.

In my use case, I'm using SMAs on 5m bars, and the difference between the performance of the market starting at 09:30 and the algo starting at, say, 10:00 can be substantial.

For example, looking at 2021-02-01 5m candles, using close as the price:

With dropna:

data[['return', 'strategy']].sum()

return      0.020638
strategy   -0.001147
dtype: float64

Without dropna:

data[['return', 'strategy']].sum()

return     -0.003827
strategy   -0.001147
dtype: float64

The same applies to daily bars. If we wait for SMA2 to reach 252 days (the example from your companion book), we miss the entire first year of the market's returns.
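
The comparison above can be reproduced in spirit on synthetic data. The following is a sketch only: the prices are random, the window lengths 3 and 12 are arbitrary, and masking the position during warm-up is an assumption of this example, not something taken from the book's code. It relies on the fact that sum() in pandas skips NaN values by default, so the strategy total is the same with and without dropna(), while the return total changes by exactly the benchmark's move during the warm-up rows.

```python
import numpy as np
import pandas as pd

# Synthetic prices, for illustration only.
rng = np.random.default_rng(1)
price = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 60))))

full = pd.DataFrame({"price": price})
full["return"] = np.log(full["price"] / full["price"].shift(1))
full["SMA1"] = full["price"].rolling(3).mean()
full["SMA2"] = full["price"].rolling(12).mean()

# Position is left as NaN until both SMAs exist (an assumption of this
# sketch; np.where alone would assign -1 to the warm-up rows because
# comparisons with NaN evaluate to False).
signal = pd.Series(np.where(full["SMA1"] > full["SMA2"], 1.0, -1.0),
                   index=full.index)
full["position"] = signal.where(full["SMA2"].notna())
full["strategy"] = full["position"].shift(1) * full["return"]

cols = ["return", "strategy"]
without_dropna = full[cols].sum()        # sum() skips NaN by default
with_dropna = full.dropna()[cols].sum()  # only rows where every column is set

print("Without dropna:")
print(without_dropna)
print("With dropna:")
print(with_dropna)
```

The gap between the two return totals equals the log return of the benchmark from the first bar to the last warm-up bar, which is the quantity under discussion here.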

yhilpisch commented 3 years ago

I expected a technical argument. But saying something like "I am not on holiday, therefore I am missing out on the fun" is not really an argument, is it? Asking "what can I expect to get when I am there?" is the proper question, I guess.

In your case, you should have an approach connecting previous days to the next trading day to get started right away.

We have implemented this for instruments that trade 24 hours a day, assuming that we trade only 8 of those hours, let's say. It is pretty straightforward to connect the trading intervals.
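
One way to picture connecting the trading intervals is sketched below. It is an illustration under stated assumptions, not the implementation mentioned above: the two sessions, the synthetic prices, the window of 6 bars, and the column names SMA_session and SMA_connected are all hypothetical.

```python
import numpy as np
import pandas as pd

# Two hypothetical sessions of twelve 5-minute bars each, synthetic prices.
idx = pd.date_range("2021-02-01 09:30", periods=12, freq="5min").append(
    pd.date_range("2021-02-02 09:30", periods=12, freq="5min"))
rng = np.random.default_rng(2)
bars = pd.DataFrame(
    {"price": 100 + np.cumsum(rng.normal(0, 0.1, len(idx)))}, index=idx)

window = 6  # assumed SMA length, in bars

# Per-session SMA: the window restarts every day, so the first
# window - 1 bars of each session are NaN.
bars["SMA_session"] = (
    bars.groupby(bars.index.normalize())["price"]
    .transform(lambda s: s.rolling(window).mean()))

# Connected SMA: the window runs across the session break, so the second
# day's first bar already has a value built from the previous day's bars.
bars["SMA_connected"] = bars["price"].rolling(window).mean()

day2 = bars.loc["2021-02-02"]
print(day2[["SMA_session", "SMA_connected"]].head(6))
```

With the connected version, only the very first session of the data set has a warm-up gap; whether prices from the previous close are a sensible input for the next open is a modelling decision this sketch does not address.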

mtucker502 commented 3 years ago

For live trading, we can backfill if we are already past the largest SMA period. Or, in the case of an instrument that is traded 24/7, we can always backfill data so our SMAs are available immediately.

I noticed this specifically with a day trading algorithm which closes all trades at the end of each day. As such, we need to wait for the SMA bars to be populated each day after trading begins. By dropping the NaN values, we skew the results of the backtest (even with daily bars). Am I approaching this incorrectly?

yhilpisch commented 3 years ago

Yes, there is some information lost for sure.

But it is important, from my point of view, to only compare strategy results with benchmark results for the exact same period.

When working with a fixed data set (CSV file), the loss of data/information is then unavoidable -- in particular when you want to work with the complete (remaining) data set.

mtucker502 commented 3 years ago

Agreed. Closing this issue.