kernc / backtesting.py

:mag_right: :chart_with_upwards_trend: :snake: :moneybag: Backtest trading strategies in Python.
https://kernc.github.io/backtesting.py/
GNU Affero General Public License v3.0

The right way to calculate returns and volatilities: normal or lognormal? #205

sbushmanov closed this issue 3 years ago

sbushmanov commented 3 years ago

Background

There is general agreement that stock prices are lognormally distributed; there is less agreement on whether returns are lognormal or normal. Some people say short-term returns are normal, and that

these distributional conventions are at best approximations. They can be convenient models, but we shouldn't confuse that with the actual distribution of stock prices or returns.

Others say returns are lognormal, and still others say returns are normal over shorter horizons but lognormal over longer ones (one possible piece of evidence: they cannot go negative in the long run).

With this in mind, there are two possible approaches to calculating returns and volatilities:

  1. directly calculate returns and volatilities from the observed returns, assuming they are normal. This assumption is important because it determines how returns and volatilities accumulate over time.
  2. if we believe returns are lognormal, we can log-transform them to normal and then infer the expected mean and variance from the underlying normal distribution (see the sketch below).
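
A minimal sketch contrasting the two estimators, assuming returns is a plain pandas Series of simple per-period returns (illustrative only, not code from the library):

import numpy as np
import pandas as pd

returns = pd.Series([0.01, -0.02, 0.015, 0.005, -0.01])  # simple per-period returns
T = 252                                                   # periods per year

# 1. treat the observed simple returns as (approximately) normal
mu_simple = returns.mean() * T
sigma_simple = returns.std() * np.sqrt(T)

# 2. treat returns as lognormal: estimate the underlying normal on log returns,
#    then map its parameters back to the simple-return (lognormal) scale
log_ret = np.log1p(returns)
mu_n, sigma_n = log_ret.mean() * T, log_ret.std() * np.sqrt(T)
mu_lognormal = np.exp(mu_n + sigma_n**2 / 2) - 1
sigma_lognormal = np.sqrt(np.exp(2 * mu_n + sigma_n**2) * (np.exp(sigma_n**2) - 1))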

Let's see what we have here:

from backtesting import Strategy, Backtest
from backtesting.lib import crossover
from backtesting.test import SMA, GOOG

class SmaCross(Strategy):

    up_fast = 10
    up_slow = 20

    def init(self):

        price = self.data.Close # to be fed to indicator
        self.up_fast = self.I(SMA, price, self.up_fast) # callable, callable args
        self.up_slow = self.I(SMA, price, self.up_slow)

    def next(self):

        if crossover(self.up_fast, self.up_slow):
            self.buy()

        if crossover(self.up_slow, self.up_fast):
            self.sell()

bt = Backtest(GOOG, SmaCross, cash=10000, commission=.002, exclusive_orders=True)
stats = bt.optimize(up_fast=range(5, 30, 5),
                    up_slow=range(10, 70, 5),
                    maximize='Equity Final [$]',
                    constraint=lambda param: param.up_fast < param.up_slow)
print(stats)

Output:

Start                     2004-08-19 00:00:00
End                       2013-03-01 00:00:00
Duration                   3116 days 00:00:00
Exposure Time [%]                     99.0689
Equity Final [$]                       103949
Equity Peak [$]                        108328
Return [%]                            939.494
Buy & Hold Return [%]                 703.458
Return (Ann.) [%]                     31.6109
Volatility (Ann.) [%]                 44.7398
Sharpe Ratio                          0.70655
Sortino Ratio                         1.49096
Calmar Ratio                         0.718505
Max. Drawdown [%]                    -43.9954
Avg. Drawdown [%]                    -6.13885
Max. Drawdown Duration      690 days 00:00:00
Avg. Drawdown Duration       43 days 00:00:00
# Trades                                  153
Win Rate [%]                           51.634
Best Trade [%]                        61.5629
Worst Trade [%]                      -19.7783
Avg. Trade [%]                        1.55028
Max. Trade Duration          83 days 00:00:00
Avg. Trade Duration          21 days 00:00:00
Profit Factor                         1.98458
Expectancy [%]                        1.97988
SQN                                   1.60416
_strategy                 SmaCross(up_fast...
_equity_curve                             ...
_trades                        Size  Entry...
dtype: object

Annualized returns:

# Return (Ann.) [%]
eq = stats._equity_curve["Equity"]
ret = eq[-1] / eq[0]                  # total growth factor over the whole backtest
dur = stats["Duration"].days          # calendar days -- superseded below
dur = bt._data.shape[0]               # number of trading bars is what matters for **(252/dur)
ret_ann = ret**(252/dur) - 1
print(ret_ann*100, stats["Return (Ann.) [%]"])
# 31.610935989553134 31.61093598955005

Despite the fact that you seem to rely on log returns, we both agree on the result.

Now to volatility:

# Volatility (Ann.) [%]
import numpy as np

s = stats._equity_curve.Equity.pct_change().dropna().std() * np.sqrt(252)
print(s*100, stats["Volatility (Ann.) [%]"])
# 33.10864212390824 44.73981572651169

I rely on observed data; you seem to model log returns, and your figure is roughly a third higher.

One more exercise: let's model the log-transformed returns as normal and then switch back to the original lognormal scale, as suggested here or here:

def lognstat(mu, sigma):
    """Calculate the mean of and variance of the lognormal distribution given
    the mean (`mu`) and standard deviation (`sigma`), of the associated normal 
    distribution."""
    m = np.exp(mu + sigma**2 / 2.0)
    v = np.exp(2 * mu + sigma**2) * (np.exp(sigma**2) - 1)
    return m, v

day_returns = stats._equity_curve.Equity.pct_change().fillna(0)
log_day_returns = np.log(day_returns + 1)  # assumed to be (approximately) normal
mu = log_day_returns.mean()*252
sigma = log_day_returns.std()*np.sqrt(252)
m,v = lognstat(mu,sigma)
(m-1)*100, (np.sqrt(v))*100
#(38.919616652960244, 46.935262560485356)

Interpretation: if we believe returns are lognormal, estimate the parameters of the normal log-transformed distribution, and then switch back to the original returns, the original returns should have a mean of 38.9% and a standard deviation of 46.9%. A 38.9% mean is out of the question because it contradicts the observed annual return; 46.9% is close to what you suggest for the standard deviation, but not what I observe in the market.

Question:

We both seem to agree that annualized expected returns should be calculated from observable realized returns. For annualized volatility, should we use the observed volatility of realized returns, or infer it from the log-transformed returns?

kernc commented 3 years ago

Curious to know why you changed your mind? The expression:

ret_ = stats._equity_curve["Equity"].resample("D").last().dropna().pct_change()

is recent. It's reasonable it might be incorrect. dropna() is there to drop weekends and holidays where the input OHLC was undefined.

sbushmanov commented 3 years ago

Curious to know why you changed your mind? The expression:

ret_ = stats._equity_curve["Equity"].resample("D").last().dropna().pct_change()

is recent. It's reasonable it might be incorrect. dropna() is there to drop weekends and holidays where the input OHLC was undefined.

def geometric_mean(returns):
    returns = returns.fillna(0) + 1
    return (0 if np.any(returns <= 0) else
            np.exp(np.log(returns).sum() / (len(returns) or np.nan)) - 1)

day_returns = stats._equity_curve["Equity"].resample('D').last().dropna().pct_change()

(1+geometric_mean(day_returns))**252
#1.3161093598955005

The above is just fine and the results are in line with the observed data. There will be trouble if the amount of data deviates significantly from 252, but I believe that can be overcome with (1 + geometric_mean(day_returns))**len(day_returns), so that even if you have only 3 data points you compound them properly (see the sketch below).
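
A quick sketch of that idea (a toy example, not taken from the library); compounding the per-period geometric mean over the actual number of observations reproduces the realized total growth exactly:

import numpy as np
import pandas as pd

def geometric_mean(returns):            # as defined above
    returns = returns.fillna(0) + 1
    return (0 if np.any(returns <= 0) else
            np.exp(np.log(returns).sum() / (len(returns) or np.nan)) - 1)

day_returns = pd.Series([0.01, -0.02, 0.03])   # only 3 observations
g = geometric_mean(day_returns)                # per-period geometric mean
print((1 + g) ** len(day_returns))             # ~1.0195, compounded over 3 periods
print(np.prod(1 + day_returns.values))         # ~1.0195, the realized total growth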

kernc commented 3 years ago

Now to volatility: I rely on observed data, you seem to model logreturns, yours is ~50% higher.

I followed this paper. It contains derivations and proofs for compounded returns, if you'd care to check yourself.

https://github.com/kernc/backtesting.py/blob/bbcb7bac8ffe65a5002edfcdf8bc50f92eaca2b9/backtesting/backtesting.py#L1566-L1574
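
In short, the idea (a simplified sketch, not the exact code at those lines): for T iid periods with per-period mean μ and variance σ² of simple returns, the compounded annual variance is (σ² + (1 + μ)²)^T − (1 + μ)^(2T). Applied to daily equity returns with the geometric mean as μ:

import numpy as np

def compounded_ann_volatility(day_returns, trading_days=252):
    """sqrt((sigma^2 + (1 + mu)^2)**T - (1 + mu)**(2*T)) for T iid periods."""
    mu = np.expm1(np.log1p(day_returns).mean())   # per-day geometric mean return
    var = day_returns.var()                       # per-day variance of simple returns
    return np.sqrt((var + (1 + mu)**2)**trading_days
                   - (1 + mu)**(2 * trading_days))

# compounded_ann_volatility(stats._equity_curve['Equity'].pct_change().dropna())
# lands near 0.447, i.e. the 44.7% reported above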

sbushmanov commented 3 years ago

Let me see the paper

sbushmanov commented 3 years ago

What he is doing is not new, and the claim that compounded returns carry higher risk is true. That higher positive returns are associated with higher risk also agrees with common sense (I am not so sure about lower risk for negative returns). There is one major caveat, though, where his math falls apart: going from monthly returns to annual returns. He says:

However, the type of returns that performance measurement usually deals with do not sum over time. In fact, the annual portfolio return is made up by compounding the monthly portfolio returns.

And then he adds in the footnotes on the next page:

Note, however, that in the field of investment performance it is uncommon to annualize monthly returns

And I would add that the moment you try to annualize observed monthly returns by compounding, you will see the results are wrong (over- or under-stated relative to what you observe, due to compounding). You try to remedy this problem by inserting log returns, but that is not part of his paper: he uses observed monthly returns, and he is silent on why his compounded annual returns differ from the "simple", observable returns. His bootstrap experiment is also not very useful. He might have shown that the "correct" method is "better" at predicting risk, like Kaplan did before with generating risk profiles for actual stocks, but (1) this would not answer which is better (compounded risk will always be higher for positive returns), let alone what is meant by "better", and (2) Kaplan did this for 2008-2009, when it was in high demand.

Back to volatility. There are several choices to be made, which boil down to whether we "believe" in lognormally distributed observed returns or in normally distributed log-transformed returns. Of course you can substitute beliefs with statistical tests, but IMO it would be more practical to leave this choice to the user. If you believe in normally distributed observable returns that are additive on an intra-year basis, then multiplying by √T is just fine; if you want to get fancy [in preparation for the next crisis], you can model risk in compounded log returns. And if you choose the latter, I believe the more theoretically correct way is to do what I did above (model the normal log-transformed returns and then switch back to the original lognormal scale), which produces the correct return expectancy (like you tried to do) and a theoretically correct risk estimate for the lognormal.

BTW, the CFA Institute is cautious about this way of calculating risk:

The author suggests that it may be more appropriate to measure the volatility of annual logarithmic return rather than level returns because annual logarithmic return is the sum of its monthly constituents, thus making multiplication by the square root of 12 appropriate. In my view, it is important for asset managers to encourage the use of mathematically sound procedures for calculating the annualized volatility measure rather than to opt for an expedient but mathematically invalid procedure.

that is, they say it must be taken into consideration.

kernc commented 3 years ago

I admit it's a bit out of scope for me to decide on, or even debate, the appropriateness of either measure. The returns are compounded based on the assumption that the trading proceeds are always reinvested, which is the case at least with the default size= parameter. Other than that, with the article's formula for σₐ (16), derived in Appendix B, it's not at all clear to me why it'd be invalid. :grimacing:

sbushmanov commented 3 years ago

it's not at all clear to me why it'd be invalid.

The formulae are indeed valid. It's about how they are used.

If one has an iid process following 𝑁(μₘ, σₘ), then to calculate the compounded annual μₐ and σₐ one would use formulae (18) and (19). μₘ and σₘ are monthly characteristics of the observable process, not of the log returns. Perhaps it's possible to use investment returns instead of observed values (why should they be different anyway?), but they should be equally spaced, and log returns have a different probability distribution.
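
For concreteness, these are essentially the same relations as in the sketch above, only applied to the observed simple monthly returns (my paraphrase of (18) and (19), not a verbatim quote):

import numpy as np

def compound_annual(mu_m, sigma_m, periods=12):
    """Compound per-month mean/std of simple iid returns into annual mean/std."""
    mu_a = (1 + mu_m)**periods - 1
    var_a = (sigma_m**2 + (1 + mu_m)**2)**periods - (1 + mu_m)**(2 * periods)
    return mu_a, np.sqrt(var_a)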

kernc commented 3 years ago

So what do you recommend? And could you wrap and deliver it by way of a PR? :smiley:

sbushmanov commented 3 years ago

what do you recommend?

I realized we need to settle on what you want to calculate (and show in the report): realized Return/Volatility (Ann.) or expected. If you want to show annualized realized characteristics (and I do believe this is what you have in mind), you can just do (equity[-1]/equity[0])**(252/len(equity)) for the return and equity.pct_change().std()*√252 for the realized standard deviation. Note that you do not need a log transformation to infer the characteristics of the observable returns, and probability distributions are also irrelevant here: both are "approximately" normal (90%+), and the log transformation adds only 1% to normality:

(image)

(image)
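
A minimal sketch of that recommendation (realized figures, not projections), assuming equity is the equity-curve Series and 252 trading days per year:

import numpy as np

def realized_annualized(equity, trading_days=252):
    """Annualized realized return and volatility straight from the observed equity curve."""
    ret_ann = (equity.iloc[-1] / equity.iloc[0]) ** (trading_days / len(equity)) - 1
    vol_ann = equity.pct_change().dropna().std() * np.sqrt(trading_days)
    return ret_ann * 100, vol_ann * 100

# realized_annualized(stats._equity_curve['Equity'])  ->  roughly (31.6, 33.1) here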

If you want to make projections, that's a different topic. The paper you borrowed the formula from, though correct in its conclusions about the characteristics of the compounded distribution of iid returns, is impractical for making predictions (and the author admits it in the footnotes). If you have an array arr of returns, np.prod(1 + arr) will in general never equal (1 + np.mean(arr))**T. Think e.g. of returns [.1, -.1]: by compounding you get 1.1 * 0.9 = 0.99, while by averaging and then compounding you get 1**T = 1. Increasing the number of time periods won't help, because for this to work returns must be additive, which is why log returns come in.
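
A quick numeric check of that point (a toy example, not data from the backtest):

import numpy as np

arr = np.array([0.10, -0.10])           # two simple per-period returns
print(np.prod(1 + arr))                 # 0.99 -- the actual compounded growth
print((1 + arr.mean()) ** len(arr))     # 1.00 -- "average, then compound" overstates it
# the two agree only when returns are additive, hence the switch to log returns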

Log returns here are also non-normal (or only almost normal), but they are additive, and summing [almost] normal distributions produces a normal distribution with known parameters. Switching back is not simply exponentiating; rather, it means using the formulas that link the log-transformed returns to the original returns. This produces asymptotically correct projections:

# what are our predictions for original returns,
# inferred from the distribution of log-transformed returns
import numpy as np

np.random.seed(42)
T = 2520
ret = np.random.randn(T) / 100          # simulated daily simple returns

def lognstat(mu, sigma):
    """Mean and standard deviation of the lognormal distribution given
    the mean (`mu`) and standard deviation (`sigma`) of the associated
    normal distribution."""
    m = np.exp(mu + sigma**2 / 2.0)
    v = np.exp(2 * mu + sigma**2) * (np.exp(sigma**2) - 1)
    return m, np.sqrt(v)

log_ret = np.log(1 + ret)
lognstat(log_ret.mean() * 252, log_ret.std() * np.sqrt(252))
#(1.0864463524969215, 0.17062029188382657)

versus what we get, on average, by direct simulation:

# draw 100,000 bootstrapped yearly returns
# and check whether we are correct on average

N = 100000
arr = np.zeros(N)
for i in range(N):
    arr[i] = np.prod(np.random.choice(ret, size=252) + 1)   # compound 252 resampled daily returns

arr.mean(), arr.std()
# (1.0867431699668038, 0.17078777666715064)

Note that the values from lognstat are expected values inferred from compounded log returns. They can be off for any single year, i.e. they are not guaranteed to coincide with what is observed; rather, they are asymptotically correct.

could you wrap and deliver it by way of a PR?

If you agree with the above -- I mean showing realized characteristics in an efficient and theoretically correct way -- yes.

PS

I thought the non-normality came from 2008-2009, whose events should be modeled under different assumptions and models, but it turns out every year is 5-10% non-normal:

(image)

and log-transformation doesn't help at all:

(image)