Closed sbushmanov closed 3 years ago
Curious to know why you changed your mind? The expression:
ret_ = stats._equity_curve["Equity"].resample("D").last().dropna().pct_change()
is recent. It's reasonable it might be incorrect. dropna()
is there to drop weekends and holidays where the input OHLC was undefined.
```python
def geometric_mean(returns):
    returns = returns.fillna(0) + 1
    return (0 if np.any(returns <= 0) else
            np.exp(np.log(returns).sum() / (len(returns) or np.nan)) - 1)

day_returns = stats._equity_curve["Equity"].resample('D').last().dropna().pct_change()
(1 + geometric_mean(day_returns)) ** 252
# 1.3161093598955005
```
The above is just fine and the results are in line with observed data. There will be trouble if the amount of data significantly deviates from 252 points, but I believe it can be overcome with `(1 + geometric_mean(day_returns)) ** len(day_returns)`, so that even if you have 3 data points you compound them properly.
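To make the point concrete, here is a small self-contained sketch with synthetic daily returns (the `geometric_mean` here mirrors the one above but takes a plain NumPy array):

```python
import numpy as np

def geometric_mean(returns):
    """Geometric mean of simple returns; 0 if any gross return is non-positive."""
    gross = np.asarray(returns) + 1
    if len(gross) == 0 or np.any(gross <= 0):
        return 0
    return np.exp(np.log(gross).mean()) - 1

rng = np.random.default_rng(0)
day_returns = rng.normal(0.001, 0.01, size=100)  # only 100 points, not 252

g = geometric_mean(day_returns)
realized_growth = np.prod(1 + day_returns)

# Compounding over the actual sample length recovers the realized growth exactly,
# whereas ** 252 would extrapolate a 100-day sample to a full year:
print((1 + g) ** len(day_returns), realized_growth)
```

Since `(1 + g)` is exactly `exp(mean(log(1 + r)))`, raising it to the sample length reproduces `prod(1 + r)` identically, whatever the sample size.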
Now to volatility: I rely on observed data, you seem to model log returns, and yours is ~50% higher.
I followed this paper. It contains derivations and proofs for compounded returns, if you'd care to check yourself.
Let me see the paper
What he is doing is not new and saying that compounded returns have higher risks is also true. That higher positive returns are associated with higher risks also agrees with common logic (I am not so sure about lower risks for negative returns). There is one major caveat though, where his math falls apart: going from monthly returns to annual. He is saying:
However, the type of returns that performance measurement usually deals with do not sum over time. In fact, the annual portfolio return is made up by compounding the monthly portfolio returns.
And then adding in the footnotes next page:
Note, however, that in the field of investment performance it is uncommon to annualize monthly returns
And I would add that the moment you try to annualize observed monthly returns by compounding, you'll see the results are wrong (over- or under-stated relative to what you observe, due to compounding). You try to remedy this problem by inserting log returns, but that's not part of his paper: he uses observed monthly returns, and he is silent on why his compounded annual returns differ from "simple", observable returns. His bootstrap experiment is also not very useful. He might have shown that the "correct" method is "better" at predicting risks, as Kaplan did before by generating risk profiles for actual stocks, but (1) this wouldn't answer which is better (compounded risk will always be higher for positive returns), let alone what counts as "better", and (2) Kaplan did this for 2008–2009, when it was in high demand.
Back to volatility. There are 4 choices to be made:
which boils down to whether we "believe" in lognormal observed returns or in normal log-transformed returns. Of course you can substitute statistical tests for beliefs, but IMO it would be more practical to leave this choice to the user. If you believe in normally distributed observable returns that are additive on an intra-year basis, then multiplying by √T is just fine; if you want to get fancy [in preparation for the next crisis], you can model risk in compounded log returns. And if you choose the latter, I believe the more theoretically correct way is to do as I did above (model the normal log-transformed returns and then switch back to the original lognormal), which produces the correct return expectancy (as you tried to do) and a theoretically correct risk estimate for the lognormal.
BTW, CFA Institute is cautious about this way of calculating risk:
The author suggests that it may be more appropriate to measure the volatility of annual logarithmic return rather than level returns because annual logarithmic return is the sum of its monthly constituents, thus making multiplication by the square root of 12 appropriate. In my view, it is important for asset managers to encourage the use of mathematically sound procedures for calculating the annualized volatility measure rather than to opt for an expedient but mathematically invalid procedure.
by saying it must be taken into consideration.
I admit it's a bit out of scope for me to decide on, or even debate, the appropriateness of either measure. The returns are compounded on the assumption that trading proceeds are always reinvested, which is the case at least with the default `size=` parameter. Other than that, with the article's formula for σₐ (16), derived in Appendix B, it's not at all clear to me why it'd be invalid. :grimacing:
it's not at all clear to me why it'd be invalid.
The formulae are valid indeed. It's about how to use them.
If one has an iid process following 𝑁(μₘ, σₘ), then to calculate the monthly-compounded μₐ and σₐ one would use formulae (18) and (19). μₘ and σₘ are monthly characteristics of the observable process, not of log returns. Perhaps it's possible to use investment returns instead of observed values (why should they be different anyway), but they should be equally spaced, and log returns have a different probability distribution.
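For what it's worth, the closed-form moments of a compounded iid process can be sketched directly. The function name is mine, and I'm only assuming the standard moment algebra for iid rᵢ, namely E[∏(1+rᵢ)] = (1+μ)ᵀ and E[(∏(1+rᵢ))²] = ((1+μ)² + σ²)ᵀ, which is what the paper's (18) and (19) amount to:

```python
import numpy as np

def compounded_annual_moments(mu_m, sigma_m, periods=12):
    """Mean and std of the compounded annual return of iid monthly simple
    returns with mean mu_m and std sigma_m (observable, not log, returns)."""
    mu_a = (1 + mu_m) ** periods - 1
    second_moment = ((1 + mu_m) ** 2 + sigma_m ** 2) ** periods
    var_a = second_moment - (1 + mu_m) ** (2 * periods)
    return mu_a, np.sqrt(var_a)

# Monte-Carlo sanity check of the closed form against brute-force compounding:
rng = np.random.default_rng(1)
sims = np.prod(1 + rng.normal(0.01, 0.04, size=(100_000, 12)), axis=1) - 1
mu_a, sigma_a = compounded_annual_moments(0.01, 0.04)
print((mu_a, sigma_a), (sims.mean(), sims.std()))
```

Note that the annualized σₐ here exceeds the naive σₘ·√12 when μₘ > 0, which is the "compounded risk is higher for positive returns" point made above.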
So what do you recommend? And could you wrap and deliver it by way of a PR? :smiley:
what do you recommend?
I realized we need to fix what you want to calculate (and show in the report): realized Return/Volatility (Ann.) or expected. If you want to show annualized realized characteristics (and I do believe this is what you have in mind), you can just do `(equity[-1]/equity[0])**(252/len(equity))`
for return and `equity.pct_change().std()*√252`
for realized standard deviation. Note, you do not need a log transformation to infer characteristics of the observable returns, and probability distributions are also irrelevant here: both are "approximately" normal (90%+), and the log transformation adds only ~1% to normality:
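A minimal sketch of those two realized figures on a synthetic equity curve (252 is the conventional trading-day count; I subtract 1 to report the net return rather than the growth factor):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
# Hypothetical two-year daily equity curve
equity = pd.Series(10_000 * np.cumprod(1 + rng.normal(0.0005, 0.01, size=504)))

# Realized annualized return: overall growth compounded to a one-year horizon
ann_return = (equity.iloc[-1] / equity.iloc[0]) ** (252 / len(equity)) - 1

# Realized annualized volatility: daily std scaled by sqrt(252)
ann_vol = equity.pct_change().std() * np.sqrt(252)

print(ann_return, ann_vol)
```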
If you want to make projections, that's a different topic. The paper you borrowed the formula from -- though correct in drawing conclusions about the characteristics of a compounded iid distribution -- is impractical for making predictions (and the author admits it in the footnotes). If you have an array `arr`
of returns, `np.prod(1+arr)`
will in general never equal `(1+np.mean(arr))**T`
. Think e.g. of net returns `[.1, -.1]`
: by compounding you'll get `0.99`
, and by averaging and then compounding you'll get `1**T = 1`
. Increasing the number of time periods won't help, because for this to work returns must be additive; this is why log returns are used.
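The non-equality, and why log returns fix it, in a couple of lines (using net returns of +10% and -10% as the example):

```python
import numpy as np

arr = np.array([0.10, -0.10])  # +10% then -10% (net returns)
T = len(arr)

compounded = np.prod(1 + arr)          # 1.1 * 0.9 = 0.99
via_mean = (1 + arr.mean()) ** T       # (1 + 0) ** 2 = 1.0 -- not the same

# Log returns ARE additive, so averaging works after the transform:
log_ret = np.log(1 + arr)
recovered = np.exp(log_ret.mean() * T)  # equals the compounded value exactly

print(compounded, via_mean, recovered)
```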
Log returns here are also non-normal (or almost normal), but they are additive, and adding [almost] normal distributions produces a normal with known qualities. Switching back is not mere exponentiation; rather, it uses the formulas linking log-transformed returns to the original returns. This produces asymptotically correct projections:
```python
# What are our predictions for original returns,
# inferred from the distribution of log-transformed returns?
np.random.seed(42)
T = 2520
ret = np.random.randn(T) / 100

def lognstat(mu, sigma):
    """Calculate the mean and standard deviation of the lognormal
    distribution given the mean (`mu`) and standard deviation (`sigma`)
    of the associated normal distribution."""
    m = np.exp(mu + sigma**2 / 2.0)
    v = np.exp(2 * mu + sigma**2) * (np.exp(sigma**2) - 1)
    return m, np.sqrt(v)

log_ret = np.log(1 + ret)
lognstat(log_ret.mean() * 252, log_ret.std() * np.sqrt(252))
# (1.0864463524969215, 0.17062029188382657)
```
vs expectations:
```python
# Draw 100,000 yearly returns and
# assess whether we are correct on average
N = 100000
arr = np.zeros(N)
for i in range(N):
    arr[i] = np.prod(np.random.choice(ret, size=252) + 1)
arr.mean(), arr.std()
# (1.0867431699668038, 0.17078777666715064)
```
Note, the values from `lognstat`
are expected values inferred from compounded log returns. They can be off for a single year, i.e. they are not guaranteed to coincide with what is observed; rather, they are asymptotically correct.
could you wrap and deliver it by way of a PR?
If you agree with the above -- I mean showing realized characteristics in an efficient and theoretically correct way -- yes.
PS
I thought non-normality came from 2008–2009, events which should be modeled under different assumptions and models, but it turned out every year is 5–10% non-normal:
and log-transformation doesn't help at all:
Background
There is general agreement that stock prices are lognormal; there is less agreement on whether returns are lognormal or normal. Some say short-term returns are normal, others say returns are lognormal, and still others say returns are normal over shorter horizons but lognormal over longer ones (one possible piece of evidence: they cannot go negative in the long run).
With this in mind, there are two possible ways to calculate returns and volatilities:
Let's see what we have here:
Output:
Annualized returns:
Despite the fact that you seem to rely on log returns, we both agree on the result.
Now to volatility:
I rely on observed data, you seem to model logreturns, yours is ~50% higher.
One more exercise: let's try to model normal logtransformed returns and then switch back to lognormal original, as suggested here or here:
Interpretation: if we believe returns are lognormal, estimate the parameters of the normal log-transformed distribution, and then switch back, the original returns should have a mean of 38.9% and a standard deviation of 46.9%. The 38.9% mean is out of the question because it contradicts the observed annual return; 46.9% is close to what you suggest for the std, but not what I observe in the market.
Question:
We both seem to agree that annualized expected returns should be calculated from observable realized returns. For annualized volatility, should we use the volatility observed in realized returns, or infer it from log-transformed returns?