quantopian / alphalens

Performance analysis of predictive (alpha) stock factors
http://quantopian.github.io/alphalens
Apache License 2.0

Possible Bug in Mean Return by Quantile? #340

Open metin-akyol opened 5 years ago

metin-akyol commented 5 years ago

Problem Description

I am having some issues computing mean returns by quantile with the example of alphalens that is provided in the Quantopian Tutorial (Lesson 4):

https://www.quantopian.com/tutorials/getting-started#lesson4

In particular, the `mean_return_by_quantile` function yields different results than a plain pandas `groupby`.

I have created a minimal example to illustrate the problem:

```python
import pandas as pd
import alphalens as al

# Two dates, six tickers: quantile 1 always returns 0, quantile 2 always returns 3
records = [
    {'ticker': 'jpm', 'date': '2016-11-29', '1D': 0, 'factor_quantile': 1},
    {'ticker': 'ge', 'date': '2016-11-29', '1D': 0, 'factor_quantile': 1},
    {'ticker': 'fb', 'date': '2016-11-29', '1D': 0, 'factor_quantile': 1},
    {'ticker': 'aapl', 'date': '2016-11-29', '1D': 3, 'factor_quantile': 2},
    {'ticker': 'msft', 'date': '2016-11-29', '1D': 3, 'factor_quantile': 2},
    {'ticker': 'amzn', 'date': '2016-11-29', '1D': 3, 'factor_quantile': 2},
    {'ticker': 'jpm', 'date': '2016-11-30', '1D': 0, 'factor_quantile': 1},
    {'ticker': 'ge', 'date': '2016-11-30', '1D': 0, 'factor_quantile': 1},
    {'ticker': 'fb', 'date': '2016-11-30', '1D': 0, 'factor_quantile': 1},
    {'ticker': 'aapl', 'date': '2016-11-30', '1D': 3, 'factor_quantile': 2},
    {'ticker': 'msft', 'date': '2016-11-30', '1D': 3, 'factor_quantile': 2},
    {'ticker': 'amzn', 'date': '2016-11-30', '1D': 3, 'factor_quantile': 2},
]
df1 = pd.DataFrame(records)
factor_data = df1.set_index(['date', 'ticker'], drop=True)

# These two calls should produce the same mean by quantile
print(factor_data.groupby(['factor_quantile']).mean())

mean_return_by_q, std_err_by_q = al.performance.mean_return_by_quantile(factor_data)
print(mean_return_by_q)
```

hahaws commented 4 years ago

`factor_data.groupby(['factor_quantile']).mean()` should be equal to `al.performance.mean_return_by_quantile(factor_data, demeaned=False)`.

In performance.py:

    elif demeaned:
        factor_data = utils.demean_forward_returns(factor_data)
    else:
        factor_data = factor_data.copy()

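The `demeaned` parameter defaults to `True`, so unless you pass `demeaned=False` the forward returns are demeaned before the quantile means are computed.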
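
To see why the two numbers differ, here is a rough pandas-only sketch of the effect. It assumes that demeaning means subtracting each date's cross-sectional mean return from every asset's return on that date. It does not call alphalens, and the variable names (`raw_mean`, `demeaned_mean`) are made up for illustration.

```python
import pandas as pd

# Rebuild the toy data from the question: on each date,
# quantile 1 returns 0 and quantile 2 returns 3.
rows = []
for date in ["2016-11-29", "2016-11-30"]:
    for ticker in ["jpm", "ge", "fb"]:
        rows.append({"date": date, "ticker": ticker, "1D": 0.0, "factor_quantile": 1})
    for ticker in ["aapl", "msft", "amzn"]:
        rows.append({"date": date, "ticker": ticker, "1D": 3.0, "factor_quantile": 2})
factor_data = pd.DataFrame(rows).set_index(["date", "ticker"])

# Plain groupby: the raw mean return per quantile.
raw_mean = factor_data.groupby("factor_quantile")["1D"].mean()

# Assumed demeaning step: subtract the per-date mean across all assets.
demeaned = factor_data.copy()
per_date_mean = demeaned.groupby(level="date")["1D"].transform("mean")
demeaned["1D"] = demeaned["1D"] - per_date_mean
demeaned_mean = demeaned.groupby("factor_quantile")["1D"].mean()

print(raw_mean)       # quantile 1: 0.0, quantile 2: 3.0
print(demeaned_mean)  # quantile 1: -1.5, quantile 2: 1.5
```

Each date's average return here is 1.5, so after demeaning the quantile means shift to -1.5 and 1.5, while the spread between the quantiles stays at 3.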