quantopian / alphalens

Performance analysis of predictive (alpha) stock factors
http://quantopian.github.io/alphalens
Apache License 2.0

Biased Mean Quantile Returns for Non-Equal Bins #309

Closed. MichaelJMath closed this issue 5 years ago.

MichaelJMath commented 6 years ago

I was doing a study today and came across a potential issue with the performance.mean_return_by_quantile logic in the returns tear sheet. It is not so much an issue with the logic of that function as with how the function is used to build the tear sheet. I think this would only be an issue if a user created bins or quantiles whose number of observations varies over time; in other words, the number of stocks in each bin changes from date to date.

If the number of stocks in a bin is correlated with market direction, this could bias the mean return calculation. For instance, if a factor quantile tends to contain many stocks when the market is going up, the up days receive a higher weight in the pooled average, which biases the mean return of that factor quantile upwards.
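To make the weighting effect concrete, here is a toy calculation with made-up numbers (not taken from the study mentioned in this thread): a quantile holds four stocks on an up day and one stock on a down day. Pooling every (date, stock) observation lets the crowded day dominate, while averaging the daily means weights each date equally.

```python
# Hypothetical two-day example: the quantile holds 4 stocks on an up day
# and 1 stock on a down day. Returns are in percent.
up_day = [2.0, 2.0, 2.0, 2.0]   # 4 stocks, each +2%
down_day = [-2.0]               # 1 stock, -2%

# Pooled mean: every (date, stock) observation is weighted equally,
# so the crowded up day dominates the average.
observations = up_day + down_day
pooled = sum(observations) / len(observations)

# Mean of daily means: each date is weighted equally, regardless of bin size.
daily_means = [sum(day) / len(day) for day in (up_day, down_day)]
mean_of_daily = sum(daily_means) / len(daily_means)

print(pooled)         # 1.2
print(mean_of_daily)  # 0.0
```

The stock-level returns are symmetric (+2% vs. -2%), yet the pooled mean reports +1.2% purely because of how many stocks fell into the bin on each day.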

Here is an example I created. Note how it affects the mean return of the spread of going long the best quantile and short the worst quantile.

I believe the mean_return_by_quantile function already has the ability to group by date, so you could first calculate the mean return for each quantile and date, and then average those daily means by quantile.
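A minimal pandas sketch of that two-step calculation, assuming a `factor_data` frame modeled on the one Alphalens works with: a `(date, asset)` MultiIndex with a `factor_quantile` column and a forward-return column named `1D`. The data are invented so that the top quantile is crowded on the up day and the bottom quantile on the down day. The sketch also computes the top-minus-bottom quantile spread both ways, since that is where the bias shows up.

```python
import pandas as pd

# Hypothetical factor data: 4 assets over 2 dates.
index = pd.MultiIndex.from_product(
    [["2018-01-02", "2018-01-03"], ["A", "B", "C", "D"]],
    names=["date", "asset"],
)
factor_data = pd.DataFrame(
    {
        # Up day: 3 assets in quantile 2; down day: 3 assets in quantile 1.
        "factor_quantile": [2, 2, 2, 1, 2, 1, 1, 1],
        "1D": [0.02, 0.02, 0.02, 0.01, -0.02, -0.01, -0.01, -0.01],
    },
    index=index,
)

# Current behavior: pool all (date, asset) rows within each quantile.
pooled = factor_data.groupby("factor_quantile")["1D"].mean()

# Proposed behavior: mean per (date, quantile) first, then mean over dates.
daily_mean = factor_data.groupby(["date", "factor_quantile"])["1D"].mean()
two_stage = daily_mean.groupby(level="factor_quantile").mean()

# Long the top quantile, short the bottom quantile.
spread_pooled = pooled.loc[2] - pooled.loc[1]          # about 0.015
spread_two_stage = two_stage.loc[2] - two_stage.loc[1]  # about 0
```

With equal weighting of dates, each quantile's symmetric up/down moves cancel, so the spread is roughly zero; the pooled version reports a positive spread that comes only from the changing bin sizes.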

Am I thinking about this right?

luca-s commented 6 years ago

Good point. I totally agree with you. To tell the truth, I thought Alphalens already worked the way you are proposing, but unfortunately it does not.

> I believe the mean_return_by_quantile function already has the ability to group by date, so therefore, you could calculate the mean return for each quantile and date first, and then take the mean by quantile.

I agree, and I would do that inside mean_return_by_quantile. Currently the function computes the mean return by date only if the by_date option is True, but we should always do that. Then, when by_date is False, we compute the mean return by quantile by averaging the daily mean returns.
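The proposed change could look roughly like the following. `mean_return_by_quantile_sketch` is a hypothetical, stripped-down stand-in, not Alphalens code: it ignores the real function's grouping, demeaning and standard-error handling, and it assumes every column other than `factor`, `factor_quantile` and `group` is a forward-return column.

```python
import pandas as pd


def mean_return_by_quantile_sketch(factor_data, by_date=False):
    """Hypothetical simplified version of the change proposed in this thread.

    Always averages returns within each (date, quantile) first, so each date
    counts once regardless of how many assets sit in a bin. When by_date is
    False, the daily means are then averaged per quantile.
    """
    # Assumption: all remaining columns hold forward returns.
    non_return = {"factor", "factor_quantile", "group"}
    return_cols = [c for c in factor_data.columns if c not in non_return]

    # Step 1 (always): mean return per (date, quantile).
    daily = factor_data.groupby(["date", "factor_quantile"])[return_cols].mean()
    if by_date:
        return daily

    # Step 2: weight every date equally when collapsing to one row per quantile.
    return daily.groupby(level="factor_quantile").mean()


# Usage with invented data: quantile 2 is crowded on the up day,
# quantile 1 on the down day.
index = pd.MultiIndex.from_product(
    [["2018-01-02", "2018-01-03"], ["A", "B", "C", "D"]],
    names=["date", "asset"],
)
factor_data = pd.DataFrame(
    {
        "factor_quantile": [2, 2, 2, 1, 2, 1, 1, 1],
        "1D": [0.02, 0.02, 0.02, 0.01, -0.02, -0.01, -0.01, -0.01],
    },
    index=index,
)
per_date = mean_return_by_quantile_sketch(factor_data, by_date=True)
per_quantile = mean_return_by_quantile_sketch(factor_data)
```

With `by_date=True` the per-date means are returned as-is; otherwise each date contributes exactly once to each quantile's average, which is the equal-date weighting discussed above.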

Do you agree?

MichaelJMath commented 6 years ago

I think that would be the way to go, unless there are other use cases where you would actually want the current behavior. I can't think of any off the top of my head, though. So yeah, I agree with you.

luca-s commented 6 years ago

@MichaelJMath would you like to submit a PR? I tried working on this, but I am busy with something else at the moment.

MichaelJMath commented 5 years ago

Sure, I'll try to get to it this weekend.