quantopian / alphalens

Performance analysis of predictive (alpha) stock factors
http://quantopian.github.io/alphalens
Apache License 2.0

factor_returns should shift returns by one day #277

Closed twiecki closed 6 years ago

twiecki commented 6 years ago

The format in alphalens is: signal_t <> return_(t+T). In words, the signal on day t is linked to the returns of that asset on day t+T (e.g. T = 1). factor_returns computes the return series of that signal by multiplying signal_t * return_(t+T) and giving the result timestamp t. However, I'm arguing it should be timestamp t+T, because only then is the return actually realized.
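To make the two timestamp conventions concrete, here is a minimal pandas sketch. The data, the variable names, and the calendar-day shift (which ignores the trading calendar) are all assumptions for illustration; this is not the alphalens implementation.

```python
import pandas as pd

# Hypothetical data: one asset, a daily signal and its 1-day forward return,
# both indexed by the signal date t.
dates = pd.date_range("2014-01-01", periods=4, freq="D")
signal = pd.Series([1.0, -1.0, 1.0, 1.0], index=dates)
fwd_return_1d = pd.Series([0.02, 0.01, -0.03, 0.04], index=dates)

# Convention described above: signal_t * return_(t+T), stamped with t.
factor_ret_at_t = signal * fwd_return_1d

# Proposed convention: same values, stamped with t + T, when the return is realized.
# Assumption: T is one calendar day; a real version would use trading days.
T = pd.Timedelta(days=1)
factor_ret_realized = factor_ret_at_t.copy()
factor_ret_realized.index = factor_ret_realized.index + T
```

Only the index differs between the two series; the values are identical.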

luca-s commented 6 years ago

I understand your point, but I don't believe it's a bug. I would call it a convention: the returns are reported at the timestamp at which the transaction happens (when the factor is traded and new long or short positions are entered). To know when the return is actually realized, we have to add to this timestamp the timedelta corresponding to the period over which the returns are computed (1 hour, 1 day, etc.). The function that computes the cumulative returns behaves consistently with this convention, so it doesn't seem to be buggy.

Of course we can change this convention if we like, but I believe this decision has a historical reason. The main Alphalens use case is to analyze factors computed at market open and the daily returns are better shown on the same day the factor is traded because the returns accumulate during the trading day. Reporting the returns the following day, when the returns are actually realized, would seem strange. If the common scenario was to analyze factors computed just before market close, I believe a different convention would have been chosen.

twiecki commented 6 years ago

I see what you're saying but I don't think that convention makes sense. Why would that be the way to think about factor returns? I suppose the question is where this is used but do you agree that for generating pyfolio returns it should definitely be shifted?

luca-s commented 6 years ago

Ah right, the problem is passing the factor returns to pyfolio. I understand there are special cases (analysis of a daily factor computed at market open and 1 day period) where you would like to pass the factor returns to pyfolio untouched but in Alphalens there is some computation happening on factor returns to make them suitable for pyfolio.

Since the factor returns properties can vary (period length, frequency, overlapping or not, time at which they are computed) and since pyfolio wants daily returns only, Alphalens computes the following transformation:

So you can call performance.create_pyfolio_input which does all that for you or:

# compute cumulative returns
cumrets = perf.cumulative_returns(factor_returns[period], period)
# resample at 1 day frequency
cumrets = cumrets.resample('1D').last().dropna()
# cumulative to daily returns
returns = cumrets.pct_change().fillna(0)

Bottom line: if you still want to change the convention, I am fine with that (it might be more intuitive), but I don't believe it is a good idea to pass factor returns directly to pyfolio, since Alphalens can take care of the details required to adapt them.
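For readers without alphalens at hand, the three steps in the snippet above can be sketched with pandas alone. The timestamps and returns are hypothetical, and `(1 + r).cumprod()` is only a stand-in for perf.cumulative_returns under the assumption that the periods do not overlap.

```python
import pandas as pd

# Hypothetical 12-hour factor returns, two per day, stamped at trade time.
idx = pd.to_datetime([
    "2014-01-01 10:00", "2014-01-01 22:00",
    "2014-01-02 10:00", "2014-01-02 22:00",
])
factor_rets = pd.Series([0.01, 0.01, 0.02, -0.01], index=idx)

# Stand-in for perf.cumulative_returns (assumes non-overlapping periods).
cumrets = (1 + factor_rets).cumprod()
# Resample at 1 day frequency, keeping the last cumulative value of each day.
cumrets = cumrets.resample("1D").last().dropna()
# Cumulative to daily returns; the first day has no previous value, so it becomes 0.
daily = cumrets.pct_change().fillna(0)
```

The result is indexed by calendar day, with one return per day regardless of the original period length.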

luca-s commented 6 years ago

The more I think about this the more I believe you are right, the current convention is really counter-intuitive: factor returns should use the timestamp at which they are realized. Still I don't believe there are bugs (at least not obvious ones) as the code seems to handle the factor returns correctly. I'll have a look if I can fix the factor returns timestamps without breaking the code :)

luca-s commented 6 years ago

@twiecki I had a look at the code and it's not that easy to change the behavior of factor_returns. What about adding an argument to performance.factor_returns like this:

realized_ts: bool, optional
   By default the returns use the timestamps of the factor values for which they are computed.
   If `realized_ts` is True the returns will use the timestamps at which the returns are actually realized
   (Suggestions for a better description are welcome :) )

twiecki commented 6 years ago

What makes it not easy to change? The rest of the code-base relying on this way of setting the dt?

luca-s commented 6 years ago

Yes. Throughout Alphalens code the returns are aligned to the factor index, exactly as it happens with the forward returns in factor_data:

           -------------------------------------------------------------------
                      |       | 1D  | 5D  | 10D  |factor|group|factor_quantile
           -------------------------------------------------------------------
               date   | asset |     |     |      |      |     |
           -------------------------------------------------------------------
                      | AAPL  | 0.09|-0.01|-0.079|  0.5 |  G1 |      3
                      --------------------------------------------------------
                      | BA    | 0.02| 0.06| 0.020| -1.1 |  G2 |      5
                      --------------------------------------------------------
           2014-01-01 | CMG   | 0.03| 0.09| 0.036|  1.7 |  G2 |      1
                      --------------------------------------------------------
                      | DAL   |-0.02|-0.06|-0.029| -0.1 |  G3 |      5
                      --------------------------------------------------------
                      | LULU  |-0.03| 0.05|-0.009|  2.7 |  G1 |      2
                      --------------------------------------------------------

twiecki commented 6 years ago

Isn't there a difference between forward-returns (which need to be tied to the signal dt) and factor returns (which we plan to change to tie to the return dt)? Or does factor_returns get used in other parts there as well?

luca-s commented 6 years ago

The latter, factor_returns does get used in other parts of the code.

twiecki commented 6 years ago

I see. Yes, then we should probably add a kwarg realized_ts.

luca-s commented 6 years ago

@twiecki I had a look at the code in detail and I realized it is not even possible to add the realized_ts flag. The perf.factor_returns function computes the returns for all periods in factor_data and, since each period has a different realized return time for the same factor time, it would not be possible to store the factor returns for all periods in the same index. At least, it wouldn't be clean.

By the way, this also explains the choice of using the factor index as the factor return index instead of the realized return timestamps.

twiecki commented 6 years ago

Right, each period would need to be shifted by that period's length. That would definitely jumble up the periods, but I don't see a problem with that per se. Say we have a 1D and a 5D signal: we would shift the returns by 1 and 5 days, respectively. That's exactly how it would look if you actually traded that signal.
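A small pandas sketch of what this per-period shift does to a shared index. The frame below is hypothetical, with one column per period named as in factor_data, and the shift uses calendar days rather than trading days; it only illustrates the idea and is not alphalens code.

```python
import pandas as pd

# Hypothetical factor returns for two periods, indexed by the factor date.
dates = pd.date_range("2014-01-01", periods=3, freq="D")
factor_returns = pd.DataFrame(
    {"1D": [0.01, -0.02, 0.03], "5D": [0.05, 0.02, -0.01]},
    index=dates,
)

# Shift each column by its own period length (assumption: calendar days).
shifted = []
for col in factor_returns.columns:
    s = factor_returns[col].copy()
    s.index = s.index + pd.Timedelta(col)
    shifted.append(s)

# Recombine on the union of the realized timestamps.
realized = pd.concat(shifted, axis=1)
```

With these inputs the 1D returns land on January 2 to 4 and the 5D returns on January 6 to 8, so the combined frame has six rows and every row holds a NaN in one of the two columns.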

luca-s commented 6 years ago

But what about a 3h and 1D signal? How could you combine them with the same index?

luca-s commented 6 years ago

@twiecki can we close this? I believe we should create a new API if we need something that cannot be achieved with the current perf.factor_returns implementation. Due to the reasons in this thread we cannot change the function behaviour.