quantopian / alphalens

Performance analysis of predictive (alpha) stock factors
http://quantopian.github.io/alphalens
Apache License 2.0

compute forward returns using datetime offset #315

Closed HereticSK closed 5 years ago

HereticSK commented 5 years ago

Currently, compute_forward_returns has to do a lot of work to compute the correct timedelta. As discussed in issue 253, the problem is that returns are computed in terms of row-location offsets instead of actual datetime offsets.

If datetime offsets could be passed directly to compute_forward_returns, the function could be simplified a lot. Returns could be computed as pct_change(period, freq=freq), and there would be no need to infer the trading calendar. Also, compute_forward_returns could be completely decoupled from the factor, so that the computed returns can be reused with other factors.
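To illustrate the difference between a row-location offset and a datetime offset, here is a minimal sketch on made-up prices with a weekend gap. It is not Alphalens code; the dates, prices, and variable names are assumptions chosen for illustration. Note that pct_change with freq is backward-looking, so the forward version below is built with shift instead.

```python
import pandas as pd

# Illustrative data (assumed): Thu, Fri, Mon, Tue closes with a weekend gap.
index = pd.DatetimeIndex(["2019-01-03", "2019-01-04", "2019-01-07", "2019-01-08"])
prices = pd.Series([100.0, 110.0, 121.0, 133.1], index=index)

# Row-location offset: return from each row to the next row,
# regardless of how much time separates the two rows.
row_fwd = prices.shift(-1) / prices - 1

# Datetime offset: return from t to exactly t + 1 calendar day.
# shift(-1, freq="D") relabels every price with the timestamp one day
# earlier, so after reindexing the value at t is the price seen at t + 1 day.
future = prices.shift(-1, freq="D").reindex(prices.index)
dt_fwd = future / prices - 1

# pct_change with freq looks backward: the value at t is p(t) / p(t - 1 day) - 1.
back = prices.pct_change(periods=1, freq="D")

print(pd.DataFrame({"row_fwd": row_fwd, "dt_fwd": dt_fwd, "back": back}))
```

With the row offset, Friday's forward return silently spans the weekend. With the one-day datetime offset, Friday has no forward return at all because there is no Saturday price.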

luca-s commented 5 years ago

Just to understand better, do you see the datetime offsets passed to Alphalens by the user? That is, you want to change the API?

luca-s commented 5 years ago

Also, could this be better addressed in issue #276 ? Or don't you see overlapping?

twiecki commented 5 years ago

@HereticSK That's an interesting idea and we'd certainly consider a PR making that change.

HereticSK commented 5 years ago

> Just to understand better, do you see the datetime offsets passed to Alphalens by the user? That is, you want to change the API?

The API may have to change, but I am not sure if there's a better way. It's just that the current mechanism of inferring the trading calendar is a bit tricky and rather difficult to understand. The best way I can come up with is to compute forward returns as pct_change(period, freq=freq). The freq can be obtained by:

  1. having the user directly pass datetime offset objects. This leads to an API change;
  2. having the user pass strings representing datetime offsets, such as '1d', '2m', then parsing them into datetime offset objects. This also leads to an API change, but is a little more user-friendly;
  3. trying to obtain a freq from something like prices.index.freq, and accepting integers from the periods argument as multiples of that datetime offset (as the current API does). With this approach the API does not have to change, but a combination of heterogeneous offsets, such as ('1d', '2m'), is not allowed.
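The three options above could be sketched as a single dispatch function. The name resolve_offset and its behavior are hypothetical and not part of Alphalens; the only pandas calls used are to_offset and the freq attribute of a DatetimeIndex.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def resolve_offset(period, index):
    """Hypothetical helper: turn a user-supplied period into a pandas offset."""
    if isinstance(period, pd.DateOffset):
        # Option 1: the user passed an offset object directly.
        return period
    if isinstance(period, str):
        # Option 2: the user passed an offset string such as "5D".
        return to_offset(period)
    # Option 3: an integer is read as a multiple of the index frequency,
    # which only works when the index carries a regular freq.
    if index.freq is None:
        raise ValueError("index has no freq; pass an offset object or string")
    return period * index.freq


# Illustrative index (assumed): regular daily timestamps.
daily = pd.date_range("2019-01-01", periods=5, freq="D")

print(resolve_offset(pd.offsets.BDay(1), daily))  # option 1
print(resolve_offset("5D", daily))                # option 2
print(resolve_offset(3, daily))                   # option 3
```

Option 3 depends on the index having a regular frequency. An index built from irregular timestamps has freq set to None, which is one reason this route cannot express heterogeneous offsets.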

BTW, the periods argument in the current API seems a bit confusing. If I understand correctly, the current inferring mechanism may result in an unpredictable combination of datetime offsets, depending on the trading calendar, especially in an intraday context (as @luca-s discussed in issue 253). The periods argument suggests that it controls the time delta, but it doesn't, at least not solely.

> Also, could this be better addressed in issue #276 ? Or don't you see overlapping?

#276 involves refactoring not only the structure of forward returns but also factors. It involves much more design effort and a bigger change to the whole project. For the moment I don't have a good idea for it yet. If we had a good plan, I'd be happy to work on #276 directly.

But computing forward returns may run into this datetime offset issue anyway. I guess whatever we achieve on this issue would become part of the solution to the bigger issue #276.

luca-s commented 5 years ago

@HereticSK Please consider that it is more complex than it seems. You wouldn't be able to simply call pct_change(period, freq=freq) with a frequency like '1D', '1W' etc. You also want to provide a trading calendar, which can be represented by a pandas DateOffset. When we want to compute a 1-day forward return, we actually mean 1 day according to the trading calendar: a 1-day forward return computed on Friday is really a 3-day forward return, because there is no trading on Saturday or Sunday (at least on some stock exchanges, but Alphalens is flexible enough to deal with any calendar). For this reason Alphalens stores the pd.Timedelta representing the forward return interval in the factor_data columns, and the trading calendar information (a pandas DateOffset) in the factor_data index. Also, the code is very flexible and doesn't ask the user for those details: it infers all the information from the data passed to Alphalens, keeping the API user-friendly.
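The Friday example can be reproduced with a pandas business-day offset standing in for the trading calendar. This is only a sketch of the idea, not how Alphalens infers or stores the calendar; the Monday-to-Friday calendar, the dates, and the prices are assumptions.

```python
import pandas as pd

# Assumed trading calendar: plain Monday-to-Friday business days.
calendar = pd.offsets.BDay()

friday = pd.Timestamp("2019-01-04")
next_session = friday + calendar   # the following Monday
elapsed = next_session - friday    # wall-clock time between the two sessions
print(next_session.date(), elapsed)

# Applied to prices (illustrative values): a 1-session forward return.
index = pd.DatetimeIndex(["2019-01-03", "2019-01-04", "2019-01-07", "2019-01-08"])
prices = pd.Series([100.0, 110.0, 121.0, 133.1], index=index)

# Relabel each price with the previous session's timestamp, then realign,
# so the value at t is the price at the next trading session after t.
future = prices.shift(-1, freq=calendar).reindex(prices.index)
session_fwd = future / prices - 1
print(session_fwd)
```

Here Friday's forward return is defined and spans three calendar days, which is the behavior described above. A calendar with exchange holidays could be expressed with pd.offsets.CustomBusinessDay instead of BDay.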

One last consideration. I wouldn't break the API for now (use your option 3).

luca-s commented 5 years ago

I am closing this due to inactivity