quantopian / alphalens

Performance analysis of predictive (alpha) stock factors
http://quantopian.github.io/alphalens
Apache License 2.0

ENH: added positions computation in 'performance.create_pyfolio_input' #250

Closed luca-s closed 6 years ago

luca-s commented 6 years ago

'performance.create_pyfolio_input' now computes positions too. It is also now possible to select the 'period' used in the benchmark computation, and to select equal weighting instead of factor weighting for the factor returns/positions.

luca-s commented 6 years ago

I noticed the pyfolio Exposure plot is empty. I am not sure if it is a pyfolio bug or if the data is missing something. The positions are computed as percentages, and so is the cash. Is this correct?

e.g.

A | B | C | D | E | F | cash
-- | -- | -- | -- | -- | -- | --
0.125 | 0.3750 | -0.125000 | -0.375 | 0.0000 | 0.000000 | 1.0
0.125 | 0.1875 | -0.062500 | -0.375 | 0.1875 | -0.062500 | 1.0
0.125 | 0.2500 | -0.083333 | -0.375 | 0.1250 | -0.041667 | 1.0
0.125 | 0.2500 | -0.083333 | -0.375 | 0.1250 | -0.041667 | 1.0
0.125 | 0.3750 | -0.125000 | -0.375 | 0.0000 | 0.000000 | 1.0
0.125 | 0.3750 | -0.125000 | -0.375 | 0.0000 | 0.000000 | 1.0
0.125 | 0.2500 | -0.083333 | -0.375 | 0.1250 | -0.041667 | 1.0

luca-s commented 6 years ago

If anybody has a better name for the new API (or the new internal functions) please let me know because I am not so happy about them but I couldn't think of anything better. I also wonder if performance is the right place for create_pyfolio_input, or if it would be better inside utils or tears.

luca-s commented 6 years ago

I've just realized that positions must be in dollars. Only pyfolio.tears.create_perf_attrib_tear_sheet accepts positions in either dollars or percentages. That's a pity; I have to fix the positions computation then.

mmargenot commented 6 years ago

I think that utils might make more sense. An alphalens.tears.create_perf_attrib_tear_sheet might be a good wrapper for create_pyfolio_input -> pyfolio.tears.create_perf_attrib_tear_sheet, though.

luca-s commented 6 years ago

I like the idea of moving create_pyfolio_input to utils.

I thought about a wrapper too, but I discarded the idea because it doesn't add anything useful and we would have to keep updating the alphalens API to reflect the changes that happen in pyfolio. More importantly, I don't like the idea of hiding pyfolio calls: it is useful for users to see which functionality is called so that they can customize the calls for their needs (there are so many parameters in the pyfolio tears functions). Let's see if calling pyfolio becomes more difficult in the future, but as long as it is as simple as it is now we can keep it the way it is. What do you think?

mmargenot commented 6 years ago

That makes sense to me. It's a case of doing the whole performance attribution in two lines vs. one line, which I think is okay to leave as two for now.

luca-s commented 6 years ago

@twiecki it is ready to be reviewed. Positions are now computed as dollar amounts instead of percentages. Actually, the pyfolio results are identical to before, so I wonder if we could stick to percentage positions: I like them more, and users wouldn't be forced to provide an initial capital in create_pyfolio_input just to transform the positions from percentages to dollar amounts.
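
To make the percentage-versus-dollar point concrete, here is a minimal sketch in plain Python (no pandas, to keep it self-contained). The row is a hypothetical one modelled on the example positions earlier in the thread, and `capital`, `to_dollars` and `to_percent` are illustrative names, not alphalens or pyfolio APIs. It assumes the downstream analysis normalizes each row by its total value; that would explain identical results, but it is an assumption here, not something confirmed in this thread.

```python
import math

def to_dollars(pct_row, capital):
    """Scale fractional allocations (cash included) by the starting capital."""
    return {name: weight * capital for name, weight in pct_row.items()}

def to_percent(dollar_row):
    """Normalize a row by its total value (assumed downstream behaviour)."""
    total = sum(dollar_row.values())
    return {name: value / total for name, value in dollar_row.items()}

# Hypothetical row: long/short weights sum to zero, cash holds the full capital.
pct_row = {"A": 0.125, "B": 0.375, "C": -0.125, "D": -0.375,
           "E": 0.0, "F": 0.0, "cash": 1.0}
capital = 100_000  # hypothetical initial capital

dollar_row = to_dollars(pct_row, capital)
round_trip = to_percent(dollar_row)

# Normalizing the dollar row recovers the original percentages.
same = all(math.isclose(round_trip[k], pct_row[k], abs_tol=1e-12) for k in pct_row)
```

Under that assumption the choice of `capital` only rescales the dollar amounts and drops out after normalization.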

twiecki commented 6 years ago

Really excited about this. But looking at the NB, I wonder if there is a bug, e.g.:

[three images]

luca-s commented 6 years ago

I believe that's correct. This is the date when ES is the only short position in the portfolio:

image

Looking at the factor values for that date, we can see that 'ES' has a factor value three orders of magnitude larger than the other values. This should explain what we are seeing:

image

twiecki commented 6 years ago

But shouldn't the logic just select the top and bottom n stocks? Seems like it's weighting by alpha signal.

luca-s commented 6 years ago

So the point of confusion is that we are asking for a portfolio that has these characteristics: long_short=True, equal_weight=True, quantiles=[1,5] and the user would expect to have long positions on quantile 5 and short positions on quantile 1, while the simulated portfolio contains only one short position.

The problem is that the code demeans the factor values, goes long on the positive ones and short on the negative ones, and only then computes equal weights. This is the cause of the confusion.

I need to think again about this behavior and make sure it doesn't end up with this kind of inconsistency. Thank you for spotting this, I love when the bugs are found right away :)

twiecki commented 6 years ago

> The problem is that the code demeans the factor values, goes long on the positive ones and short on the negative ones, and only then computes equal weights. This is the cause of the confusion.

Not sure I understand yet what the problem really is. Shouldn't quantiles 1 and 5 have roughly the same number of stocks regardless of which weighting is used?

luca-s commented 6 years ago

Yes, and I will fix that; it has to work as you say. I believe I was looking at the equal weighting from the wrong point of view. I used the factor values to decide which assets should be long and which short, and then computed the equal weighting. I actually have to use the quantile information to decide which assets go in the short positions and which in the long ones.

twiecki commented 6 years ago

One other idea for the future would be the ability to supply a custom weighting function. E.g. could see the case for equal weight, alpha weighted, inv vol etc.
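
As a rough illustration of what such a pluggable weighting function could look like, here is a sketch in plain Python. Everything in it is hypothetical: the `weighting_fn(factor, vol)` signature, the function names and the sample values are assumptions made for illustration and are not part of alphalens.

```python
from statistics import mean, median

def normalize(raw):
    """Rescale raw weights so that gross exposure (sum of |w|) equals 1."""
    gross = sum(abs(w) for w in raw.values())
    return {a: (w / gross if gross else 0.0) for a, w in raw.items()}

def equal_weight(factor, vol):
    """+1 above the median factor value, -1 below it (vol is unused)."""
    mid = median(factor.values())
    return {a: (v > mid) - (v < mid) for a, v in factor.items()}

def alpha_weight(factor, vol):
    """Weight proportional to the demeaned factor value (vol is unused)."""
    mu = mean(factor.values())
    return {a: v - mu for a, v in factor.items()}

def inverse_vol_weight(factor, vol):
    """Direction from the median split, size proportional to 1 / volatility."""
    mid = median(factor.values())
    return {a: ((v > mid) - (v < mid)) / vol[a] for a, v in factor.items()}

def build_portfolio(factor, vol, weighting_fn):
    """Apply a user-supplied weighting function, then normalize the result."""
    return normalize(weighting_fn(factor, vol))

# Hypothetical inputs for one date.
factor = {"A": 2.0, "B": 1.0, "C": -1.0, "D": -2.0}
vol = {"A": 0.2, "B": 0.1, "C": 0.1, "D": 0.2}

eq = build_portfolio(factor, vol, equal_weight)
alpha = build_portfolio(factor, vol, alpha_weight)
inv_vol = build_portfolio(factor, vol, inverse_vol_weight)
```

Keeping the normalization outside the weighting function means each scheme only has to express relative sizes.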

luca-s commented 6 years ago

The issue with the long/short weights should be fixed now and NB updated too.

By the way, the change I made to the weights computation is that factor values above the median become long positions, while factor values below the median become short positions. The previous behaviour was very similar, except I used the mean instead of the median; that's why the extremely large negative factor value for ES made it the only short position.
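
The effect of switching from the mean to the median can be reproduced with a small sketch. This is plain Python with made-up factor values (only the 'ES' outlier mirrors the case discussed here), and `equal_weights` is an illustrative helper, not the alphalens implementation.

```python
from statistics import mean, median

def equal_weights(factor, center):
    """Long above center(values), short below, equal weight within each side."""
    c = center(list(factor.values()))
    longs = [a for a, v in factor.items() if v > c]
    shorts = [a for a, v in factor.items() if v < c]
    weights = {a: 0.0 for a in factor}
    for a in longs:
        weights[a] = 0.5 / len(longs)    # half of gross exposure on the long side
    for a in shorts:
        weights[a] = -0.5 / len(shorts)  # half of gross exposure on the short side
    return weights

# Hypothetical factor values; 'ES' is roughly three orders of magnitude larger.
factor = {"A": 1.0, "B": 2.0, "C": -1.5, "D": 0.5, "E": -0.5, "ES": -3000.0}

by_mean = equal_weights(factor, mean)      # outlier drags the mean far below the rest
by_median = equal_weights(factor, median)  # median split is unaffected by the outlier

shorts_by_mean = [a for a, w in by_mean.items() if w < 0]
shorts_by_median = [a for a, w in by_median.items() if w < 0]
```

With the mean as the split point, `shorts_by_mean` contains only 'ES'; with the median, the six assets split three long and three short.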

luca-s commented 6 years ago

I also found the reason the Exposure plot was blank: it turned out to be a pyfolio bug.

twiecki commented 6 years ago

> By the way, the change I made to the weights computation is that factor values above the median become long positions, while factor values below the median become short positions. The previous behaviour was very similar, except I used the mean instead of the median; that's why the extremely large negative factor value for ES made it the only short position.

That makes sense. Ideally I think that would be configurable as well, e.g.: lower_percentile and upper_percentile.

twiecki commented 6 years ago

Also, seems like sub-sampling doesn't quite do what we want: ideally we wouldn't exit the positions but hold them for the whole week. I suppose one would need to have the same signal for all days in the week to achieve that.

luca-s commented 6 years ago

> That makes sense. Ideally I think that would be configurable as well, e.g.: lower_percentile and upper_percentile.

For now it's possible to choose which quantiles to use; if need be, we can add the percentile option later if the quantile configuration is not flexible enough.

> Also, seems like sub-sampling doesn't quite do what we want: ideally we wouldn't exit the positions but hold them for the whole week. I suppose one would need to have the same signal for all days in the week to achieve that.

The portfolio is holding the positions for 1 day because the code calls create_pyfolio_input(period='1D', ... ). If we switched period to '5D', which is one of the periods computed by get_clean_factor_and_forward_returns, the positions would be held for 5 days. The reason I used '1D' is that I didn't find a good example for the 5-day period. I didn't want to give the misleading idea that there is a good reason to trade a 5-day signal every 5 days.

I believe that rebalancing every 5 days is not the best way of trading a 5-day signal. A better way would be to split the portfolio into five sub-portfolios, each holding 1/5th of the capital, start one on each subsequent day, and rebalance each of them every 5 days. This would result in the same transaction cost, but the slippage impact would be 1/5th, the portfolio capacity would be 5 times bigger, the volatility of the portfolio would be lower, and the factor would be traded every single day, making it more statistically robust and independent of the starting day.
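
A simplified sketch of this staggered scheme, in plain Python: if each of the `period` sub-portfolios takes the target weights of the day it rebalances and then holds them, the combined book on any day is the average of the last `period` daily targets. The sketch ignores price drift between rebalances, transaction costs and slippage, and `staggered_weights` and its inputs are hypothetical names used only for illustration.

```python
def staggered_weights(daily_targets, period):
    """Combine `period` sub-portfolios, each rebalanced every `period` days.

    daily_targets: list of dicts (asset -> target weight), one dict per day.
    Returns one dict per day with the combined weights. During the first
    period - 1 days only part of the capital is invested.
    """
    combined = []
    for t in range(len(daily_targets)):
        # Sub-portfolios still holding targets from days t-period+1 .. t.
        window = daily_targets[max(0, t - period + 1): t + 1]
        assets = sorted(set().union(*window))
        combined.append(
            {a: sum(w.get(a, 0.0) for w in window) / period for a in assets}
        )
    return combined

# Hypothetical targets: the signal flips from long X to long Y on day 5.
targets = [{"X": 1.0}] * 5 + [{"Y": 1.0}] * 5
book = staggered_weights(targets, period=5)
```

In this example the position in X is unwound by 1/5th per day after the signal flips, instead of all at once, which is the mechanism behind the lower slippage impact claimed above.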

I can still modify the NB to show the usage of a 5-day period traded every Monday, except I need a good excuse to show that.

twiecki commented 6 years ago

If the quantile is already used, where does the median (or mean) value come in when building the portfolio?

The holding period question is tricky indeed. Although I think trading a 5-day signal every 5 days is a pretty simple method to go with as a default.

luca-s commented 6 years ago

> If the quantile is already used, where does the median (or mean) value come in when building the portfolio?

I am not sure I understand your question. This is how I implemented it: the quantiles option of the create_pyfolio_input function selects the quantiles that will be used in the portfolio. The assets belonging to the selected quantiles become long positions if their factor values are above the median and short positions otherwise. This ensures the same number of assets in long and short positions. The user can choose which quantiles to use to increase or decrease the number of assets traded (e.g. quantiles=[1,5] vs quantiles=[1,2,4,5]). That's not exactly the same as choosing percentiles, but it's something.
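
The selection-then-median-split logic described above can be sketched as follows. This is plain Python with hypothetical data (ten assets in five quantiles of two), and `quantile_equal_weights` is an illustrative helper rather than the code in this PR.

```python
from statistics import median

def quantile_equal_weights(factor, quantile, selected):
    """Keep assets in the selected quantiles, then split them at the median."""
    chosen = {a: v for a, v in factor.items() if quantile[a] in selected}
    if not chosen:
        return {}
    mid = median(chosen.values())
    longs = [a for a, v in chosen.items() if v > mid]
    shorts = [a for a, v in chosen.items() if v < mid]
    weights = {a: 0.0 for a in chosen}
    weights.update({a: 0.5 / len(longs) for a in longs})
    weights.update({a: -0.5 / len(shorts) for a in shorts})
    return weights

# Hypothetical data: factor values 1..10, two assets per quantile.
factor = {f"S{i}": float(i) for i in range(1, 11)}
quantile = {f"S{i}": (i + 1) // 2 for i in range(1, 11)}

narrow = quantile_equal_weights(factor, quantile, {1, 5})      # 2 long, 2 short
wide = quantile_equal_weights(factor, quantile, {1, 2, 4, 5})  # 4 long, 4 short
uneven = quantile_equal_weights(factor, quantile, {1, 2, 5})   # split falls inside quantile 2
```

With a symmetric selection the long and short sides coincide with the top and bottom quantiles. With an uneven selection such as `{1, 2, 5}` the median falls inside quantile 2, so one of its assets ends up long and the other short.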

> The holding period question is tricky indeed. Although I think trading a 5-day signal every 5 days is a pretty simple method to go with as a default.

Ok then, I'll update the NB.

twiecki commented 6 years ago

Oh I see. So first you select whatever quantiles the user specified, e.g. [1, 2, 5] (which makes no sense) and then you do a median split inside that selection. So the algo would go long from 2.5 to 5 and short on 1 to 2.5 (there are no 2.5s but it's based on the actual values). Correct?

luca-s commented 6 years ago

Exactly, but please let me know if you have a better idea. At some point I'd like to add your idea of a custom weighting function though, so the users can do what they like.

twiecki commented 6 years ago

OK, that makes sense. An alternative would be to require the user to specify long_quantiles=[4, 5], short_quantiles=[1, 2] to make it explicit. Although I think the current one is simpler and probably foolproof as well.

NB looks great too. I will try to review or find someone to review the code in more detail.

@richafrank Do you know of someone who could help review this new feature?

luca-s commented 6 years ago

Making the long/short quantiles explicit would be nicer, but then we would still need the quantiles option to handle the factor-weighted scenario, where the factor value implies the long/short positions. So, to avoid a proliferation of function arguments, I chose this path.

twiecki commented 6 years ago

Ping @richafrank.

richafrank commented 6 years ago

Thanks for the ping. Sorry I lost track of this. Will find someone!

twiecki commented 6 years ago

Waiting on @prsutherland's sign-off before merging.

luca-s commented 6 years ago

@prsutherland any more comments on this PR?

twiecki commented 6 years ago

OK, I think this went through some solid review. Going to merge this -- really cool feature @luca-s!