quantopian / alphalens

Performance analysis of predictive (alpha) stock factors
http://quantopian.github.io/alphalens
Apache License 2.0

Error generating tear_sheet with discrete and sparse factors #231

Closed: neman018 closed this issue 6 years ago

neman018 commented 6 years ago

I just started using this library and wanted to get some advice on constructing factor values from quarterly data that is reported on different days.

For example, here is a sample of what my factor dataset looks like: (screenshot: sample of the factor dataset)

When I run the get_clean_factor_and_forward_returns() function with quantiles=None and bins=5, it seems to work, with an output of "dropped 10% entries from factor data". But when I call create_full_tear_sheet() and create_event_returns_tear_sheet(), I run into NaN issues and a number of the graphs are empty: (screenshot: tear sheet with empty plots)
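For reference, here is a sketch of roughly what I'm running (names assumed: factor is a pandas Series indexed by (date, asset) and prices is a wide DataFrame of close prices, as in the Alphalens example notebooks):

```python
from alphalens.utils import get_clean_factor_and_forward_returns
from alphalens.tears import create_full_tear_sheet

# factor: pd.Series indexed by (date, asset); prices: DataFrame of
# close prices (dates x assets). quantiles must be None when bins
# is passed, and vice versa.
factor_data = get_clean_factor_and_forward_returns(
    factor, prices, quantiles=None, bins=5
)
create_full_tear_sheet(factor_data, long_short=False)
```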

I'm assuming it's user error in how I generated my factor input data. Do you guys have any advice on how to pre-process my factor values if they aren't reported on the same day and use different scales?

luca-s commented 6 years ago

I believe the problem is that you have too few values per day, so Alphalens isn't able to compute sensible statistics. You can see that in the Quantiles Statistics table. If you can give me some additional details on what you are trying to achieve, there might be a set of options that best suits your problem. Could it be that an event study better suits your scenario? There is an example notebook about that.

neman018 commented 6 years ago

Thanks for the prompt response, Woodstock :)

So my factor values are currently just an arbitrary fundamental value for a list of stocks, reported on whatever date each stock reports it. I was just testing the library to see what the output would be, but thinking about this a little more, I should probably normalize these values so each is a percent change of that same value for that particular stock, and THEN feed those %change values into Alphalens? Perhaps set the bins to [-X, Y], where X and Y are the min/max %change values of a particular fundamental value (for that particular stock)?

luca-s commented 6 years ago

EDIT: updated my reply

Yes, I believe you are right: compute the %change and run that through Alphalens. This would tell you whether a %change in that fundamental value corresponds to a change in the future price of the assets. Also, because Alphalens computes the binning daily, you cannot use the option bins=5; you need to specify a custom bins range so that a specific %change range is always mapped to the same bin. E.g. assuming the normalized factor values are in the range [0.0, 1.0], you would use the option bins=[0., 0.3, 0.7, 1.0], which creates 3 bins: [0., 0.3], [0.3, 0.7] and [0.7, 1.0].
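Something along these lines (a sketch, assuming your factor values are already normalized into [0, 1]; factor and prices as in the example notebooks):

```python
from alphalens.utils import get_clean_factor_and_forward_returns

# Fixed bin edges: the same normalized-factor range always maps to
# the same bin on every day, unlike an integer bins=5, whose edges
# are recomputed each day from that day's values.
factor_data = get_clean_factor_and_forward_returns(
    factor,
    prices,
    quantiles=None,
    bins=[0.0, 0.3, 0.7, 1.0],  # 3 bins: [0, .3], [.3, .7], [.7, 1]
)
```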

neman018 commented 6 years ago

Sounds good, I'll compute the %change and then just change bin = 1 to start.

As usual, very insightful, luca-s!

luca-s commented 6 years ago

Now I am puzzled... why bin=1? If you set bin=1 then you don't need to make any change to your factor, because all your values will be mapped to bin 1 anyway. Also, the only reason I can think of to set bin=1 is to run an event study. There is a specific function for that: tears.create_event_study_tear_sheet. An event study would tell you what happens, on average, to the asset prices after a fundamental value appears in the factor data.
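For example (a sketch; please check the exact signature in your installed version):

```python
from alphalens import tears

# Event-study tear sheet: factor_data comes from
# get_clean_factor_and_forward_returns; avgretplot=(before, after)
# is the window of periods around each event to plot.
tears.create_event_study_tear_sheet(factor_data, prices, avgretplot=(5, 15))
```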

neman018 commented 6 years ago

Typo there, I meant to use a min/max normalization, z_i = (x_i - min(x)) / (max(x) - min(x)), and then use the bin example you mentioned earlier.
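i.e. something like this (a sketch; 'factor' being my (date, asset)-indexed Series):

```python
# Per-stock min/max normalization:
#   z_i = (x_i - min(x)) / (max(x) - min(x))
normalized = factor.groupby(level='asset').transform(
    lambda x: (x - x.min()) / (x.max() - x.min())
)
```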

luca-s commented 6 years ago

@neman018 I am closing this but feel free to reopen it if required

neman018 commented 6 years ago

So I transformed everything into percent changes (per security) and re-ran Alphalens using the bins [-1, -0.1, 0.0, 0.1, 1]: (screenshot: output with the new bins)

(screenshot: the resulting error)
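Roughly what I ran (a sketch with assumed names; 'raw' is my raw fundamental Series, indexed by (date, asset)):

```python
from alphalens.utils import get_clean_factor_and_forward_returns

# Per-security percent change of the raw fundamental value, then
# fixed bin edges so each %change range maps to the same bin daily.
pct = raw.groupby(level='asset').pct_change().dropna()
factor_data = get_clean_factor_and_forward_returns(
    pct, prices, quantiles=None, bins=[-1, -0.1, 0.0, 0.1, 1]
)
```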

Furthermore, I tried your suggestion of the create_event_study_tear_sheet() call and it errored with "too many values to unpack (expected 2)": (screenshot: create_event_study_tear_sheet traceback)

Thoughts?

luca-s commented 6 years ago

The first issue is due to the small number of values provided, but without the full logs I cannot see where the error originated.

Regarding the second error, "too many values to unpack": can you tell me how you called the create_event_study_tear_sheet function? It seems like you didn't pass the right value to the avgretplot argument.
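For reference (a sketch; presumably avgretplot is unpacked internally as a pair, which would match the "expected 2" message):

```python
# avgretplot must be a 2-element value, e.g. (periods before the
# event, periods after); anything longer raises
# "too many values to unpack (expected 2)".
tears.create_event_study_tear_sheet(factor_data, prices, avgretplot=(5, 15))
```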

neman018 commented 6 years ago

Here is the error for the first issue (I resolved the second using your suggestion):

```
/auto/mm_trading/venvs/venv_conda.20170303T144910/lib/python3.5/site-packages/numpy/lib/function_base.py:4016: RuntimeWarning: All-NaN slice encountered
  r = func(a, **kwargs)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 create_full_tear_sheet(factor_data, long_short=False, group_neutral=False, by_group=False)
      2 plt.show()

/auto/mm_trading/venvs/venv_conda.20170303T144910/lib/python3.5/site-packages/alphalens/tears.py in call_w_context(*args, **kwargs)
     82                 "is now deprecated and replaced by 'group_neutral'",
     83                 category=DeprecationWarning, stacklevel=2)
---> 84         return func(*args, **kwargs)
     85     return call_w_context
     86

/auto/mm_trading/venvs/venv_conda.20170303T144910/lib/python3.5/site-packages/alphalens/plotting.py in call_w_context(*args, **kwargs)
     41         with plotting_context(), axes_style():
     42             sns.despine(left=True)
---> 43             return func(*args, **kwargs)
     44     else:
     45         return func(*args, **kwargs)

/auto/mm_trading/venvs/venv_conda.20170303T144910/lib/python3.5/site-packages/alphalens/tears.py in create_full_tear_sheet(factor_data, long_short, group_neutral, by_group)
    536                               group_neutral,
    537                               by_group,
--> 538                               set_context=False)
    539     create_information_tear_sheet(factor_data,
    540                                   group_neutral,

/auto/mm_trading/venvs/venv_conda.20170303T144910/lib/python3.5/site-packages/alphalens/tears.py in call_w_context(*args, **kwargs)
    138                 " to avoid unexpected behaviour.",
    139                 category=DeprecationWarning, stacklevel=2)
--> 140         return func(*args, **kwargs)
    141     return call_w_context
    142

/auto/mm_trading/venvs/venv_conda.20170303T144910/lib/python3.5/site-packages/alphalens/plotting.py in call_w_context(*args, **kwargs)
     43             return func(*args, **kwargs)
     44     else:
---> 45         return func(*args, **kwargs)
     46     return call_w_context
     47

/auto/mm_trading/venvs/venv_conda.20170303T144910/lib/python3.5/site-packages/alphalens/tears.py in create_returns_tear_sheet(factor_data, long_short, group_neutral, by_group)
    365         std_err=std_spread_quant,
    366         bandwidth=0.5,
--> 367         ax=ax_mean_quantile_returns_spread_ts
    368     )
    369

/auto/mm_trading/venvs/venv_conda.20170303T144910/lib/python3.5/site-packages/alphalens/plotting.py in plot_mean_quantile_returns_spread_time_series(mean_returns_spread, std_err, bandwidth, ax)
    503             a = plot_mean_quantile_returns_spread_time_series(fr_column,
    504                                                               std_err=stdn,
--> 505                                                               ax=a)
    506             ax[i] = a
    507             curr_ymin, curr_ymax = a.get_ylim()

/auto/mm_trading/venvs/venv_conda.20170303T144910/lib/python3.5/site-packages/alphalens/plotting.py in plot_mean_quantile_returns_spread_time_series(mean_returns_spread, std_err, bandwidth, ax)
    545            xlabel='',
    546            title=title,
--> 547            ylim=(-ylim, ylim))
    548     ax.axhline(0.0, linestyle='-', color='black', lw=1, alpha=0.8)
    549

/auto/mm_trading/anaconda/envs/py35.prod/lib/python3.5/site-packages/matplotlib/artist.py in set(self, **kwargs)
    943             key=lambda x: (self._prop_order.get(x[0], 0), x[0])))
    944
--> 945         return self.update(props)
    946
    947     def findobj(self, match=None, include_self=True):

/auto/mm_trading/anaconda/envs/py35.prod/lib/python3.5/site-packages/matplotlib/artist.py in update(self, props)
    845         try:
    846             ret = [_update_property(self, k, v)
--> 847                    for k, v in props.items()]
    848         finally:
    849             self.eventson = store

/auto/mm_trading/anaconda/envs/py35.prod/lib/python3.5/site-packages/matplotlib/artist.py in <listcomp>(.0)
    845         try:
    846             ret = [_update_property(self, k, v)
--> 847                    for k, v in props.items()]
    848         finally:
    849             self.eventson = store

/auto/mm_trading/anaconda/envs/py35.prod/lib/python3.5/site-packages/matplotlib/artist.py in _update_property(self, k, v)
    839             if not callable(func):
    840                 raise AttributeError('Unknown property %s' % k)
--> 841             return func(v)
    842
    843         store = self.eventson

/auto/mm_trading/anaconda/envs/py35.prod/lib/python3.5/site-packages/matplotlib/axes/_base.py in set_ylim(self, bottom, top, emit, auto, **kw)
   3223             bottom, top = bottom
   3224
->  3225         bottom = self._validate_converted_limits(bottom, self.convert_yunits)
   3226         top = self._validate_converted_limits(top, self.convert_yunits)
   3227

/auto/mm_trading/anaconda/envs/py35.prod/lib/python3.5/site-packages/matplotlib/axes/_base.py in _validate_converted_limits(self, limit, convert)
   2834                 (not np.isreal(converted_limit) or
   2835                  not np.isfinite(converted_limit))):
->  2836             raise ValueError("Axis limits cannot be NaN or Inf")
   2837         return converted_limit
   2838

ValueError: Axis limits cannot be NaN or Inf
```
luca-s commented 6 years ago

So, the problem we see is indeed due to too few values in the factor DataFrame. Would you mind copy-pasting a small subset of your factor values here (enough data to reproduce the error), so that I can debug the issue? I don't know how many places in the code are sensitive to NaNs or empty bins/quantiles, but if there are not too many I'd love to fix them.
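E.g. something like this (hypothetical names; 'factor' being your (date, asset)-indexed input Series):

```python
# Take the first few dates, enough to reproduce the error, and
# dump them to CSV for sharing.
some_dates = factor.index.get_level_values('date').unique()[:10]
factor.loc[some_dates].to_csv('factor_sample.csv')
```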