quantopian / alphalens

Performance analysis of predictive (alpha) stock factors
http://quantopian.github.io/alphalens
Apache License 2.0
3.29k stars 1.14k forks source link

utils.quantize_factor() #350

Open ilovecocolade opened 5 years ago

ilovecocolade commented 5 years ago

Change the default value for quantiles in this function to quantiles=None. When using bins instead of quantiles with the utils.get_clean_factor_and_forward_returns() function, there is an error in the utils.quantize_factor() inferring that no bin size or quantile sizes were passed but it is actually because utils.quantize_factor() passes quantiles = 5 by default, regardless of if bins in not None.

luca-s commented 5 years ago

Could you please provide a minimal example that replicate the error?

eigenfoo commented 5 years ago

Example:

# Create toy data
dates = pd.date_range("20190101", "20190105")
prices_dates = pd.date_range("20190101", "20190201")
assets = list(range(10))

factor = pd.DataFrame(index=pd.MultiIndex.from_product([dates, assets]),
                      data=np.random.randn(50))
prices = pd.DataFrame(index=prices_dates, columns=assets, data=np.abs(np.random.randn(32, 10)))

factor.index.set_names(["date", "asset"], inplace=True)
prices.index.rename("date", inplace=True)

# Error occurs here
al.utils.get_clean_factor_and_forward_returns(factor, prices, bins=[-2, 0, 2])

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-76-7c85abde5837> in <module>
     12 
     13 # Error occurs here
---> 14 al.utils.get_clean_factor_and_forward_returns(factor, prices, bins=[-2, 0, 2])

/usr/local/anaconda/lib/python3.7/site-packages/alphalens/utils.py in get_clean_factor_and_forward_returns(factor, prices, groupby, binning_by_group, quantiles, bins, periods, filter_zscore, groupby_labels, max_loss, zero_aware, cumulative_returns, compute_quantiles)
    805                                    binning_by_group=binning_by_group,
    806                                    max_loss=max_loss, zero_aware=zero_aware,
--> 807                                    compute_quantiles=compute_quantiles)
    808 
    809     return factor_data

/usr/local/anaconda/lib/python3.7/site-packages/alphalens/utils.py in get_clean_factor(factor, forward_returns, groupby, binning_by_group, quantiles, bins, groupby_labels, max_loss, zero_aware, compute_quantiles)
    607             binning_by_group,
    608             no_raise,
--> 609             zero_aware
    610         )
    611 

/usr/local/anaconda/lib/python3.7/site-packages/alphalens/utils.py in dec(*args, **kwargs)
     73     def dec(*args, **kwargs):
     74         try:
---> 75             return func(*args, **kwargs)
     76         except ValueError as e:
     77             if 'Bin edges must be unique' in str(e):

/usr/local/anaconda/lib/python3.7/site-packages/alphalens/utils.py in quantize_factor(factor_data, quantiles, bins, by_group, no_raise, zero_aware)
    128     if not ((quantiles is not None and bins is None) or
    129             (quantiles is None and bins is not None)):
--> 130         raise ValueError('Either quantiles or bins should be provided')
    131 
    132     if zero_aware and not (isinstance(quantiles, int)

ValueError: Either quantiles or bins should be provided

As @ilovecocolade suggests, the default for quantize_factor is probably behind this.

ilovecocolade commented 5 years ago

Hi,

Yes @luca-s, the error provided by @eigenfoo is the error I am referring to. When quantize_factor() is called it passes a default value of quantiles=5, regardless of whether bins is used instead of quantiles. The error suggests neither bins nor quantiles have been passed but it is because bins has been passed and quantiles is passed by default with a value of 5 in quantize_factor(), and only one of these two variables can have a value. This then means you must set the value of quantiles to be None. I ran into difficulties in the same place as @eigenfoo where get_clean_factor_and_forward_returns() does not have a default quantiles value but it calls quantize_factor() which does.

luca-s commented 5 years ago

This behaviour was chosen due to backward compatibility, but I agree this is not user friendly. The documentation states that Only one of 'quantiles' or 'bins' can be not-None. Earlier releases of Alphalens had quanties=5 but bins option didn't exist.

I would be happy to change this default and force the user to provide either quantiles or bins (no quanties=5 by default) but I don't know if it would break something? @twiecki

ilovecocolade commented 5 years ago

Yes, I also had that thought that it may break something elsewhere. Maybe just change the error message so it is clear to users that quantiles must be set to None if using bins instead of quantiles. From my limited experience of Alphalens I've found that most of the data I have used is continuous and much better suited to quantile sorting anyway, I only ever use bins when the data is discrete which is rare for me. The best fix may be as a previously said to just change the error message to indicate to users if they want to use the bin instead of quantile sorting they should set quantiles=None.

A suitable error message might be: "Only one of either quantiles or bins should be provided. If using bins instead of quantiles; set quantiles=None."

Thank you for the prompt response.

luca-s commented 5 years ago

@ilovecocolade I like your idea, changing the error message is the safest solution. Would you like to provide a PR?

ilovecocolade commented 5 years ago

Hi,

I have never created a pull request before but I would be happy to do it.