Open ilovecocolade opened 5 years ago
Could you please provide a minimal example that replicate the error?
Example:
# Create toy data
dates = pd.date_range("20190101", "20190105")
prices_dates = pd.date_range("20190101", "20190201")
assets = list(range(10))
factor = pd.DataFrame(index=pd.MultiIndex.from_product([dates, assets]),
data=np.random.randn(50))
prices = pd.DataFrame(index=prices_dates, columns=assets, data=np.abs(np.random.randn(32, 10)))
factor.index.set_names(["date", "asset"], inplace=True)
prices.index.rename("date", inplace=True)
# Error occurs here
al.utils.get_clean_factor_and_forward_returns(factor, prices, bins=[-2, 0, 2])
Error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-76-7c85abde5837> in <module>
12
13 # Error occurs here
---> 14 al.utils.get_clean_factor_and_forward_returns(factor, prices, bins=[-2, 0, 2])
/usr/local/anaconda/lib/python3.7/site-packages/alphalens/utils.py in get_clean_factor_and_forward_returns(factor, prices, groupby, binning_by_group, quantiles, bins, periods, filter_zscore, groupby_labels, max_loss, zero_aware, cumulative_returns, compute_quantiles)
805 binning_by_group=binning_by_group,
806 max_loss=max_loss, zero_aware=zero_aware,
--> 807 compute_quantiles=compute_quantiles)
808
809 return factor_data
/usr/local/anaconda/lib/python3.7/site-packages/alphalens/utils.py in get_clean_factor(factor, forward_returns, groupby, binning_by_group, quantiles, bins, groupby_labels, max_loss, zero_aware, compute_quantiles)
607 binning_by_group,
608 no_raise,
--> 609 zero_aware
610 )
611
/usr/local/anaconda/lib/python3.7/site-packages/alphalens/utils.py in dec(*args, **kwargs)
73 def dec(*args, **kwargs):
74 try:
---> 75 return func(*args, **kwargs)
76 except ValueError as e:
77 if 'Bin edges must be unique' in str(e):
/usr/local/anaconda/lib/python3.7/site-packages/alphalens/utils.py in quantize_factor(factor_data, quantiles, bins, by_group, no_raise, zero_aware)
128 if not ((quantiles is not None and bins is None) or
129 (quantiles is None and bins is not None)):
--> 130 raise ValueError('Either quantiles or bins should be provided')
131
132 if zero_aware and not (isinstance(quantiles, int)
ValueError: Either quantiles or bins should be provided
As @ilovecocolade suggests, the default for quantize_factor
is probably behind this.
Hi,
Yes @luca-s, the error provided by @eigenfoo is the error I am referring to. When quantize_factor() is called it passes a default value of quantiles=5, regardless of whether bins is used instead of quantiles. The error suggests neither bins nor quantiles have been passed but it is because bins has been passed and quantiles is passed by default with a value of 5 in quantize_factor(), and only one of these two variables can have a value. This then means you must set the value of quantiles to be None. I ran into difficulties in the same place as @eigenfoo where get_clean_factor_and_forward_returns() does not have a default quantiles value but it calls quantize_factor() which does.
This behaviour was chosen due to backward compatibility, but I agree this is not user friendly. The documentation states that Only one of 'quantiles' or 'bins' can be not-None
. Earlier releases of Alphalens had quanties=5
but bins
option didn't exist.
I would be happy to change this default and force the user to provide either quantiles
or bins
(no quanties=5
by default) but I don't know if it would break something? @twiecki
Yes, I also had that thought that it may break something elsewhere. Maybe just change the error message so it is clear to users that quantiles must be set to None if using bins instead of quantiles. From my limited experience of Alphalens I've found that most of the data I have used is continuous and much better suited to quantile sorting anyway, I only ever use bins when the data is discrete which is rare for me. The best fix may be as a previously said to just change the error message to indicate to users if they want to use the bin instead of quantile sorting they should set quantiles=None.
A suitable error message might be: "Only one of either quantiles or bins should be provided. If using bins instead of quantiles; set quantiles=None."
Thank you for the prompt response.
@ilovecocolade I like your idea, changing the error message is the safest solution. Would you like to provide a PR?
Hi,
I have never created a pull request before but I would be happy to do it.
Change the default value for quantiles in this function to quantiles=None. When using bins instead of quantiles with the utils.get_clean_factor_and_forward_returns() function, there is an error in the utils.quantize_factor() inferring that no bin size or quantile sizes were passed but it is actually because utils.quantize_factor() passes quantiles = 5 by default, regardless of if bins in not None.