Closed eigenfoo closed 6 years ago
It looks like I'm relying on something that's only supported in pandas v0.20 or higher: those tests pass, but the rest don't. I can't imagine what it is, but I'll take a look.
@twiecki don't you want the zero_aware
flag to be exposed to get_clean_factor_and_forward_returns
?
@eigenfoo one suggestion. Since it is almost no effort, why don't we add the zero_aware
logic to the bins
option too? As a user I would expect the flag to work with both bins
and quantiles
@twiecki @eigenfoo I am not sure if you agree, but I would expect the binning to be symmetrical on positive and negative values. So when performing the binning on the negative values, we should compute the absolute values of those, that is:
neg_quantiles = pd.qcut(x[x < 0].abs(), _quantiles // 2, labels=False) + 1
And then the quantile numbers have to be reversed.
All good points! I'll take care of those.
@twiecki can we confirm that we want the symmetric bucketing: currently, zero is pegged to the "halfway quantile" (e.g. the 50th percentile). I like the idea of symmetric bucketing but that means we'd need to have negative quantile numbers to distinguish between positive and negative factor values.
Also, I'm concerned about the failing tests: I suspect it has something to do with changes to pd.qcut
between versions 0.19 and 0.20, although I can't really find anything. Any thoughts @luca-s?
@eigenfoo I believe you get an error because either the positive or the negative subset is empty. The test that fails is the one where the binning is performed by group, and It seems to me the groups contain either all positive or all negative values.
@eigenfoo Regarding the symmetric bucketing: I didn't mean that we have to keep the negative quantiles number, we can convert them to positive ones. Actually we have to convert them to positive ones, because I don't believe Alphalens can handle negative quantile numbers. My idea was that if you provide custom ranges for quantiles
or bins
you probably want them to be applied consistently to both positive and negative values. Now I have just realized the code wouldn't work with custom ranges.
@luca-s I don't think I follow: if the quantile numbers don't reflect the positive/negative value of the factor, then how will we distinguish between, say, an asset in the 20th positive quantile and the 20th negative quantile?
I also realize that the zero_aware
functionality does not make sense with custom quantile/bin ranges. I will add a test to make sure that zero_aware
is ignored unless quantiles
or bins
is an integer. Does that sound reasonable?
@eigenfoo perfect, you can make a check at the beginning of the function and raise a ValueError
if zero_aware
is true and quantiles
or bins
is not an integer. In this way we can forget about symmetric bucketing and keep the current logic
@luca-s All points have been resolved, except for the tests. It cannot be what you described (i.e. that either the positive or negative subset is empty), because all tests with pandas version >= 0.20 passed successfully.
Perhaps it is to do with this bug that was patched in pandas 0.20?
@eigenfoo before your latest change the bug was what I said, now that you changed the tests the bug is the one you linked. Add more data to the biased_factor and you should be able to avoid hitting the bug
What an interesting bug; nice catch @luca-s! Ready for review now.
It seems all good now, thanks @eigenfoo . I will go on and merge this
Closes #304 and #281.
This adds the functionality to compute quantile buckets separately for positive and negative signal values. This is useful if your signal/factor is centered and zero is the separation between long and short signals, respectively.
A separate parameter, called
zero_aware
, has been added to thequantize_factor
function. If True, alphalens will compute the quantiles while being aware of the positive-negative split.