quantopian / alphalens

Performance analysis of predictive (alpha) stock factors
http://quantopian.github.io/alphalens
Apache License 2.0

Dropped Entries from Factor Data #264

Closed niti closed 6 years ago

niti commented 6 years ago

I’m trying to use .get_clean_factor_and_forward_returns() but I’m running into an issue where all of my entries are dropped from the factor data.

My dataset only contains one asset and is structured as follows:

factor:

    date    |  asset  |  factor
------------|---------|---------
2014-01-01  |  AAPL   |   0.5
2014-01-02  |  AAPL   |   2.5
2014-01-03  |  AAPL   |   4.0

prices:

   Date     |  AAPL
------------|--------
2014-01-01  | 605.12
2014-01-02  | 604.35
2014-01-03  | 607.94
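A minimal pandas sketch of how inputs shaped like this can be constructed (variable names are illustrative; Alphalens expects the factor as a (date, asset) MultiIndex Series and prices as a wide DataFrame):

```python
import pandas as pd

# Illustrative reconstruction of the single-asset inputs above
dates = pd.to_datetime(['2014-01-01', '2014-01-02', '2014-01-03'])
factor = pd.Series(
    [0.5, 2.5, 4.0],
    index=pd.MultiIndex.from_product([dates, ['AAPL']],
                                     names=['date', 'asset']))
prices = pd.DataFrame({'AAPL': [605.12, 604.35, 607.94]}, index=dates)
```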

I’ve tried using the default quantiles and also explicit binning, e.g. bins=[-1.5, -0.5, 0.5, 1.5], quantiles=None. Either way I get the following error:

Dropped 100.0% entries from factor data: 100.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).

How do you advise further debugging this? I’m not really sure why it’s dropping the entire dataset.

When we try to perform similar actions with more than one asset in the dataset, we don’t see factor_data being dropped in either the forward returns computation or the binning phase.

In addition, we run into a case where our factor data is dropped almost entirely during the binning phase:

Dropped 100.0% entries from factor data: 1.1% in forward returns computation and 98.9% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).

MaxLossExceededError: max_loss (60.0%) exceeded 100.0%, consider increasing it.

It advises me to increase max_loss to 100%, which would of course drop all my data. How do you advise I avoid having 98.9% of the factor data dropped in the binning phase?

luca-s commented 6 years ago

Hi @niti, the best way to debug is to copy the code from utils.get_clean_factor_and_forward_returns, run it line by line, and see where the problem appears.

Dropped 100.0% entries from factor data: 100.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).

Since the entries are dropped in the forward returns computation, that means the code is not able to find the prices for the assets. What 'periods' did you specify? Are the date indices in both factor and prices using the same time zone? Are they both tz-naive? I am trying to guess why Alphalens is not able to align the assets with the prices; a quick check is sketched below.
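For instance (a minimal sketch, assuming factor has a MultiIndex with a 'date' level and prices a plain DatetimeIndex; localizing to UTC is just one illustrative fix):

```python
# Both indices should agree: either the same tz, or both tz-naive (tz is None)
print(factor.index.get_level_values('date').tz)
print(prices.index.tz)

# One possible fix, assuming both indices are currently tz-naive:
# localize both to the same time zone so Alphalens can align them
factor.index = factor.index.set_levels(
    factor.index.levels[0].tz_localize('UTC'), level='date')
prices.index = prices.index.tz_localize('UTC')
```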

When we try to perform similar actions with more than one asset in the dataset, we don’t encounter these issues where factor_data is dropped either during the forward returns computations or binning phase.

Strange. If I had the data I would debug it myself, but I guess you have to try what I suggested above: run utils.get_clean_factor_and_forward_returns line by line. Please let me know if it turns out to be an Alphalens bug.

Dropped 100.0% entries from factor data: 1.1% in forward returns computation and 98.9% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).

MaxLossExceededError: max_loss (60.0%) exceeded 100.0%, consider increasing it.

This should be easier to debug. When the error is in the binning phase, that means exceptions were thrown but suppressed. If you set max_loss=0 in get_clean_factor_and_forward_returns you will see those exceptions, and that will tell you where the error is.
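For example (reusing the same factor and prices as above):

```python
from alphalens import utils

# With max_loss=0 the suppressed exceptions propagate instead of being
# swallowed, pointing at the real cause of the dropped entries
factor_data = utils.get_clean_factor_and_forward_returns(
    factor, prices, max_loss=0)
```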

luca-s commented 6 years ago

I am closing this. Feel free to reopen it if you find an issue on the Alphalens side.

ihor-durnopianov commented 6 years ago

Just stumbled upon this issue. @luca-s, you are right: calling utils.get_clean_factor_and_forward_returns(factor, prices, max_loss=0.0) (with factor and prices as shown in @niti's comment) reveals the source of the trouble: ValueError: Bin edges must be unique, raised by pd.qcut inside quantile_calc, line 132.

What happens is that the function groups factor by date and tries to cut every group into quantile bins (line 165), and it necessarily fails: with one sample per date, there is no way to cut a group into more than one bin. But since max_loss is nonzero, no_raise is True (line 570), so the exception is swallowed, the quantiles for every group are set to NaN (line 158), and the whole dataset is dropped (line 578). Line numbering is given for commit 32bad52.
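The failure can be reproduced outside Alphalens with plain pandas:

```python
import pandas as pd

# With a single value, every quantile edge collapses to that value, so the
# bin edges are duplicated and pd.qcut raises
# "ValueError: Bin edges must be unique"
pd.qcut(pd.Series([0.5]), q=5)
```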

Therefore, in the case of only one asset, one should use utils.get_clean_factor_and_forward_returns(factor, prices, quantiles=None, bins=1) or utils.get_clean_factor_and_forward_returns(factor, prices, quantiles=None, bins=[-np.inf, np.inf]), as sketched below.
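As a runnable sketch (numpy is needed only for the explicit-edges variant):

```python
import numpy as np
from alphalens import utils

# Single-asset workaround: put every factor value into one bin
factor_data = utils.get_clean_factor_and_forward_returns(
    factor, prices, quantiles=None, bins=1)

# Equivalent: a single explicit bin spanning the whole real line
factor_data = utils.get_clean_factor_and_forward_returns(
    factor, prices, quantiles=None, bins=[-np.inf, np.inf])
```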

Hope someone finds this useful.