Make input preparation more robust and less annoying

luca-s commented 6 years ago

Dropping factor data without warning the user can lead to misinterpreted results. For this reason, utils.get_clean_factor_and_forward_returns now has a new parameter, 'max_loss', that controls the maximum percentage of factor data that can be dropped because the data itself is flawed (e.g. NaNs), because not enough price data was provided to compute forward returns for all factor values, or because of binning errors.

Also, small errors in the binning phase (utils.quantize_factor) caused by sporadic flawed data no longer raise exceptions if the incurred data loss is less than 'max_loss'.
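
To illustrate the idea (this is not the code in the commit), here is a minimal pure-Python sketch of a loss threshold: flawed entries are dropped silently as long as the dropped fraction stays within 'max_loss', and an error is raised otherwise. The names `drop_flawed` and `MaxLossExceeded` are hypothetical and exist only for this example; the 0.05 default mirrors the 5% mentioned in this thread.

```python
import math


class MaxLossExceeded(Exception):
    """Hypothetical error type for this sketch (not an alphalens class)."""


def drop_flawed(values, max_loss=0.05):
    """Drop NaN entries and return (clean_values, loss_fraction).

    Raises MaxLossExceeded if the dropped fraction is greater than max_loss.
    """
    clean = [v for v in values if not math.isnan(v)]
    # Fraction of the input that was dropped (0.0 for an empty input).
    loss = (len(values) - len(clean)) / len(values) if values else 0.0
    if loss > max_loss:
        raise MaxLossExceeded(
            f"dropped {loss:.1%} of the data, max_loss is {max_loss:.1%}"
        )
    return clean, loss
```

With 1 NaN out of 100 values the call succeeds and reports a 1% loss; with 1 NaN out of 2 values it raises, because 50% exceeds the 5% threshold.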

luca-s commented 6 years ago

@twiecki max_loss is configured to allow max 5% of data loss by default: is that too strict?

luca-s commented 6 years ago

I don't believe there is anything controversial in this commit; it's a pretty straightforward change. I'll merge this and we can decide later on the best default value for max_loss. For now it is 5%.

luca-s commented 6 years ago

@twiecki I fixed the issue of max_loss being too strict in PR #210

quantopian / alphalens

Make input preparation more robust and less annoying #207