quantopian / alphalens

Performance analysis of predictive (alpha) stock factors
http://quantopian.github.io/alphalens
Apache License 2.0

Two Factor Interaction Development: Initial Data Structure, Test Module, and Plot #259

Closed MichaelJMath closed 6 years ago

MichaelJMath commented 6 years ago

(This is a duplicate of PR #258)

This begins the development of the "factors_interaction_tearsheet" from issue #219. The goal of this pull request is to get feedback on whether this branch seems to be going in the right direction.

Description of Changes

  1. Create join_factor_with_factor_and_forward_returns function
    • Creates a function complementary to get_clean_factor_and_forward_returns that joins an additional factor to the factor_data dataframe returned by get_clean_factor_and_forward_returns.
    • The returned dataframe, call it "multi_factor_data", will be the core data structure supplying the data needed for the factors_interaction_tearsheet computations.
  2. Create an associated test module.
  3. Modify perf.mean_return_by_quantile to take an additional parameter so that it can group by multiple factor quantiles.
  4. Add first plotting function, plot_multi_factor_quantile_returns, to create an annotated heatmap of mean returns by two-factor quantile bins.
  5. Create tears.create_factors_interaction_tear_sheet as the entry point to the multi-factor tearsheet.
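
The data structure and grouping described in the changes above can be sketched with plain pandas. This is only an illustration, not the code in this branch: the function names join_second_factor, quantize_by_date, and mean_return_by_two_quantiles, the column names "factor_2" and "factor_2_quantile", and the synthetic data are all assumptions, and the "1D" forward-return column is assumed to follow the naming that get_clean_factor_and_forward_returns produces.

```python
import numpy as np
import pandas as pd


def quantize_by_date(values, quantiles):
    """Assign 1-based quantile buckets within each date (hypothetical helper)."""
    return (
        values.groupby(level="date")
        .transform(lambda x: pd.qcut(x, quantiles, labels=False) + 1)
        .astype(int)
    )


def join_second_factor(factor_data, second_factor, quantiles=3):
    """Hypothetical stand-in for join_factor_with_factor_and_forward_returns.

    factor_data: DataFrame indexed by (date, asset), shaped like the output
        of get_clean_factor_and_forward_returns.
    second_factor: Series indexed by (date, asset).
    """
    out = factor_data.copy()
    # Align on the (date, asset) index; rows without a second factor are dropped.
    out["factor_2"] = second_factor.reindex(out.index)
    out = out.dropna(subset=["factor_2"])
    out["factor_2_quantile"] = quantize_by_date(out["factor_2"], quantiles)
    return out


def mean_return_by_two_quantiles(multi_factor_data, return_col="1D"):
    """Mean forward return for each pair of factor quantile bins."""
    grouped = multi_factor_data.groupby(["factor_quantile", "factor_2_quantile"])
    # Rows: first-factor quantile, columns: second-factor quantile.
    return grouped[return_col].mean().unstack()


# Synthetic data: 5 dates x 9 assets, purely for illustration.
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product(
    [pd.date_range("2018-01-01", periods=5), list("ABCDEFGHI")],
    names=["date", "asset"],
)
factor_data = pd.DataFrame(
    {"1D": rng.normal(0, 0.01, len(index)), "factor": rng.normal(size=len(index))},
    index=index,
)
factor_data["factor_quantile"] = quantize_by_date(factor_data["factor"], 3)
second_factor = pd.Series(rng.normal(size=len(index)), index=index)

multi_factor_data = join_second_factor(factor_data, second_factor)
heatmap_table = mean_return_by_two_quantiles(multi_factor_data)
print(heatmap_table)
```

A table shaped like heatmap_table is the kind of input an annotated heatmap needs; for instance, it could be passed to seaborn.heatmap with annot=True.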

Requesting Feedback

  1. Comments and suggestions on the utils.join_factor_with_factor_and_forward_returns function
    1. Should there be a wrapper that builds the multi_factor_data dataframe in one step (i.e., one that wraps this function together with get_clean_factor_and_forward_returns)?
  2. I'm not too familiar with creating effective unit tests, so any feedback on this module is appreciated.
  3. Regarding Change 3 above:
    1. My first thought, following the suggestion of @luca-s, was to create a separate performance module that would contain all functions for this sort of computation.
    2. Since the existing performance module already contains much of the needed functionality, I thought I might create wrapper functions in this new module that add whatever is missing.
    3. However, to make perf.mean_return_by_quantile work cleanly, I needed to add a parameter to the function itself. I'm not sure how I could have done that with a wrapper.
    4. So my question is: what are the community's thoughts on how I dealt with this particular issue, and on related considerations going forward?
  4. Any other comments or guidance on the path of development going forward are greatly appreciated.
  5. Also, let me know if there are too many changes in this pull request for efficient/easy review.
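
To make the design question in item 3 concrete, here is a minimal sketch of the "add an optional parameter" approach. It is a simplified stand-in, not the real perf.mean_return_by_quantile; the function name mean_return_by_quantile_sketch, the parameter name extra_quantile_cols, and the toy data are hypothetical.

```python
import pandas as pd


def mean_return_by_quantile_sketch(factor_data, return_cols, extra_quantile_cols=None):
    """Simplified stand-in, not the real perf.mean_return_by_quantile.

    extra_quantile_cols is a hypothetical parameter. Leaving it as None
    keeps single-factor grouping, so existing callers are unaffected.
    """
    group_keys = ["factor_quantile"] + list(extra_quantile_cols or [])
    return factor_data.groupby(group_keys)[return_cols].mean()


# Toy data for illustration only.
data = pd.DataFrame(
    {
        "1D": [0.01, 0.03, -0.02, 0.04],
        "factor_quantile": [1, 1, 2, 2],
        "factor_2_quantile": [1, 2, 1, 2],
    }
)

# Existing single-factor call: grouped by factor_quantile only.
single = mean_return_by_quantile_sketch(data, ["1D"])
# Two-factor call: grouped by both quantile columns.
double = mean_return_by_quantile_sketch(data, ["1D"], ["factor_2_quantile"])
```

Because the extra grouping keys change what happens inside the group-by, a wrapper that only pre- or post-processes the single-factor result cannot recover the two-factor means, which is one argument for a defaulted parameter on the existing function.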