nubank / fklearn

fklearn: Functional Machine Learning
Apache License 2.0

Add confidence interval causal curves #231

Open MarianaBlaz opened 1 year ago

MarianaBlaz commented 1 year ago

Status

READY

Todo list

Background context

We decided to include a way to calculate the errors of the Cumulative Effect and Cumulative Gain Curves, following the example presented in Causal Inference for the Brave and the True.

Description of the changes proposed in the pull request

We add a new type, error_fn, intended as a general class of statistical error functions, and implement one function of this kind: linear_standard_error. We use it to build a curve function, cumulative_statistical_error_curve, analogous to cumulative_gain_curve and cumulative_effect_curve, which calculates the error (given by the error_fn) between a treatment and an outcome over incrementally larger slices of an ordered dataframe. Finally, we modify the effect_curves function to add an optional parameter for calculating the error of the cumulative gain curve and the cumulative effect curve. These error columns are intended to be used to generate confidence intervals for these curves.
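The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this pull request: the names ending in `_sketch`, the `min_rows` and `steps` parameters, and the descending sort order are hypothetical, and the linear error is assumed to be the usual standard error of the slope in a simple regression of the outcome on the treatment, as in the book's example.

```python
import numpy as np
import pandas as pd


def linear_standard_error_sketch(df, treatment, outcome):
    """Standard error of the slope of a simple linear regression
    of `outcome` on `treatment` (hypothetical stand-in for an error_fn)."""
    n = df.shape[0]
    t = df[treatment].to_numpy(dtype=float)
    y = df[outcome].to_numpy(dtype=float)
    t_bar = t.mean()
    sum_sq_t = np.sum((t - t_bar) ** 2)

    # Ordinary least squares slope and intercept.
    slope = np.sum((t - t_bar) * (y - y.mean())) / sum_sq_t
    intercept = y.mean() - slope * t_bar

    # Residual variance with n - 2 degrees of freedom, scaled by the
    # spread of the treatment.
    residuals = y - (intercept + slope * t)
    return float(np.sqrt((np.sum(residuals ** 2) / (n - 2)) / sum_sq_t))


def cumulative_error_curve_sketch(df, treatment, outcome, prediction,
                                  min_rows=30, steps=100,
                                  error_fn=linear_standard_error_sketch):
    """Apply `error_fn` to growing top-k slices of `df`, ordered by
    `prediction` from highest to lowest (assumed ordering)."""
    ordered = df.sort_values(prediction, ascending=False).reset_index(drop=True)
    size = ordered.shape[0]
    step_size = max(1, size // steps)  # guard against a zero step on small data
    n_rows = list(range(min_rows, size, step_size)) + [size]
    return np.array([error_fn(ordered.head(n), treatment, outcome)
                     for n in n_rows])
```

With such an error column, an approximate 95% interval at each point of a cumulative effect curve would be the effect plus or minus 1.96 times the error, assuming the estimate is approximately normal.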

Where should the reviewer start?

We suggest starting with the causal/validation/curves.py file, then checking the causal/statistical_errors.py file.

Remaining problems or questions

We only wrote a function for linear relationships, but we believe the approach is general enough to be extended to other kinds of relationships.