uber / causalml

Uplift modeling and causal inference with machine learning algorithms

T-Learner ATE, SE calculations #783

Open ras44 opened 2 weeks ago

ras44 commented 2 weeks ago

Describe the bug

Less of a bug than a question:

The T-learner computes the ATE as the mean of the treatment effect te over all subjects, that is, the mean over all rows of the difference between each treatment group's model prediction and the control model's prediction: https://github.com/uber/causalml/blob/a0315660d9b14f5d943aa688d8242eb621d2ba76/causalml/inference/meta/tlearner.py#L242-L243

However, the standard error of the ATE is calculated on a filtered subset: only the subjects in the given treatment group and those in the control group are included:

https://github.com/uber/causalml/blob/a0315660d9b14f5d943aa688d8242eb621d2ba76/causalml/inference/meta/tlearner.py#L254-L261

It seems the subjects used in the ATE calculation should match the subjects used in the SE calculation. If all subjects are meant to be included, the SE could simply be the SE of the te values over all subjects.
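To make the mismatch concrete, here is a minimal sketch with made-up numbers. It is not causalml's code: the names `treatment`, `te`, and `mask`, the three-arm setup, and the values are all assumptions for illustration. It only shows that a mean over all rows and a mean over the filtered rows are taken over different sets of subjects and generally differ.

```python
import numpy as np

# Hypothetical assignment: one control arm and two treatment arms.
treatment = np.array(["control", "a", "b"] * 3)

# Hypothetical per-row effect estimates for arm "a" (yhat_t - yhat_c),
# predicted for every row regardless of which arm the row was assigned to.
te = np.array([1.0, 1.2, 0.4, 0.9, 1.1, 0.5, 1.0, 1.3, 0.3])

# Point estimate taken over all rows.
ate_all = te.mean()

# Rows that belong to arm "a" or to control only.
mask = (treatment == "a") | (treatment == "control")
ate_filtered = te[mask].mean()

print(f"all rows:      n={te.size}, ATE={ate_all:.4f}")
print(f"filtered rows: n={mask.sum()}, ATE={ate_filtered:.4f}")
```

If the point estimate uses the first set of rows and the SE uses the second, any interval built from the two describes different populations.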

If, instead, not all subjects are meant to be included and the ATE is group-specific, then it seems we should have:

_ate  = (yhat_t - yhat_c).mean()

And again, the SE would simply be the SE of that series:

se = np.sqrt((yhat_t - yhat_c).var())
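One caveat on the expression above: `np.sqrt(x.var())` is the standard deviation of the differences, whereas the standard error of their mean also divides by the square root of the number of rows. A minimal sketch follows, assuming `yhat_t` and `yhat_c` are NumPy arrays of per-row predictions already restricted to the group of interest. The values are made up, and the sketch treats the differences as independent and ignores uncertainty in the fitted models themselves.

```python
import numpy as np

# Hypothetical per-row predictions for one treatment group and control.
yhat_t = np.array([2.0, 2.5, 3.0, 3.5])
yhat_c = np.array([1.0, 1.0, 2.0, 2.0])

diff = yhat_t - yhat_c   # per-row estimated effects
ate = diff.mean()

# Sample standard deviation of the differences.
# Note: ndarray.var() defaults to ddof=0; ddof=1 gives the sample variance.
sd = np.sqrt(diff.var(ddof=1))

# Standard error of the mean: SD scaled by sqrt(n).
se = sd / np.sqrt(diff.size)

print(f"ATE={ate:.4f}, SD={sd:.4f}, SE={se:.4f}")
```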