matheusfacure / python-causality-handbook

Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.
https://matheusfacure.github.io/python-causality-handbook/landing-page.html

Issue on page /22-Debiased-Orthogonal-Machine-Learning.html #363

Open SebKrantz opened 8 months ago

SebKrantz commented 8 months ago

I fail to understand why, in the section "Non-Scientific Double/Debiased ML", it is necessary to save the first-stage models and predict with them. When adding counterfactual treatments, we are not changing any part of the covariates X, which are the sole input to the first-stage models. The first-stage predictions are therefore the same with or without counterfactual treatments, and we don't need those models.
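
To make the point concrete, here is a minimal sketch on toy data (the data and variable names are made up for illustration, not the chapter's code):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Toy data: "temp" and "weekday" play the role of the covariates X,
# "price" is the treatment and "sales" the outcome.
rng = np.random.default_rng(0)
df = pd.DataFrame({"temp": rng.normal(24, 4, 500),
                   "weekday": rng.integers(1, 8, 500)})
df["price"] = 5 + 0.1 * df["temp"] + rng.normal(0, 1, 500)
df["sales"] = 100 - 3 * df["price"] + 0.5 * df["temp"] + rng.normal(0, 2, 500)

X = ["temp", "weekday"]
model_y = GradientBoostingRegressor().fit(df[X], df["sales"])  # E[Y|X]
model_t = GradientBoostingRegressor().fit(df[X], df["price"])  # E[T|X]

# Swapping in a counterfactual price changes nothing here,
# because the first-stage inputs are still just df[X].
cf = df.assign(price=df["price"] * 0.9)
assert np.allclose(model_y.predict(df[X]), model_y.predict(cf[X]))
assert np.allclose(model_t.predict(df[X]), model_t.predict(cf[X]))
```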

In addition, I don't quite understand the value of the train/test splitting and the ensamble_pred() function here. If my goal is to get counterfactual predictions for all my data (which it typically is), I would just use cross_val_predict() on the entire data to get the first-stage residuals (as in the section on DML), then fit cross-validated final models using cv_estimate(), additionally saving the indices for each fold, and finally create a predict method that uses the final-stage models and the saved indices to produce proper cross-validated predictions for different price levels (adding their residual predictions back to the first-stage predictions, which remain the same).
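
Something like this rough sketch, continuing the toy data above (the cv_estimate() step is replaced by an explicit KFold loop, and predict_counterfactual is a hypothetical helper of mine, not the chapter's code):

```python
from sklearn.model_selection import KFold, cross_val_predict

# First stage on the entire data via cross-fitting: out-of-fold predictions,
# so the residuals are honest without a separate train/test split.
y_hat = cross_val_predict(GradientBoostingRegressor(), df[X], df["sales"], cv=5)
t_hat = cross_val_predict(GradientBoostingRegressor(), df[X], df["price"], cv=5)
y_res, t_res = df["sales"] - y_hat, df["price"] - t_hat

# Final stage: one model per fold, saving the held-out indices so that every
# observation is later predicted by a model that never saw it.
final = []
for train, test in KFold(5, shuffle=True, random_state=1).split(df):
    m = GradientBoostingRegressor().fit(
        df[X].iloc[train].assign(t_res=t_res.iloc[train]), y_res.iloc[train])
    final.append((test, m))

def predict_counterfactual(price):
    """Cross-validated sales prediction at a counterfactual price level."""
    cf_res = np.asarray(price) - t_hat           # first stage is unchanged
    out = np.empty(len(df))
    for test, m in final:
        x_cf = df[X].iloc[test].assign(t_res=cf_res[test])
        out[test] = y_hat[test] + m.predict(x_cf)  # add E[Y|X] back
    return out

sales_at_lower_price = predict_counterfactual(df["price"] * 0.9)
```

This way every counterfactual prediction is out-of-fold end to end, and the first-stage models never need to be stored, only their predictions.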

Pattheturtle commented 7 months ago

Without saving the first stage, it is impossible to use a non-functional form to create the measure of the probability of something occurring; it is an extension of the sharp null idea.
