matheusfacure / python-causality-handbook

Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.
https://matheusfacure.github.io/python-causality-handbook/landing-page.html
MIT License
2.65k stars, 463 forks

Incorrect Code in Chapter 20 (and theoretical nitpicking) #402

Open aliquod opened 1 month ago

aliquod commented 1 month ago

First of all, thank you for making this very accessible book!

In the section about continuous treatment in chapter 20, you defined

Y^*_i := (Y_i- \bar{Y})\dfrac{(T_i - M(T_i))}{(T_i - M(T_i))^2}

to be the pseudo-outcome[^1], and then you dropped the denominator on the grounds that you only care about ranking treatment effects, not their absolute values. But dropping it does not preserve the ordering[^2]. Why not instead simplify it to

Y^*_i = \dfrac{Y_i- \bar{Y}}{T_i - M(T_i)}?
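To make the ordering concern concrete, here is a minimal sketch with made-up residuals (the groups and numbers are hypothetical, not taken from the book's data). Each pair is an outcome residual and a treatment residual; the two groups rank one way under the product form and the opposite way under the ratio form.

```python
from statistics import mean

# Hypothetical (outcome residual, treatment residual) pairs, i.e.
# (Y_i - Y_bar, T_i - M(T_i)). The numbers are invented for illustration.
group_a = [(1.0, 1.0), (1.0, 1.0)]
group_b = [(4.0, 8.0), (4.0, 8.0)]


def mean_product(pairs):
    """Average pseudo-outcome with the denominator dropped."""
    return mean(y * t for y, t in pairs)


def mean_ratio(pairs):
    """Average pseudo-outcome with the denominator kept (simplified form)."""
    return mean(y / t for y, t in pairs)


# Product form ranks group B above group A ...
print(mean_product(group_a), mean_product(group_b))
# ... while the ratio form ranks group A above group B.
print(mean_ratio(group_a), mean_ratio(group_b))
```

The flip happens because each term is rescaled by its own squared treatment residual, so the rescaling is not a single common factor across units.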

Now onto the actual issue: the code block that came after

Y^*_i = (Y_i- \bar{Y})(T_i - M(T_i))

is

y_star_cont = (train["price"] - train["price"].mean()
               *train["sales"] - train["sales"].mean())

but this is missing some parentheses. Because `*` binds tighter than `-`, it actually computes

Y^*_i \overset{???}{=} Y_i- (\bar{Y} \times T_i) - M(T_i).
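A possible fix, sketched on a toy DataFrame (the data below is made up; only the column names `price` and `sales` come from the snippet above): parenthesize each residual before multiplying. The sketch also checks that the code as written matches the expansion above rather than the intended product.

```python
import pandas as pd

# Toy data, invented for illustration; not the book's dataset.
train = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 6.0],
    "sales": [10.0, 8.0, 7.0, 3.0],
})

# As written in the chapter: the multiplication is evaluated first,
# so only price.mean() is multiplied by the sales column.
y_star_as_written = (train["price"] - train["price"].mean()
                     * train["sales"] - train["sales"].mean())

# Explicitly parenthesized version of what the line above evaluates to.
expanded = (train["price"]
            - (train["price"].mean() * train["sales"])
            - train["sales"].mean())

# Intended computation: product of the two demeaned columns.
y_star_cont = ((train["price"] - train["price"].mean())
               * (train["sales"] - train["sales"].mean()))

print(y_star_as_written.equals(expanded))     # same values as the expansion
print(y_star_as_written.equals(y_star_cont))  # differs from the intended product
```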

[^1]: The denominator I assume is an estimate of the conditional variance Var(T|X), but for most regression methods this residual is an underestimate.

[^2]: In the end we will average those values up to estimate the CATE. But unlike the randomized treatment case where every term is scaled by σ² and can be un-scaled without changing order, here each term has a different factor.