Thanks for creating a great resource! The suggested pseudo outcome for the continuous treatment case is
So it is the same as the R-learner but without weights. I would like to understand why this last simplification step is not made in the book; instead you use `(y - y_pred) * (t - t_pred)`, which only captures the sign of the effect.

I am also wondering about the following: weighted linear regression can be done by multiplying both `X` and `y` by `ws = sqrt(w)`, where `w` are the weights, i.e. `Xw = X * ws` and `yw = y * ws`, so that `beta = inv(Xw'Xw) * Xw'yw` gives the weighted OLS coefficients. Thus, with a linear final stage, the R-learner simply uses `y - y_pred` as the outcome and `Xw` as the predictor matrix.
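To make sure I am stating the weighted-regression point correctly, here is a small NumPy/scikit-learn sketch of what I mean. Everything in it is simulated for illustration: `y_pred` and `t_pred` just stand in for cross-fitted first-stage predictions, and none of it is taken from the book's code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 1000, 3

# Simulated data: X confounders, continuous treatment t, outcome y.
X = rng.normal(size=(n, p))
t = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)
tau = 1.0 + 2.0 * X[:, 0]                       # (linear) treatment effect
y = 0.3 * X[:, 1] + tau * t + rng.normal(size=n)

# Stand-ins for the first-stage predictions of t and y given X.
t_pred = X @ np.linalg.lstsq(X, t, rcond=None)[0]
y_pred = X @ np.linalg.lstsq(X, y, rcond=None)[0]
t_res, y_res = t - t_pred, y - y_pred

X1 = np.column_stack([np.ones(n), X])           # add an intercept column

# R-learner final stage as a weighted regression:
# pseudo outcome (y_res / t_res) on X, with weights w = t_res**2.
w = t_res ** 2
wls = LinearRegression(fit_intercept=False).fit(
    X1, y_res / t_res, sample_weight=w)

# Same coefficients via the sqrt-weight trick: scale X and the pseudo
# outcome by ws = sqrt(w) = |t_res| and run plain OLS.
ws = np.abs(t_res)
ols_scaled = np.linalg.lstsq(X1 * ws[:, None], (y_res / t_res) * ws, rcond=None)[0]

# And since the objectives are algebraically identical, scaling by the
# *signed* residual works too: regress y_res directly on X1 * t_res.
ols_signed = np.linalg.lstsq(X1 * t_res[:, None], y_res, rcond=None)[0]

print(np.allclose(wls.coef_, ols_scaled))       # True
print(np.allclose(wls.coef_, ols_signed))       # True
```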
I am wondering whether this would not also be a better approach for non-linear final stages that don't support sample weights, i.e. using `Xw` to predict `y - y_pred`?
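This is roughly what I have in mind for the non-linear case, continuing from the snippet above. The `GradientBoostingRegressor` is just a stand-in (it actually does accept `sample_weight`, which is why it can also serve as the weighted baseline); whether the scaled-feature variant is a sensible substitute when the final stage has no `sample_weight` argument is exactly what I am asking.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Proposed variant: scale the features by the treatment residual and fit
# the final stage on the raw outcome residual, mirroring the linear case.
Xw = X1 * t_res[:, None]
scaled_final = GradientBoostingRegressor(random_state=0).fit(Xw, y_res)

# Usual weighted final stage: pseudo outcome with sample weights.
weighted_final = GradientBoostingRegressor(random_state=0).fit(
    X1, y_res / t_res, sample_weight=t_res ** 2)
```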