Closed julioasotodv closed 3 months ago
Thanks for reaching out, but this behavior is by design. With a good first-stage model_y, the Y residuals should average to (approximately) zero (otherwise the model could be improved by adding that average), so there would be no point in adding an intercept that is not interacted with the T residuals. Likewise, we don't include the columns of X in the final regression, since those should also have been handled by the first-stage Y model. The final regression uses only the interaction of the featurized Xs (plus the intercept, if enabled) with the T residuals, because we are assuming that the form of the CATE can be expressed as a linear combination of those terms (see the DML section here).
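To make the argument above concrete, here is a minimal NumPy sketch on synthetic data. It is not EconML's implementation: the helper `ols_residuals` is hypothetical, the first stages are plain in-sample least squares (no cross-fitting), and the data-generating process is an assumption chosen so that the CATE is linear in `[1, X]`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
T = 0.5 * x + rng.normal(size=n)
# Assumed ground truth: CATE theta(x) = 1 + 2x, plus a baseline 3 + x.
Y = (1.0 + 2.0 * x) * T + 3.0 + x + rng.normal(size=n)

def ols_residuals(target, features):
    """Hypothetical stand-in for a first-stage model: OLS with an intercept."""
    A = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

first_stage_features = np.column_stack([x, x ** 2])
Y_res = ols_residuals(Y, first_stage_features)
T_res = ols_residuals(T, first_stage_features)

# Final stage: only T_res interacted with [1, x]; no free intercept, no raw x.
design = np.column_stack([T_res, T_res * x])
coef, *_ = np.linalg.lstsq(design, Y_res, rcond=None)

print("mean of Y residuals:", Y_res.mean())
print("estimated [cate_intercept, cate_slope]:", coef)
```

Because the first-stage fit includes an intercept, the Y residuals already average to zero, so a free constant column in the final stage would have nothing left to explain. The CATE intercept is the coefficient on `T_res` itself.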
I see, very clear. Thank you!
Actually, I just saw that the same is done for `econml.dml.NonParamDML`. Given that there is no restriction on how `model_final` works in this case, is it still required to compute the cross product between `T` and `X`? Given that `model_final` can be a non-linear scikit-learn estimator (such as a Gradient Boosting Regressor), I believe this is done just to keep API homogeneity, right?
Thank you!
> Actually, I just saw that the same is done for `econml.dml.NonParamDML`.
What do you mean by this? `NonParamDML` doesn't have a `fit_cate_intercept` attribute at all, nor does it interact the features with the residuals - it just fits an arbitrary final model regressing the quotient of the residuals onto the featurized X.
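As a rough illustration of why regressing the quotient of the residuals can stand in for the interacted regression: since (Y_res - θ(X)·T_res)² = T_res²·(Y_res/T_res - θ(X))², the two problems coincide if each sample is weighted by T_res². The weighting is my assumption here (the comment above does not mention it), and the sketch below uses synthetic residuals and a linear θ purely so that both fits can be compared with `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(-1.0, 1.0, size=n)
T_res = rng.normal(size=n)                      # synthetic treatment residuals
Y_res = (1.0 + 2.0 * x) * T_res + 0.1 * rng.normal(size=n)

A = np.column_stack([np.ones(n), x])            # features for theta(x)

# (a) Interacted form: regress Y_res on T_res * [1, x].
coef_direct, *_ = np.linalg.lstsq(A * T_res[:, None], Y_res, rcond=None)

# (b) Quotient form: regress Y_res / T_res on [1, x] with weights T_res**2.
#     Weighted least squares is done by scaling rows with sqrt(weight) = |T_res|.
sqrt_w = np.abs(T_res)
coef_weighted, *_ = np.linalg.lstsq(
    A * sqrt_w[:, None], (Y_res / T_res) * sqrt_w, rcond=None
)

print(coef_direct, coef_weighted)
```

With a non-linear final model the same weighted-quotient objective applies, only the minimizer is no longer available in closed form.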
Hi again,

I just checked `NonParamDML` again, and it does perform the cross product. However, it is done between `X` and a column of 1s, so you are right: it has no effect whatsoever.

Thank you, and sorry for the inconvenience.
Hi!

I believe I have found a bug in the `econml.dml.DML` class (and potentially others that use the same mechanism).

In theory, when the `DML` class is instantiated with `fit_cate_intercept=True`, it should combine:

- the treatment residuals (`T`)
- the features (`X`)

into a single feature dataset to train the final (linear) model `model_final`.

To better leverage the interactions between the treatment residuals `T` and `X`, two-way interactions between the variables are computed (the cross-product between them).

Finally, with `fit_cate_intercept=True` an additional feature with 1s should be added.

The issue is that right now `fit_cate_intercept=True` first adds the feature with 1s to `X`, and then the cross-product between `X` and `T` is computed. Therefore, we end up with `T` intercepts, and none of them is a 1s feature. This leads to multicollinearity, and on top of that no true intercept is being generated.

This can be seen here. This is the function used to generate the final feature set for `model_final`: https://github.com/py-why/EconML/blob/6219695cd1a6a0ff492a22a5585c15537d5d41a6/econml/dml/dml.py#L139-L154

`self._featurizer` in L142 will add the intercept feature to `X` if `fit_cate_intercept=True`, generating `F`. And then in L153 the cross product between `F` and `T` is computed.

Thank you!
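The column layout described in this post can be reproduced with a small NumPy sketch. This is not the linked EconML code: `cross_product` below is a hypothetical helper written for illustration (its column ordering may differ from the library's), and the data is synthetic with a single treatment column.

```python
import numpy as np

def cross_product(F, T):
    """Hypothetical helper: row-wise outer product of F (n, d_f) and T (n, d_t),
    flattened to shape (n, d_f * d_t)."""
    return np.einsum("ij,ik->ijk", F, T).reshape(F.shape[0], -1)

rng = np.random.default_rng(0)
n = 5
X = rng.normal(size=(n, 2))       # features
T_res = rng.normal(size=(n, 1))   # treatment residuals, one treatment

# Step 1: append a column of 1s to X, giving F = [1, X].
F = np.hstack([np.ones((n, 1)), X])

# Step 2: interact every column of F with the treatment residuals.
design = cross_product(F, T_res)

print(design.shape)                                  # one column per column of F
print(np.allclose(design[:, 0], T_res[:, 0]))        # first column is T_res itself
print(np.allclose(design[:, 1:], X * T_res))         # the rest are X * T_res
```

The 1s column ends up multiplied by the treatment residuals, so the final design contains `T_res` and `X * T_res` but no constant column.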