emaadmanzoor closed this issue 4 years ago.
Did you get an answer for this? I have a binary outcome problem as well want to know how to calculate residual.
I ended up just implementing the partially linear double ML IV estimator from Chernozhukov et al. It does not assume any distributional form for the noise/error term, so I think it can be used as-is for binary outcomes. My simulations seem to confirm this.
Are you referring to this?
Y = D·θ₀ + g₀(X) + U,  E_P[U | X, Z] = 0,   (4.56)
Z = m₀(X) + V,  E_P[V | X] = 0
First of all, my problem doesn't have a Z. But even if it did, I would still need to estimate U, which requires some type of Y - Y_hat subtraction. If Y is binary, we have the trouble of {0, 1} - Prob. How did you get around it?
Sorry, I confused this issue with a different one. I think econml implements this correctly for binary outcomes (no changes are required), and I was interpreting the results incorrectly.
The DMLCateEstimator will estimate average marginal effects. I compared these with the logit coefficients, which are log odds ratios, not marginal effects.

The following simulation confirms that DMLCateEstimator works as expected:
```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from tqdm import tqdm  # this import was missing from the original snippet

from econml.dml import DMLCateEstimator
from sklearn.linear_model import LinearRegression, LogisticRegression

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logit_margeff_x1, logit_margeff_x2 = [], []
logit_coef_x1, logit_coef_x2 = [], []
dml_x1, dml_x2 = [], []

for sim in tqdm(range(1000)):
    data = []
    for row in range(1000):
        x1 = np.random.normal(5.0, 1.0)
        x2 = np.random.normal(2.0, 1.0)
        Y_prob = sigmoid(-4.0 + 0.5 * x1 + 0.25 * x2)
        Y = np.random.choice([0, 1], p=[1 - Y_prob, Y_prob])
        data.append([Y, x1, x2])
    data = pd.DataFrame(data, columns=["Y", "x1", "x2"])

    # logit average marginal effects
    logit_model = smf.logit("Y ~ x1 + x2", data=data)
    logit_results = logit_model.fit(disp=0)
    x1_coef, x2_coef = logit_results.params.x1, logit_results.params.x2
    x1_margeff, x2_margeff = logit_results.get_margeff().margeff
    logit_margeff_x1.append(x1_margeff)
    logit_margeff_x2.append(x2_margeff)
    logit_coef_x1.append(x1_coef)
    logit_coef_x2.append(x2_coef)

    # dml estimates
    est = DMLCateEstimator(model_y=LogisticRegression(penalty="none", solver="lbfgs"),
                           model_t=LinearRegression(),
                           linear_first_stages=False,
                           model_final=LinearRegression(fit_intercept=False),
                           fit_cate_intercept=True,
                           n_splits=4)
    est.fit(Y=data["Y"].values, T=data[["x1", "x2"]].values, W=None, inference=None)
    te_pred = est.const_marginal_effect()[0]
    dml_x1.append(te_pred[0])
    dml_x2.append(te_pred[1])

# histograms of the simulated estimates
for values, title in [(logit_coef_x1, "logit coef x1"), (logit_coef_x2, "logit coef x2"),
                      (logit_margeff_x1, "logit margeff x1"), (logit_margeff_x2, "logit margeff x2"),
                      (dml_x1, "dml x1"), (dml_x2, "dml x2")]:
    plt.hist(values, bins=25)
    plt.grid()
    plt.title(title)
    plt.show()
```
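To make the coefficient-vs-marginal-effect distinction concrete, a standard logit identity says the average marginal effect of x_j equals the coefficient beta_j scaled by the mean of p_i(1 - p_i) over the fitted probabilities. The snippet below (my own check, not code from the thread) verifies this identity on the same data-generating process:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Check the logit identity AME_j = beta_j * mean(p_i * (1 - p_i)) on the
# same DGP as the simulation above.
rng = np.random.RandomState(0)
x1 = rng.normal(5.0, 1.0, size=5000)
x2 = rng.normal(2.0, 1.0, size=5000)
p = 1.0 / (1.0 + np.exp(-(-4.0 + 0.5 * x1 + 0.25 * x2)))
data = pd.DataFrame({"Y": rng.binomial(1, p), "x1": x1, "x2": x2})

res = smf.logit("Y ~ x1 + x2", data=data).fit(disp=0)
phat = res.predict(data)
manual_ame = res.params[["x1", "x2"]].values * np.mean(phat * (1 - phat))
print(np.allclose(manual_ame, res.get_margeff().margeff, atol=1e-6))  # → True
```

Since p(1 - p) is at most 0.25, the marginal effects are always attenuated relative to the raw coefficients, which is exactly the gap observed in the histograms.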
The distributions of the logit coefficients are as expected.

The distributions of the logit average marginal effects are what DMLCateEstimator should converge to.

The DMLCateEstimator estimates do indeed converge to the logit average marginal effects.
Thanks @emaadmanzoor and @realkenlee for the discussion!
1) The easiest way to handle a binary outcome, if you want to use a classifier for E[Y|X], is to build a regression wrapper and pass it as model_y; i.e., some example code:
```python
import numpy as np
from econml.dml import LinearDMLCateEstimator
from sklearn.linear_model import LogisticRegression

class RegWrapper:
    def __init__(self, classifier):
        self.classifier = classifier

    def fit(self, X, y, **kwargs):
        self.classifier.fit(X, y, **kwargs)
        return self

    def predict(self, X):
        # Return P(Y=1 | X) so the wrapper behaves like a regressor.
        return self.classifier.predict_proba(X)[:, 1]

est = LinearDMLCateEstimator(model_y=RegWrapper(LogisticRegression()),
                             model_t=LogisticRegression(),
                             discrete_treatment=True)

n = 1000
X = np.random.uniform(-1, 1, size=(n, 1))
D = np.random.binomial(1, .5 + .1 * X[:, 0], size=(n,))
Y = np.random.binomial(1, .5 + .2 * D + .1 * X[:, 0], size=(n,))
est.fit(Y, D, W=X, inference='statsmodels')
est.summary()
```
If you want to get fancier and expose all the attributes of the classifier as attributes of the wrapper, you can use the version we implemented in our dmliv prototype: https://github.com/microsoft/EconML/blob/3606b0bcc7779b78e6df8991dbcd7b72ac3046ef/prototypes/dml_iv/utilities.py#L117. With that version you can access the coef_ of the underlying classifier as:
```python
est = RegWrapper(LogisticRegression()).fit(X, y)
est.coef_
```
while with the simpler version above you would need to do:
```python
est.classifier.coef_
```
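For reference, a minimal sketch of such a delegating wrapper (my own illustration of the idea, not the prototype's exact code) forwards unknown attribute lookups to the wrapped classifier via `__getattr__`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class DelegatingRegWrapper:
    """Sketch: acts as a regressor over P(Y=1|X) while exposing the wrapped
    classifier's fitted attributes (e.g. coef_) directly on the wrapper."""
    def __init__(self, classifier):
        self.classifier = classifier

    def fit(self, X, y, **kwargs):
        self.classifier.fit(X, y, **kwargs)
        return self

    def predict(self, X):
        return self.classifier.predict_proba(X)[:, 1]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails; delegate it.
        return getattr(self.classifier, name)

X = np.random.uniform(-1, 1, size=(100, 2))
y = (X[:, 0] > 0).astype(int)
est = DelegatingRegWrapper(LogisticRegression()).fit(X, y)
print(est.coef_.shape)  # → (1, 2), delegated from LogisticRegression
```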
2) It's not clear when the above approach is theoretically sound (though I suspect it might be ok in practice). Observe that the theory requires that the model is linear in D, i.e.:

Y = D * theta(X) + g(X) + epsilon
D = p(X) + eta

So

E[Y | X] = p(X) * theta(X) + g(X)

However, logistic regression assumes that:

E[Y | X] = logistic(<q, X>)

It's not clear that there are natural primitive assumptions on p, theta, and g under which the former can be written as the latter. For instance, if theta(X) = theta is a constant, then the above assumes that p(X) + g(X) = logistic(<q, X>), which means that the propensity p(X) is somehow tied to the confounding effect g(X) via that relationship. So from a theoretical perspective it might be better to just use a linear probability model for Y (i.e. use a lasso for model_y even if Y is binary).
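That suggestion can be sketched as follows (my own toy DGP and hyperparameters, not code from the thread; cross-fitting omitted for brevity): use a lasso as a linear probability model for E[Y|X] and keep the usual residual-on-residual final stage.

```python
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

# Linear probability model first stage: fit E[Y|X] by (lasso) linear
# regression even though Y is binary, then residualize and run the
# residual-on-residual regression of standard partially linear DML.
rng = np.random.RandomState(0)
n = 20000
X = rng.uniform(-1, 1, size=(n, 1))
D = rng.binomial(1, 0.5 + 0.1 * X[:, 0])
Y = rng.binomial(1, 0.5 + 0.2 * D + 0.1 * X[:, 0])  # true effect of D is 0.2

Y_res = Y - Lasso(alpha=1e-3).fit(X, Y).predict(X)                 # E[Y|X] residual
D_res = D - LogisticRegression().fit(X, D).predict_proba(X)[:, 1]  # E[D|X] residual
theta = (Y_res @ D_res) / (D_res @ D_res)  # final-stage OLS slope, near 0.2
print(theta)
```

Because the structural equation for Y is itself a linear probability model here, the partially linear theory applies directly even though Y is binary.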
For a binary outcome, it is more reasonable to assume the model:

E[Y | D, X] = logistic(D * theta(X) + g(X))
D = p(X) + eta

But this is a non-linear equation for Y and requires different theoretical arguments for orthogonalization. For instance, check out Section 2.1, p. 16 of https://arxiv.org/pdf/1806.04823.pdf, where we derive orthogonal moments for such a non-linear logistic Y model. This is not implemented in our library.
Thanks! So let me rewrite it this way:

E[Y | D, X] = logistic(D * theta(X) + g(X) + U)
D = p(X) + V

where E[U | X, D] = 0 and E[V | X] = 0.

Estimating p should be a simple regression problem. How should one estimate g? What is the right approach here? More concretely, how does one ultimately get to the "residual U" so we can manipulate it downstream?
@realkenlee The orthogonal moment described in the paper I noted has a different structure and does not boil down to a residual-on-residual regression. So indeed there is no analogue to the residual of Y, as you note.
Since that method is not yet implemented and has not been stress-tested, I'm not sure it's the best option, but you can try. You can check the code from the GitHub repo of that paper. In particular, this part of the code implements the orthogonal estimation method for the logistic model you described: https://github.com/vsyrgkanis/plugin_regularized_estimation/blob/1bcbad4803b2b7834477ab39051d40f7758c408b/logistic_te.py#L104
@vsyrgkanis
Sorry to seem to re-open this closed issue... two questions on this topic:

You mentioned, "Since that method is not yet implemented and has not been stress tested I'm not sure it's the best option but you can try." Is this script still in a beta stage, i.e. are we advised not to use it? Or can we tweak the binary-outcome Double ML code you shared now? E.g., would it be a good option to follow the repository for your logistic + double ML paper? https://github.com/vsyrgkanis/plugin_regularized_estimation/tree/1bcbad4803b2b7834477ab39051d40f7758c408b

In your repo's README, I looked at the "ORTHOPY library" section, where it looks promising to use the class LogisticWithOffsetAndGradientCorrection(), which "is an estimator adhering to the fit and predict specification of sklearn that enables fitting an 'orthogonal' logistic regression". Can I then specify it in the Orthogonal/Double ML method like the following?
```python
est = DMLCateEstimator(model_y=LogisticWithOffsetAndGradientCorrection(),
                       model_t=sklearn_classifier(),
                       model_final=sklearn_linear_regression())
```
If the above looks reasonable, how exactly would the last-stage "residual of Y ~ residual of treatment" regression look? Would it be a logistic regression as well?
I found these tutorial slides on specifying a doubly robust estimator for logistic regression: https://www4.stat.ncsu.edu/~davidian/double.pdf. However, beyond those slides, most other papers I found about doubly robust estimation are specified for a continuous outcome variable Y, and in the Microsoft EconML user guide for the Doubly Robust Learner I can't find anywhere to specify a binary outcome variable: https://econml.azurewebsites.net/spec/estimation/dr.html. So I would appreciate it if I could learn more about that.

(I know we can use the RegWrapper utility as you mentioned above, but I just want to know whether doing this would also be theoretically solid for the causal interpretation when using the Doubly Robust learner.)
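For what it's worth, the doubly robust construction itself does not require a continuous Y: the AIPW pseudo-outcome only needs an estimate of E[Y | T, X] (a probability when Y is binary) and the propensity. A standalone sketch of that standard construction (my own DGP and models, not EconML internals):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# AIPW / doubly-robust pseudo-outcome with binary treatment T and binary
# outcome Y; averaging the pseudo-outcome estimates the ATE.
rng = np.random.RandomState(0)
n = 5000
X = rng.uniform(-1, 1, size=(n, 1))
T = rng.binomial(1, 0.5 + 0.2 * X[:, 0])
Y = rng.binomial(1, 0.4 + 0.2 * T + 0.1 * X[:, 0])  # true ATE = 0.2

XT = np.hstack([X, T.reshape(-1, 1)])
mu = LogisticRegression().fit(XT, Y)                       # outcome model
e = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]  # propensity

mu1 = mu.predict_proba(np.hstack([X, np.ones((n, 1))]))[:, 1]
mu0 = mu.predict_proba(np.hstack([X, np.zeros((n, 1))]))[:, 1]
muT = np.where(T == 1, mu1, mu0)

psi = mu1 - mu0 + (T - e) / (e * (1 - e)) * (Y - muT)
print(psi.mean())  # near the true ATE of 0.2
```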
Thanks!
Hi! Following up on this: is there a newer Double ML estimator or flag that I should use in the latest 0.9.0b1 beta release (release notes)?
Or should I keep using the RegWrapper class provided by @vsyrgkanis in #204?
```python
from econml.dml import LinearDMLCateEstimator
from sklearn.linear_model import LogisticRegression

class RegWrapper:
    def __init__(self, classifier):
        self.classifier = classifier

    def fit(self, X, y, **kwargs):
        self.classifier.fit(X, y, **kwargs)
        return self

    def predict(self, X):
        return self.classifier.predict_proba(X)[:, 1]

est = LinearDMLCateEstimator(model_y=RegWrapper(LogisticRegression()),
                             model_t=LogisticRegression(),
                             discrete_treatment=True)
```
@raylinz That is not something that we have addressed in this release, so that's probably still your best bet.
Sorry for commenting again on this closed issue, but my understanding is that DML applies to a continuous outcome and not a binary one (due to the structural-equation formulation). Am I misunderstanding something here? Logistic regression models a Bernoulli variable, while the structural equation models a variable whose distribution depends on epsilon.
I am a student of causal inference, so I am still gaining momentum here. Can somebody point me in the direction of a modeling approach where a binary outcome is appropriate?
I'd also appreciate some insight here. It seems like there are theoretical (not practical) difficulties with binary outcomes in the DML and DR frameworks, but meta-learners seem to be okay.
Hi, thanks for this useful package!

I am modeling a scenario with a binary outcome using logistic regression for model_y. I get strange coefficients due to how the final-model regression is set up in DMLCateEstimator.

Say \hat{Y} and \hat{T} are generated from model_y and model_t respectively. DMLCateEstimator runs a linear regression of Y - \hat{Y} on T - \hat{T} and reports the coefficient on T - \hat{T}.

When the outcome is modeled via logistic regression, wouldn't running a logistic regression of Y on \hat{Y} + (T - \hat{T}) and reporting the coefficient of T - \hat{T} be more appropriate? Changing the final model to logistic regression does not fix things: it does not even run, since Y - \hat{Y} is not binary.

Below are some experiments that replicate this issue.
I generated synthetic data with a binary outcome, continuous treatment x1, and one covariate x2. The statsmodels output of a logistic regression on my synthetic data shows the true coefficient of interest on x1 = 0.05.

I used the following code to fit a DML estimator on the same data. Note that I'm not using any regularization, so the resulting coefficient should be similar to the one on x1 above (it is not).

The following code replicates DMLCateEstimator naively and obtains a coefficient similar to the one above. The following code modifies the final regression and obtains more sensible estimates.
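The code and output blocks from this comment did not survive extraction. As a stand-in, here is my own sketch of the naive residual-on-residual replication under an assumed similar DGP (binary Y, continuous treatment x1, covariate x2, true logit coefficient on x1 of 0.05). Consistent with the resolution earlier in the thread, the final-stage slope estimates the average marginal effect, which is far smaller than the logit coefficient:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Naive residual-on-residual final stage with a binary outcome. The DGP is
# an assumption for illustration; cross-fitting is omitted for brevity.
rng = np.random.RandomState(0)
n = 20000
x2 = rng.normal(2.0, 1.0, size=n)
x1 = 0.5 * x2 + rng.normal(5.0, 1.0, size=n)   # treatment, confounded by x2
p = 1.0 / (1.0 + np.exp(-(-4.0 + 0.05 * x1 + 0.25 * x2)))
Y = rng.binomial(1, p)

W = x2.reshape(-1, 1)
y_hat = LogisticRegression(C=1e6).fit(W, Y).predict_proba(W)[:, 1]  # E[Y|x2]
t_hat = LinearRegression().fit(W, x1).predict(W)                    # E[x1|x2]

y_res, t_res = Y - y_hat, x1 - t_hat
theta = (y_res @ t_res) / (t_res @ t_res)  # the DML-style coefficient
print(theta)  # much smaller than the logit coefficient of 0.05
```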