Proper representation of estimated model in causal graph

carterrees commented 1 year ago

I am working with the customer segmentation example and wanted to make sure I understand how the model would be represented in a DAG.

Specifically, the wording found here says "We assume we have data that are generated from some collection policy. In particular, we assume that we have data of the form: {Y_i(T_i), T_i, X_i, W_i, Z_i} where Y_i(T_i) is the observed outcome for the chosen treatment, T_i is the treatment, X_i are the co-variates used for heterogeneity, W_i are other observable co-variates that we believe are affecting the potential outcome Y_i(T_i) and potentially also the treatment T_i..."

The DAG shows the effect of price (T) on demand (Y). The control variables (W) are adjusted for on the assumption that, together with X, they block all back-door paths between T and Y.

Therefore, am I correct that when I run the code from the notebook, the fitted model is properly represented by the DAG in the pic?

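from econml.dml import LinearDML
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import PolynomialFeatures

# X drives effect heterogeneity (featurized to degree 2); W holds the controls.
est = LinearDML(
    model_y=GradientBoostingRegressor(),
    model_t=GradientBoostingRegressor(),
    featurizer=PolynomialFeatures(degree=2, include_bias=False),
)
est.fit(log_Y, log_T, X=X, W=W, inference="statsmodels")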

[Image: screenshot of the DAG, 2023-03-09]

kbattocchi commented 1 year ago

Yes, that looks correct to me (although it's also possible that there are additional arrows among the Ws and Xs; we are agnostic to that). The key to correctness is that there are no unobserved confounders, that is, no variables outside W and X that affect both T and Y.
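To make the back-door condition concrete, here is a minimal sketch in plain Python that lists the back-door paths from T to Y left open by a given adjustment set. The edge list is an assumption standing in for the screenshot (W and X each point into both T and Y, and T points into Y), and the helper names are hypothetical, not part of EconML. The sketch does not check that the adjustment set is free of descendants of T, which the full back-door criterion also requires.

```python
# Assumed encoding of the DAG from the question: each edge is (cause, effect).
EDGES = {("W", "T"), ("W", "Y"), ("X", "T"), ("X", "Y"), ("T", "Y")}


def descendants(node, edges):
    """Return every node reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in edges:
            if cause == current and effect not in found:
                found.add(effect)
                stack.append(effect)
    return found


def simple_paths(start, end, edges):
    """Yield all simple paths from start to end, ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            yield path
            continue
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])


def is_blocked(path, adjustment, edges):
    """Apply the d-separation rules to one path given the adjustment set."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edges and (nxt, mid) in edges
        if collider:
            # A collider blocks unless it or one of its descendants is adjusted for.
            if mid not in adjustment and not (descendants(mid, edges) & adjustment):
                return True
        elif mid in adjustment:
            # A chain or fork is blocked by adjusting for its middle node.
            return True
    return False


def open_backdoor_paths(treatment, outcome, adjustment, edges):
    """List back-door paths (starting with an arrow into the treatment) left open."""
    return [
        p for p in simple_paths(treatment, outcome, edges)
        if (p[1], treatment) in edges and not is_blocked(p, adjustment, edges)
    ]


# Adjusting for both X and W leaves no open back-door path.
print(open_backdoor_paths("T", "Y", {"X", "W"}, EDGES))
# A hypothetical unobserved U affecting both T and Y cannot be adjusted for.
with_u = EDGES | {("U", "T"), ("U", "Y")}
print(open_backdoor_paths("T", "Y", {"X", "W"}, with_u))
```

Extra arrows among the Ws and Xs can be added to `EDGES` without changing the first result, which is the sense in which the estimator is agnostic to them.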

For most of our estimators (including DML), there are some additional assumptions on the functional form of the relationships between Y, T, and X, such as that Y is linear in (a featurization of) T, with a coefficient that depends only on X.
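As a sketch, the functional-form assumption can be written as the partially linear model below. The symbols are chosen here for illustration: \phi is the featurization of T (the identity in the code above, where no treatment featurizer is passed), g and f are arbitrary nuisance functions, and \varepsilon and \eta are noise terms.

```latex
\begin{aligned}
Y &= \theta(X) \cdot \phi(T) + g(X, W) + \varepsilon, & \mathbb{E}[\varepsilon \mid X, W] &= 0, \\
T &= f(X, W) + \eta, & \mathbb{E}[\eta \mid X, W] &= 0.
\end{aligned}
```

Note that the `featurizer` argument in the code above applies to X, not T: with `LinearDML`, \theta(X) is further restricted to be linear in the degree-2 polynomial features of X.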

carterrees-entrata commented 1 year ago

Thank you @kbattocchi. Understood about some of the assumptions as well. I'll have some more questions that I'll lay out in another thread related to this answer and Shapley values.

py-why / EconML, issue #744