py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Preprocess compositional data #845

Closed Leo-T-Zang closed 5 months ago

Leo-T-Zang commented 10 months ago

Hi EconML Team,

I am working on causal inference over compositional data. I wonder whether we need to further preprocess the data (e.g., with a centered log-ratio transform) before applying a causal inference model (e.g., DML).

Thank you.
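
For readers unfamiliar with the transform mentioned in the question, here is a minimal sketch of a centered log-ratio (CLR) transform in NumPy. The function name `clr`, the `pseudocount` parameter, and the toy array `X_comp` are illustrative assumptions, not part of EconML; adding a small pseudocount is only one of several ways to handle zero components.

```python
import numpy as np

def clr(parts, pseudocount=1e-9):
    """Centered log-ratio transform applied to each row (one composition per row).

    `pseudocount` is a hypothetical guard against log(0); choose a zero-handling
    strategy that suits your data.
    """
    parts = np.asarray(parts, dtype=float) + pseudocount
    # Re-close each row so the components sum to 1 after adding the pseudocount.
    parts = parts / parts.sum(axis=1, keepdims=True)
    log_parts = np.log(parts)
    # Subtracting the row mean of the logs equals dividing by the geometric mean.
    return log_parts - log_parts.mean(axis=1, keepdims=True)

# Toy compositions (each row sums to 1); purely illustrative data.
X_comp = np.array([[0.2, 0.3, 0.5],
                   [0.1, 0.1, 0.8]])
X_clr = clr(X_comp)
```

Each CLR-transformed row sums to zero, so the resulting columns are linearly dependent, which may matter if they are fed to a linear model.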

kbattocchi commented 9 months ago

The answer to this will depend somewhat on which specific models you use. If you use something like CausalForestDML (which fits a non-parametric final model) and you also use non-parametric models for your first-stage models, then transforming X and W should generally not be necessary. However, the treatment effect model that we fit with any of our DML variants is always a linear effect of T on Y, so transforming Y and/or T will change the interpretation of the effect accordingly. (For example, in a pricing context you might take the log of both Y and T, so that the computed effect is the price elasticity of demand: a one-percent change in price produces a corresponding percentage change in quantity demanded, rather than each one-dollar price increase producing a fixed decrease in demand.)
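
To make the log-log interpretation concrete, here is a small sketch on simulated data. It uses a plain least-squares fit via `np.polyfit` rather than DML, and there is no confounding in the simulation, so it only illustrates how transforming Y and T changes what the linear coefficient means. The data-generating process and all variable names are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_elasticity = -1.5  # assumed value for the simulation

# Simulated prices and demand following a constant-elasticity curve with noise.
price = rng.uniform(1.0, 10.0, size=n)
demand = 100.0 * price ** true_elasticity * np.exp(rng.normal(0.0, 0.1, size=n))

# Linear fit on log(Y) vs log(T): the slope is the price elasticity of demand.
slope_log = np.polyfit(np.log(price), np.log(demand), 1)[0]

# Linear fit on the raw scale: the slope is units of demand per dollar.
slope_level = np.polyfit(price, demand, 1)[0]

print(f"log-log slope (elasticity): {slope_log:.3f}")
print(f"level slope (units per dollar): {slope_level:.3f}")
```

The log-log slope should land close to the simulated elasticity, while the level slope answers a different question (average change in demand per dollar of price).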

Some of our other models, like LinearDML, have more structure (e.g. assuming that the effect \theta(X) is linear in X), in which case you would want to consider whether transforming your data beforehand might make more sense.

Note that for convenience, most of our estimators allow you to pass a featurizer argument, which transforms X, and a treatment_featurizer argument, which transforms T, so if you prefer you can pass those transforms rather than explicitly transforming the data yourself.
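
As a rough sketch of the featurizer route, a CLR transform can be wrapped in scikit-learn's `FunctionTransformer` so that it behaves like a standard transformer object. The `clr` helper, its `pseudocount` parameter, and the toy data are assumptions for illustration. The commented lines at the end show how such an object might be passed to an estimator; `Y`, `T`, and `W` there are hypothetical arrays, and the exact call should be checked against the EconML version you have installed.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def clr(parts, pseudocount=1e-9):
    """Row-wise centered log-ratio transform (hypothetical helper)."""
    parts = np.asarray(parts, dtype=float) + pseudocount
    parts = parts / parts.sum(axis=1, keepdims=True)
    log_parts = np.log(parts)
    return log_parts - log_parts.mean(axis=1, keepdims=True)

# Wrap the function so it exposes fit/transform like any sklearn transformer.
clr_featurizer = FunctionTransformer(clr)

# Toy compositional features; purely illustrative.
X_comp = np.array([[0.2, 0.3, 0.5],
                   [0.1, 0.1, 0.8],
                   [0.6, 0.3, 0.1]])
X_feat = clr_featurizer.fit_transform(X_comp)

# Assumed usage with an EconML estimator (not executed here):
# from econml.dml import LinearDML
# est = LinearDML(featurizer=clr_featurizer)
# est.fit(Y, T, X=X_comp, W=W)
```

Passing the transformer this way keeps the raw compositions as the estimator's input, so later calls that take X can use the same untransformed data.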

Leo-T-Zang commented 9 months ago

Thank you @kbattocchi !

I am using your CausalAnalysis API with LinearDML as the heterogeneity model. I don't think CausalAnalysis supports a featurizer at the moment, right?

kbattocchi commented 9 months ago

That's correct - CausalAnalysis is designed to have a simpler interface so it doesn't expose all of the options that using the DML subclasses directly would provide.