py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Interpretation of ATE #1252

Open abhilasha-workday opened 2 weeks ago

abhilasha-workday commented 2 weeks ago

Hello @amit-sharma,

I am currently using the DoWhy library for a causal analysis in which my treatment is continuous and my outcome is binary. To estimate the effect I used two methods: logistic regression through a GLM, and DML. Here are the code snippets for both:

Logistic Regression

    import statsmodels.api as sm

    estimate = model.estimate_effect(
        est_ident,
        method_name="backdoor.generalized_linear_model",
        test_significance=True,
        method_params={
            'num_null_simulations': 20,
            'num_simulations': 20,
            'num_quantiles_to_discretize_cont_cols': 10,
            'fit_method': "statsmodels",
            'glm_family': sm.families.Binomial(),  # logistic regression
            'need_conditional_estimates': False,
        },
        control_value=0.2,
        treatment_value=0.3,
    )
    print(estimate)

DML

    from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
    from sklearn.linear_model import LassoCV
    from sklearn.preprocessing import PolynomialFeatures

    dml_estimate = model.estimate_effect(
        est_ident,
        method_name="backdoor.econml.dml.DML",
        control_value=0.1,
        treatment_value=0.2,
        confidence_intervals=False,
        method_params={
            "init_params": {
                'model_y': GradientBoostingClassifier(random_state=101),
                'model_t': GradientBoostingRegressor(random_state=101),
                "model_final": LassoCV(random_state=101),
                'featurizer': PolynomialFeatures(degree=1, include_bias=True),
                'discrete_treatment': False,
                'random_state': 101,
            },
            "fit_params": {},
        },
    )
    print(dml_estimate)

  1. Could you elaborate on how to interpret the ATE for both of these methods? For example, if my ATE is -0.02, is it okay to say that "increasing the treatment from 0.1 to 0.2 leads to a 2% decrease in the outcome"?
  2. Is the ATE returned by the logistic regression model actually the coefficient of the treatment variable in the model? When I compare the coefficient returned by estimate.estimator.model for the GLM estimator with the mean estimate returned by estimate_effect(), they are drastically different. Is that expected behavior?
  3. I want to create a kind of dose-response curve to see how changes in the treatment affect my outcome, for example how the outcome changes when I increase the treatment from 0.1 to 0.2, 0.2 to 0.3, 0.3 to 0.4, and so on. To do this, I changed the control_value and treatment_value parameters in estimate_effect(). When I do so, I get a different ATE for each interval with logistic regression but the same ATE for every interval with DML. Why is that?
  4. How does DoWhy compute the ATE behind the scenes when control_value and treatment_value are provided? I am especially interested in the GLM and DML effect estimation methods.
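
To make questions 2 and 3 concrete, here is a minimal sketch on synthetic numbers. It is not DoWhy or EconML code. It rests on two assumptions that may not match the library internals: that the GLM estimate is the mean difference in predicted P(Y=1) when the treatment is set to treatment_value versus control_value, and that the DML final model is linear in the treatment. The helper names (`logistic_ate`, `linear_ate`) and all coefficient values are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_ate(intercept, beta_t, beta_w, confounders, control_value, treatment_value):
    """Mean difference in predicted P(Y=1) between the two treatment values.

    Assumption: the effect is obtained by setting the treatment to each value
    for every unit, predicting the outcome, and averaging the difference.
    """
    diffs = [
        sigmoid(intercept + beta_t * treatment_value + beta_w * w)
        - sigmoid(intercept + beta_t * control_value + beta_w * w)
        for w in confounders
    ]
    return sum(diffs) / len(diffs)


def linear_ate(theta, control_value, treatment_value):
    """Effect under a final model that is linear in the treatment (assumption)."""
    return theta * (treatment_value - control_value)


random.seed(101)
confounders = [random.gauss(0.0, 1.0) for _ in range(1000)]  # synthetic confounder W

# Hypothetical fitted coefficients, chosen only for illustration.
intercept, beta_t, beta_w = -1.0, -2.0, 0.5
theta = -0.2  # hypothetical constant marginal effect of the linear model

steps = [(0.1, 0.2), (0.2, 0.3), (0.3, 0.4)]
logistic_curve = [
    logistic_ate(intercept, beta_t, beta_w, confounders, c, t) for c, t in steps
]
linear_curve = [linear_ate(theta, c, t) for c, t in steps]

for (c, t), logit_effect, lin_effect in zip(steps, logistic_curve, linear_curve):
    print(f"{c:.1f} -> {t:.1f}: logistic {logit_effect:+.4f}, linear {lin_effect:+.4f}")
```

Under these assumptions the logistic effect is on the probability scale, so it is much smaller in magnitude than the log-odds coefficient `beta_t`, and it changes from interval to interval because the sigmoid is nonlinear. The linear effect depends only on the width of the interval, so equal-width steps give the same number. Whether this is what actually happens inside the two estimators is exactly what I would like to confirm.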

Looking forward to hearing back on this. TIA!

abhilasha-workday commented 5 days ago

Can somebody please help me with this?