py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Natural Direct Effect estimation does not support multiple mediators #966

Open DarioSimonato opened 1 year ago

DarioSimonato commented 1 year ago

Hi everybody! I'm quite new to this library and, while testing it on a toy example, I think I found an error in the identification module.

In particular, I'm following the natural direct effect tutorial (https://www.pywhy.org/dowhy/v0.9.1/example_notebooks/dowhy_mediation_analysis.html#) on the simple graph attached, to estimate the effect of Z on Y only through the direct path Z->Y. [image: dag] I use the standard procedure with CausalModel (define the model, identify, estimate), but identification returns E[d(Y|A1)/d(Z)] as the estimand, whereas I expected E[d(Y|A1, A2)/d(Z)], because there are two indirect paths (Z->A1->Y and Z->A2->Y) to block in order to isolate the direct effect.

I've been told it's not a feature supported yet, so probably an issue should be raised to inform users about this.
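
As a quick sanity check on which variables the estimand should condition on, the mediators can be read off the graph directly. The following is a minimal sketch in plain Python, assuming the edges Z->A1, Z->A2, A1->Y, A2->Y and Z->Y implied by the toy data; `find_mediators` is a hypothetical helper written for illustration and is not part of DoWhy.

```python
def find_mediators(edges, treatment, outcome):
    """Return nodes lying strictly inside a directed path treatment -> ... -> outcome."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, set()).add(dst)

    def descendants(node):
        # iterative depth-first search over the child relation
        seen, stack = set(), [node]
        while stack:
            for child in children.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    # a mediator descends from the treatment and has the outcome as a descendant
    return sorted(n for n in descendants(treatment)
                  if n != outcome and outcome in descendants(n))

# assumed edge list for the toy example (the attached image is not reproduced here)
edges = [("Z", "A1"), ("Z", "A2"), ("A1", "Y"), ("A2", "Y"), ("Z", "Y")]
mediators = find_mediators(edges, "Z", "Y")
print(mediators)  # ['A1', 'A2']
```

Both A1 and A2 show up as mediators, which is why an estimand conditioning on A1 alone leaves the path through A2 open.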

Here's the code to reproduce the issue:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import dowhy
from dowhy import CausalModel
import dowhy.datasets, dowhy.plotter
import networkx as nx

# creating the toy dataset
n_samples = 10000
z  = np.random.normal(0, 1, n_samples)
a1 =  .5 * z + .2 * np.random.normal(0, 1, n_samples) +.3
a2 =  .2 * z + .3 * np.random.normal(0, 1, n_samples) -.2
y  = .7 * a1 + .6 * a2 -.4 * z + .2 * np.random.normal(0, 1, n_samples)

z = 1*(z>0)
a1 = 1*(a1>0)
a2 = 1*(a2>0)
y = 1*(y>0)

df = pd.DataFrame({'Z':z, 'A1':a1, 'A2':a2, 'Y':y})
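# building the causal graph: two mediated paths (Z->A1->Y, Z->A2->Y) plus the direct path Z->Y
causal_graph = nx.DiGraph([("Z", "A1"), ("Z", "A2"), ("A1", "Y"), ("A2", "Y"), ("Z", "Y")])
# serializing the graph to a GML string for CausalModel
s = "graph[directed 1"
for node in causal_graph.nodes:
    s += "node[ id \"" + node + "\" label \"" + node + "\"]"
for edge in causal_graph.edges:
    s += "edge[ source \"" + edge[0] + "\" target \"" + edge[1] + "\"]"
s += "]"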
model = CausalModel(df,"Z","Y",s,
                   missing_nodes_as_confounders=False)

model.view_model()
from IPython.display import Image, display
display(Image(filename="causal_model.png"))

# finding the estimand
# Natural direct effect (nde)
identified_estimand_nde = model.identify_effect(estimand_type="nonparametric-nde", proceed_when_unidentifiable=False, optimize_backdoor = True)
print(identified_estimand_nde)

Output

Estimand type: EstimandType.NONPARAMETRIC_NDE

### Estimand : 1
Estimand name: mediation
Estimand expression:
 ⎡ d        ⎤
E⎢────(Y|A1)⎥
 ⎣d[Z]      ⎦
Estimand assumption 1, Mediation: A1 intercepts (blocks) all directed paths from Z to Y except the path {Z}→{Y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{Z} and U→{A1} then P(A1|Z,U) = P(A1|Z)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{A1} and U→Y then P(Y|A1, Z, U) = P(Y|A1, Z)
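
To illustrate why conditioning on A1 alone matters numerically, here is a rough sketch on the continuous version of the toy data (before thresholding to 0/1), where the model is linear and the direct effect of Z on Y is the coefficient -0.4. It uses plain Python to stay dependency-free; `ols` is a hypothetical helper that solves the normal equations and is not a DoWhy function. This is not how DoWhy estimates the NDE, only a check of the adjustment set.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(k):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [x - f * w for x, w in zip(a[i], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

# same structural equations as the toy dataset, without the binarization step
n = 10000
z = [random.gauss(0, 1) for _ in range(n)]
a1 = [0.5 * v + 0.2 * random.gauss(0, 1) + 0.3 for v in z]
a2 = [0.2 * v + 0.3 * random.gauss(0, 1) - 0.2 for v in z]
y = [0.7 * p + 0.6 * q - 0.4 * v + 0.2 * random.gauss(0, 1)
     for p, q, v in zip(a1, a2, z)]

# columns: intercept, Z, then the mediators being adjusted for
only_a1 = ols([[1.0, v, p] for v, p in zip(z, a1)], y)
both = ols([[1.0, v, p, q] for v, p, q in zip(z, a1, a2)], y)

print("Z coefficient adjusting for A1 only:", round(only_a1[1], 3))
print("Z coefficient adjusting for A1, A2:", round(both[1], 3))
```

Adjusting for both mediators should give a Z coefficient close to -0.4, while adjusting for A1 only should give roughly -0.4 + 0.6 * 0.2 = -0.28, because the effect transmitted through A2 is absorbed into the "direct" term.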

amit-sharma commented 1 year ago

Thanks for raising this, @DarioSimonato. There is an issue with supporting multi-variable mediators. I will look into this and raise an error if needed.

rudi-mac commented 6 months ago

I am running into the same issue. Would very much appreciate an update. Thank you in advance!

rudi-mac commented 1 month ago

Any update on this?