py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License
6.99k stars, 923 forks

False Backdoor estimand #972

Closed asha24choudhary closed 1 year ago

asha24choudhary commented 1 year ago

I have a dataset with the following graph

image

Clearly, there is no backdoor path. But when I try to identify the causal effect using 'model.identify_effect(proceed_when_unidentifiable=True)', I get the following output:

image

I am wondering why it shows the estimand type as backdoor when there is no confounder and therefore no backdoor path.

Am I missing anything? Please help me!

Sincerely,

Asha

amit-sharma commented 1 year ago

There are multiple ways to extract a backdoor set. You can use method_name="minimal-adjustment" to obtain what you are expecting.

  1. minimal-adjustment: the smallest set W that is a valid adjustment set
  2. maximal-adjustment: the largest set W that is a valid adjustment set (note that here X2 is a valid backdoor variable, since the backdoor equation is still satisfied with W=X2)
  3. default: a mix of 1 and 2 (the one you are using)

To see all possible valid backdoor sets, you can set method_name="exhaustive-search". It is not efficient for large graphs but will work for your graph.
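As a rough illustration of what "minimal", "maximal" and "exhaustive" mean here, the sketch below enumerates every valid backdoor set for a small graph in plain Python. It is not DoWhy's implementation; the helper names (`descendants`, `simple_paths`, `is_blocked`, `valid_backdoor_sets`) are hypothetical, and the path-by-path check is only practical for tiny graphs like the one in this thread.

```python
from itertools import combinations

# Edges of the graph from this thread: X1 -> Y and X2 -> Y.
EDGES = [("X1", "Y"), ("X2", "Y")]


def descendants(node, edges):
    """Return all nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for src, dst in edges:
            if src == current and dst not in found:
                found.add(dst)
                stack.append(dst)
    return found


def simple_paths(start, end, edges):
    """Yield every simple path from start to end, ignoring edge direction."""
    def walk(path):
        last = path[-1]
        if last == end:
            yield path
            return
        for src, dst in edges:
            if last in (src, dst):
                nxt = dst if last == src else src
                if nxt not in path:
                    yield from walk(path + [nxt])
    yield from walk([start])


def is_blocked(path, adjustment, edges):
    """Check whether one path is blocked (d-separated) by `adjustment`."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if collider:
            # A collider blocks unless it or a descendant is adjusted for.
            if not ({mid} | descendants(mid, edges)) & adjustment:
                return True
        elif mid in adjustment:
            # A chain or fork is blocked by adjusting for its middle node.
            return True
    return False


def valid_backdoor_sets(treatment, outcome, edges):
    """Return (backdoor paths, all adjustment sets meeting the backdoor criterion)."""
    nodes = {n for edge in edges for n in edge}
    # Adjustment variables must not be descendants of the treatment.
    candidates = sorted(nodes - {treatment, outcome} - descendants(treatment, edges))
    # Backdoor paths are those that start with an arrow into the treatment.
    backdoor_paths = [
        p for p in simple_paths(treatment, outcome, edges)
        if (p[1], treatment) in set(edges)
    ]
    valid = []
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            if all(is_blocked(p, set(subset), edges) for p in backdoor_paths):
                valid.append(set(subset))
    return backdoor_paths, valid


paths, valid = valid_backdoor_sets("X1", "Y", EDGES)
print(paths)                 # [] (no backdoor path exists)
print(valid)                 # [set(), {'X2'}] (the exhaustive list)
print(min(valid, key=len))   # set() (the minimal set)
print(max(valid, key=len))   # {'X2'} (the maximal set)
```

With treatment X1 there are no backdoor paths to block, so both the empty set and {X2} pass the check, which is why different search strategies can report different but equally valid sets.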

asha24choudhary commented 1 year ago

Could you please tell me where I can set this 'method' parameter? I am looking into this file after checking the estimate method here.

amit-sharma commented 1 year ago

It is in the causal_model class, here

asha24choudhary commented 1 year ago

OK, thank you for the help. I tried three different methods, viz. "minimal-adjustment", "maximal-adjustment" and "exhaustive-search". However, I am getting the same output that I shared with you previously. This one:

image

amit-sharma commented 1 year ago

Can you share a full reproducible code/notebook? What is your treatment and outcome?

asha24choudhary commented 1 year ago

Sure, here it is:

    import numpy as np
    import pandas as pd
    import dowhy
    import dowhy.api
    from dowhy import CausalModel
    import dowhy.datasets
    import matplotlib.pyplot as plt
    from IPython.display import Image, display  # needed for display(Image(...)) below

    np.random.seed(101)
    N_SAMPLES = 10000

    # Data generation
    # Scenarios with 3 nodes
    # a) Correlation case

    # Create the graph describing the causal structure
    graph = """graph[directed 1 node[id "X2" label "X2"] node[id "X1" label "X1"] node[id "Y" label "Y"] edge[source "X1" target "Y"] edge[source "X2" target "Y"]]""".replace('\n', '')

    # Generate the data
    X1 = np.random.randn(N_SAMPLES)
    X2 = np.random.randn(N_SAMPLES)
    Y = 0.65 * X1 + 0.2 * X2

    # Data to df
    df = pd.DataFrame(np.vstack([X1, X2, Y]).T, columns=['X1', 'X2', 'Y'])
    print(df.head(10))

    # Create a model
    model = CausalModel(
        data=df,
        treatment=['X1', 'X2'],
        outcome=['Y'],
        graph=graph
    )
    model.view_model()
    plt.show()
    display(Image(filename="causal_model.png"))
    plt.show()

    # Generate estimand
    identified_estimand = model.identify_effect(method_name="minimal-adjustment", proceed_when_unidentifiable=True)
    print(identified_estimand)

amit-sharma commented 1 year ago

Ah, I understand now. I was assuming that the treatment is X1, but you have assigned both X1 and X2 as treatment.

In that case, the identify_effect method is simply showing that the empty set is a valid backdoor adjustment set. Hence, it indicates that fitting a model for $E[Y|X1, X2]$ will give the true causal effect. If you are interested in the effect of X1 (or X2) alone, try providing only that one as the treatment.

asha24choudhary commented 1 year ago

OK, thank you. However, if I set the treatment to be just 'X1', I still get a backdoor path from X1 to Y. I am not able to understand how this can be a backdoor path.

image image

amit-sharma commented 1 year ago

It means that the empty set {} is a backdoor set for this treatment-outcome pair. The method is not finding the backdoor paths, it is finding the valid adjustment "backdoor" sets that block any backdoor path between treatment and outcome. In this case, the empty set is enough.
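To make the role of the empty set concrete, it may help to write out the standard backdoor adjustment formula. The notation below is the general textbook form, with W denoting the adjustment set; it is not DoWhy's printed output.

```latex
\begin{align}
E[Y \mid do(X_1 = x)] &= \sum_{w} E[Y \mid X_1 = x, W = w]\, P(W = w) \\
W = \emptyset \;\Rightarrow\; E[Y \mid do(X_1 = x)] &= E[Y \mid X_1 = x]
\end{align}
```

When W is empty there is nothing to sum over, so the adjusted quantity collapses to the plain conditional expectation of Y given X1.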

asha24choudhary commented 1 year ago

I am sorry, I do not understand your previous reply. How and from where do you infer the empty set? That is, where is the empty set in the output picture I posted before? Also, I feel that in my graph there is no backdoor path, so there is no need for any adjustment.

Could you please elaborate a bit more?

amit-sharma commented 1 year ago

Sure, the usual backdoor output is E[Y|W], where W is the backdoor set. Since the numerator in the backdoor equation is simply E[Y], it implies that W is the empty set.

d E[Y]/d X1 is the same as computing the effect without any adjustment.
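A quick numerical check of this point, as a sketch only: the simulation below mirrors the data-generating process posted earlier in the thread (Y = 0.65*X1 + 0.2*X2 with independent X1 and X2) using only the standard library instead of NumPy or DoWhy. The helper `cov` and the variable names are illustrative assumptions.

```python
import random

random.seed(101)
N_SAMPLES = 10_000

# Same structure as the thread's example: X1 and X2 are independent causes of Y.
x1 = [random.gauss(0, 1) for _ in range(N_SAMPLES)]
x2 = [random.gauss(0, 1) for _ in range(N_SAMPLES)]
y = [0.65 * a + 0.2 * b for a, b in zip(x1, x2)]


def cov(u, v):
    """Sample covariance of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


# Unadjusted: slope of the simple regression of Y on X1 (W is the empty set).
slope_unadjusted = cov(x1, y) / cov(x1, x1)

# Adjusted for X2: the X1 coefficient from regressing Y on X1 and X2,
# obtained by solving the 2x2 normal equations with Cramer's rule.
s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
s1y, s2y = cov(x1, y), cov(x2, y)
slope_adjusted = (s1y * s22 - s12 * s2y) / (s11 * s22 - s12 ** 2)

print(f"unadjusted:      {slope_unadjusted:.3f}")
print(f"adjusted for X2: {slope_adjusted:.3f}")
```

Both estimates should land close to the true coefficient 0.65; the unadjusted one differs only by sampling noise, because X2 is not a confounder of X1 and Y.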

asha24choudhary commented 1 year ago

Okay, now I get it. Thank you for the clarification :)