py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Can refute_graph() check independences that are not conditional? #950

Closed: leechelseahaosin closed this issue 1 year ago.

leechelseahaosin commented 1 year ago

It seems that refute_graph() can only detect conditional independences. Can it also check independences that have no variables to condition on? If not, is there a method that can check these independence relationships automatically and accurately, without having to check them manually?

I've tried to test refute_graph() with a sample DAG that should imply exactly one independence relationship. I've drawn two variables, nodes a and b, that act as confounders between nodes T and O. These confounders are also independent of each other. The same implied independence can be found with DAGitty by changing digraph to dag in the causal_graph variable.

import pandas as pd
import dowhy 
import random

# 'a' and 'b' are the same shuffled column, so they are perfectly correlated
z = [i for i in range(50, 60)]
random.shuffle(z)
df = pd.DataFrame(data={'a': z,
                        'b': z,
                        'T': [random.randint(0, 1) for i in range(10)],
                        'O': [random.randint(0, 1) for i in range(10)]})

causal_graph = """digraph {
a
b
T [exposure]
O [outcome]
T -> O
a -> O
a -> T
b -> O
b -> T
}"""

model = dowhy.CausalModel(
        data=df,
        graph=causal_graph,
        treatment='T',
        outcome='O')

refuter_object = model.refute_graph(k=1,
                                    independence_test={'test_for_continuous': 'partial_correlation',
                                                       'test_for_discrete': 'conditional_mutual_information'})
print(refuter_object)

The output above shows that refute_graph() does not test any relationships, which is incorrect: it should detect the independence between nodes a and b. (I would also like to mention another issue: this only works if the variable names are single letters; otherwise it results in the error described in issue #949.)

Method name for discrete data:conditional_mutual_information
Method name for continuous data:partial_correlation
Number of conditional independencies entailed by model:0
Number of independences satisfied by data:0
Test passed:True
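Why zero? The refute_graph docstring describes k as the number of covariates in the conditioning set Z, and the statements it generates appear to use conditioning sets of exactly k variables. Under that assumption, k=1 can never produce the only independence this DAG implies, a ⫫ b with an empty conditioning set: conditioning on T or on O (each a collider between a and b) makes a and b dependent. The sketch below is plain Python, not DoWhy code. It checks d-separation with the moralized ancestral graph criterion and lists the implied independences for each conditioning-set size; EDGES, NODES, ancestral_set, d_separated and implied_independences are hypothetical names.

```python
from itertools import combinations

# Plain-Python sketch, not DoWhy's implementation.
# The DAG from the issue as (parent, child) edges.
EDGES = [("T", "O"), ("a", "O"), ("a", "T"), ("b", "O"), ("b", "T")]
NODES = ["a", "b", "T", "O"]


def ancestral_set(nodes, edges):
    """Return `nodes` together with all of their ancestors."""
    result = set(nodes)
    changed = True
    while changed:
        changed = False
        for parent, child in edges:
            if child in result and parent not in result:
                result.add(parent)
                changed = True
    return result


def d_separated(x, y, z, edges):
    """True if the DAG implies x _||_ y | z (moralized ancestral graph test)."""
    keep = ancestral_set({x, y, *z}, edges)
    sub = [(p, c) for p, c in edges if p in keep and c in keep]
    # Moralize: drop edge directions and connect parents of a common child.
    nbrs = {n: set() for n in keep}
    for p, c in sub:
        nbrs[p].add(c)
        nbrs[c].add(p)
    for child in keep:
        parents = [p for p, c in sub if c == child]
        for p1, p2 in combinations(parents, 2):
            nbrs[p1].add(p2)
            nbrs[p2].add(p1)
    # Delete the conditioning set, then search for any x-y path.
    blocked = set(z)
    stack, seen = [x], {x}
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for nxt in nbrs[node] - blocked - seen:
            seen.add(nxt)
            stack.append(nxt)
    return True


def implied_independences(nodes, edges, k):
    """All (x, y, Z) with exactly k conditioning variables that the DAG implies."""
    found = []
    for x, y in combinations(nodes, 2):
        rest = [n for n in nodes if n not in (x, y)]
        for z in combinations(rest, k):
            if d_separated(x, y, z, edges):
                found.append((x, y, z))
    return found


for k in range(3):
    print(k, implied_independences(NODES, EDGES, k))
```

It prints [('a', 'b', ())] for size 0 and empty lists for sizes 1 and 2. If the reading of k above is right, the unconditional case corresponds to k=0; that has not been checked against DoWhy here.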

However, when I check this relationship explicitly, it does test the one relationship, but it should not pass, since I've made nodes a and b perfectly correlated.

refuter_object = model.refute_graph(k=4,
                                    independence_constraints=[('a', 'b', ())],
                                    independence_test={'test_for_continuous': 'partial_correlation',
                                                       'test_for_discrete': 'conditional_mutual_information'})
print(refuter_object)

Method name for discrete data:conditional_mutual_information
Method name for continuous data:partial_correlation
Number of conditional independencies entailed by model:1
Number of independences satisfied by data:1
Test passed:True
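For comparison, perfectly correlated columns should be rejected by any sensible unconditional independence test. The following is a minimal, library-free sketch of one such test, a permutation test on the Pearson correlation. It is a swapped-in technique for illustration, not what DoWhy's partial_correlation or conditional_mutual_information options compute, and the helpers pearson_r and permutation_test are hypothetical.

```python
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / (vx * vy) ** 0.5


def permutation_test(x, y, n_permutations=999, seed=0):
    """P-value for H0: x and y are independent, using |r| as the statistic."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(y_perm)  # shuffling y breaks any dependence on x
        if abs(pearson_r(x, y_perm)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Same construction as the issue: 'a' and 'b' are the same shuffled column
z = list(range(50, 60))
random.Random(42).shuffle(z)
a, b = z, list(z)

p_value = permutation_test(a, b)
print(pearson_r(a, b), p_value)  # r is 1.0 up to rounding; p-value is small
```

With 10 distinct values, almost no random permutation reaches |r| = 1, so the p-value comes out near 1/(999 + 1) and independence is rejected at the usual 0.05 level.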

Version information:

bloebp commented 1 year ago

Hey, you might want to check out: https://www.pywhy.org/dowhy/main/example_notebooks/gcm_falsify_dag.html It was added only recently, so you would need to install the version from the main branch; it will be part of the next release.

arainboldt commented 1 year ago

Thanks @bloebp, this looks very interesting.

leechelseahaosin commented 1 year ago

@bloebp This looks promising. Can this feature be used without gcm? What about with EconML estimators?

bloebp commented 1 year ago

It is based on performing multiple independence tests, so you don't really need an estimator, just the graph structure and the data. You would need the gcm module, but only the part implementing this algorithm (there is no need to define an SCM, etc.).
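The graph-only half of that approach can be illustrated without DoWhy. Assuming the falsification is built on local Markov statements (each node is independent of its non-descendants given its parents), the sketch below enumerates those statements for the DAG from the issue. It runs no independence tests, and whatever the gcm implementation does on top of this (for example, how it decides that a graph is falsified) is not reproduced here; PARENTS, descendants and local_markov_statements are hypothetical names.

```python
# Hypothetical sketch, not the gcm implementation.
# The DAG from the issue as a child -> parents mapping.
PARENTS = {"a": set(), "b": set(), "T": {"a", "b"}, "O": {"T", "a", "b"}}


def descendants(node, parents):
    """All nodes reachable from `node` along directed edges."""
    children = {n: {c for c, ps in parents.items() if n in ps} for n in parents}
    found, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def local_markov_statements(parents):
    """(X, Y, Pa(X)) for every non-descendant Y of X outside X's parents."""
    statements = set()
    for x, pa in parents.items():
        others = set(parents) - {x} - pa - descendants(x, parents)
        for y in others:
            statements.add((x, y, tuple(sorted(pa))))
    return sorted(statements)


print(local_markov_statements(PARENTS))
```

For this DAG it returns ('a', 'b', ()) and ('b', 'a', ()), the same unconditional independence written in both directions, which is the statement missing from the refute_graph(k=1) output.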

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 14 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 7 days since being marked as stale.