y0-causal-inference / y0

❓y0 (pronounced "why not?") is for causal inference in Python
https://y0.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Graph structure repair based on data #162

Open cthoyt opened 1 year ago

cthoyt commented 1 year ago

This takes prior knowledge about the network in the form of a DAG or ADMG, along with the available data (observational and/or interventional), and repairs the network structure based on that data. The goal is to make the conditional independencies implied by the data agree with the conditional independencies implied by the network. Here are the steps:

1) Use conditional independence tests on the data to find every implied independence that fails.
2) For each failed test between two variables such as V_i and V_j, add a bi-directed edge between them.

Now we have a repaired network with additional bi-directed edges. If the prior knowledge graph was an ADMG, we now have a new ADMG with additional bi-directed edges. If the prior knowledge graph was a DAG, it is now converted to an ADMG.
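The two steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `ADMG` container and the `repair` function are hypothetical names made up for this sketch (they are not part of the y0 API), and the list of failed tests is supplied by hand rather than computed from data.

```python
from dataclasses import dataclass, field


@dataclass
class ADMG:
    """Hypothetical minimal ADMG container (not y0's NxMixedGraph)."""

    directed: set = field(default_factory=set)    # (parent, child) tuples
    bidirected: set = field(default_factory=set)  # frozenset({u, v}) pairs


def repair(graph, failed_tests):
    """Return a copy of ``graph`` with one bi-directed edge per failed test.

    ``failed_tests`` is an iterable of (left, right, conditions) triples.
    The conditioning set is not used when adding the edge.
    """
    repaired = ADMG(set(graph.directed), set(graph.bidirected))
    for left, right, _conditions in failed_tests:
        repaired.bidirected.add(frozenset((left, right)))
    return repaired


# Toy prior knowledge: the DAG A -> B -> C, which implies A _||_ C | B.
prior = ADMG(directed={("A", "B"), ("B", "C")})

# Assume (for illustration) that the test of A _||_ C | B failed on the data.
failed = [("A", "C", ("B",))]

repaired = repair(prior, failed)
print(sorted(tuple(sorted(edge)) for edge in repaired.bidirected))
```

Because the prior graph here is a DAG, the result is an ADMG with the same directed edges plus the new bi-directed edge; the input graph is left unmodified.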

srtaheri commented 1 year ago

Here is an example. Assume that this is the prior knowledge network in the form of an ADMG:

Screen Shot 2023-08-23 at 10 34 21 AM

Now assume that the data-based conditional independence test between $Z_1$ and $Z_2$ given $M_1$ failed. In addition, the conditional independence test between $R_2$ and $R_3$ given some variables failed. Furthermore, the conditional independence test between $Y$ and $R_3$ given some other variables failed. Hence we put a bi-directed edge between each of the pairs ($Z_1$, $Z_2$), ($R_2$, $R_3$), and ($Y$, $R_3$) as follows:

Screen Shot 2023-08-23 at 10 35 46 AM

The above graph is the repaired ADMG.

cthoyt commented 1 year ago

It's not clear to me why you would run a test between $Z_1$ and $Z_2$ given $M_1$ and then infer that there should be a bidirected edge between ($Z_1$, $Z_2$). What is special about $M_1$ in this case? Can you please make this a bit more of an algorithmic description?

Here are all of the conditional independencies for this graph, calculated with `y0.algorithm.conditional_independencies.get_conditional_independencies()`:

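| left | right | conditions |
|------|-------|------------|
| M1   | R1    | M2         |
| M1   | R2    | M2         |
| M1   | R3    | M2         |
| M1   | Y     | M2, X      |
| M1   | Z1    | X          |
| M1   | Z2    | X          |
| M1   | Z3    | X          |
| M2   | R2    | R1         |
| M2   | R3    | R2         |
| M2   | X     | M1         |
| M2   | Z1    | M1         |
| M2   | Z2    | M1         |
| M2   | Z3    | M1         |
| R1   | R3    | R2         |
| R1   | X     | M2         |
| R1   | Y     | M2, R3     |
| R1   | Z1    | M2         |
| R1   | Z2    | M2         |
| R1   | Z3    | M2         |
| R2   | X     | M2         |
| R2   | Y     | M2, R3     |
| R2   | Z1    | M2         |
| R2   | Z2    | M2         |
| R2   | Z3    | M2         |
| R3   | X     | M2         |
| R3   | Z1    | M2         |
| R3   | Z2    | M2         |
| R3   | Z3    | M2         |
| X    | Y     | M2, Z3     |
| X    | Z2    | Z1         |
| X    | Z3    | Z2         |
| Y    | Z1    | M2, Z3     |
| Y    | Z2    | M2, Z3     |
| Z1   | Z3    | Z2         |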
Click here to get the code

```python
from y0.graph import NxMixedGraph
from y0.dsl import Z1, Z2, Z3, Variable, X, Y
from y0.algorithm.conditional_independencies import get_conditional_independencies
import pandas as pd
from tabulate import tabulate

R1, R2, R3 = (Variable(f"R{i + 1}") for i in range(3))
M1, M2 = (Variable(f"M{i + 1}") for i in range(2))


def main():
    graph = NxMixedGraph.from_edges(
        directed=[
            (Z1, X),
            (Z1, Z2),
            (Z2, Z3),
            (Z3, Y),
            (X, M1),
            (M1, M2),
            (M2, R1),
            (M2, Y),
            (R1, R2),
            (R2, R3),
            (R3, Y),
        ],
        undirected=[(X, Z1)],
    )
    cis = get_conditional_independencies(graph)
    df = pd.DataFrame(
        sorted(
            (
                conditional_independency.left,
                conditional_independency.right,
                ", ".join(sorted(v.name for v in conditional_independency.conditions)),
            )
            for conditional_independency in cis
            if conditional_independency.separated
        ),
        columns=["left", "right", "conditions"],
    )
    print(tabulate(df, headers=df.columns, tablefmt="github", showindex=False))


if __name__ == "__main__":
    main()
```
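For readers who want to check individual rows of the table without installing y0, here is a rough, self-contained sketch of a separation test on the same edge list. It is not y0's implementation: it assumes the standard approach of replacing each bi-directed edge with a hypothetical latent common cause and then testing d-separation on the moralized ancestral graph. The function name `d_separated` and the string node names are made up for this sketch.

```python
from itertools import combinations

# Same edges as the graph above, using plain strings as node names.
DIRECTED = [
    ("Z1", "X"), ("Z1", "Z2"), ("Z2", "Z3"), ("Z3", "Y"), ("X", "M1"),
    ("M1", "M2"), ("M2", "R1"), ("M2", "Y"), ("R1", "R2"), ("R2", "R3"),
    ("R3", "Y"),
]
BIDIRECTED = [("X", "Z1")]


def d_separated(directed, bidirected, x, y, given=()):
    """Check whether x and y are separated given ``given`` (sketch only)."""
    # Replace each bi-directed edge with a hypothetical latent parent.
    edges = set(directed)
    for u, v in bidirected:
        latent = ("latent", u, v)
        edges |= {(latent, u), (latent, v)}
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)

    # Keep only x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # Moralize: drop edge directions and connect parents that share a child.
    adjacent = {node: set() for node in keep}
    for child in keep:
        node_parents = parents.get(child, set())
        for parent in node_parents:
            adjacent[parent].add(child)
            adjacent[child].add(parent)
        for p, q in combinations(node_parents, 2):
            adjacent[p].add(q)
            adjacent[q].add(p)

    # x and y are separated if no path connects them once ``given`` is removed.
    blocked = set(given)
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for neighbor in adjacent[node] - blocked - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return True


print(d_separated(DIRECTED, BIDIRECTED, "M1", "R1", given=("M2",)))
print(d_separated(DIRECTED, BIDIRECTED, "X", "Y", given=("M2", "Z3")))
print(d_separated(DIRECTED, BIDIRECTED, "X", "Y"))
```

The first two calls correspond to the rows `M1, R1, M2` and `X, Y, M2, Z3` in the table; the third uses an empty conditioning set, which leaves the directed path from X through M1 and M2 to Y open.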
cthoyt commented 1 year ago

So it can be given no variables or any combination of variables

cthoyt commented 1 year ago

Turns out @srtaheri is looking for https://github.com/y0-causal-inference/y0/blob/main/src/y0/algorithm/falsification.py