py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

NetworkXError: graph should be directed acyclic #1184

Closed benTC74 closed 1 week ago

benTC74 commented 1 month ago

Hi All,

I am very new to causal discovery and have run into some issues while building a causal graph for use in DoWhy. Any help would be much appreciated! Btw, great package!

Just a side question first:

  1. Is it possible to identify the estimand (with identify_effect) and run refutation without building a graph? Building a graph sometimes takes too much time when there are a lot of different variables to test.

The main question is the following:

  1. Before passing the causal graph to DoWhy, I used gCastle with the PC algorithm to build it, which gives me an nx graph. I also specified a lot of a priori edges: in particular, one variable should not affect any other variable and should have incoming edges from almost all the other variables, and some variables should receive no edges from any variable. The resulting graph is shown below (not very well laid out, as I am not sure how to adjust the spacing). Node 0 is the node that receives edges from almost all the other nodes, while Nodes 1-11 should have no incoming edges (shown in the red circles).

[Image: gCastleDAG]

I passed the above graph to DoWhy with the code below, and a new graph was generated. I expected DoWhy to show the same graph with all the same edges, but that does not seem to be the case (DoWhy's graph is shown below). Node 0 does not receive as many edges as before, and Node 3 also receives an edge from Node 2, which should not happen (shown in red circles).

Code:

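import matplotlib.pyplot as plt
import networkx as nx
from dowhy import CausalModel

model = CausalModel(
   data=df_graph, # some pandas dataframe
   treatment='1',
   outcome='0',
   graph="\n".join(nx.generate_gml(learned_graph)) # learned_graph: nx graph from gCastle, serialized to GML
)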

plt.figure(figsize=(100,100))
model.view_model()

[Image: graph rendered by model.view_model()]

In addition, when I call identify_effect to get the identified estimand and the backdoor paths (code shown below), I get the error "NetworkXError: graph should be directed acyclic" (detailed error log shown below). I thought this error should not occur, as I do have nodes (at least originally in gCastle) that have no incoming edges and at least one node that has no descendants. Does anyone know how to solve this? I would really appreciate any help here. Thank you!

identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)

Error:

NetworkXError                             Traceback (most recent call last)
Cell In[466], line 1
----> 1 identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
      2 print(identified_estimand)

File ~\AppData\Local\anaconda3\Lib\site-packages\dowhy\causal_model.py:239, in CausalModel.identify_effect(self, estimand_type, method_name, proceed_when_unidentifiable, optimize_backdoor)
    232 else:
    233     identifier = AutoIdentifier(
    234         estimand_type=estimand_type,
    235         backdoor_adjustment=BackdoorAdjustment(method_name),
    236         optimize_backdoor=optimize_backdoor,
    237     )
--> 239 identified_estimand = identifier.identify_effect(
    240     graph=self._graph._graph,
    241     action_nodes=self._treatment,
    242     outcome_nodes=self._outcome,
    243     observed_nodes=list(self._graph.get_all_nodes(include_unobserved=False)),
    244 )
    246 self.identifier = identifier
    248 return identified_estimand

File ~\AppData\Local\anaconda3\Lib\site-packages\dowhy\causal_identifier\auto_identifier.py:101, in AutoIdentifier.identify_effect(self, graph, action_nodes, outcome_nodes, observed_nodes, conditional_node_names)
     93 def identify_effect(
     94     self,
     95     graph: nx.DiGraph,
   (...)
     99     conditional_node_names: List[str] = None,
    100 ):
--> 101     estimand = identify_effect_auto(
    102         graph,
    103         action_nodes,
    104         outcome_nodes,
    105         observed_nodes,
    106         self.estimand_type,
    107         conditional_node_names,
    108         self.backdoor_adjustment,
    109         self.optimize_backdoor,
    110         self.costs,
    111     )
    113     estimand.identifier = self
    115     return estimand

File ~\AppData\Local\anaconda3\Lib\site-packages\dowhy\causal_identifier\auto_identifier.py:178, in identify_effect_auto(graph, action_nodes, outcome_nodes, observed_nodes, estimand_type, conditional_node_names, backdoor_adjustment, optimize_backdoor, costs)
    171     return IdentifiedEstimand(
    172         None,
    173         treatment_variable=action_nodes,
    174         outcome_variable=outcome_nodes,
    175         no_directed_path=True,
    176     )
    177 if estimand_type == EstimandType.NONPARAMETRIC_ATE:
--> 178     return identify_ate_effect(
    179         graph,
    180         action_nodes,
    181         outcome_nodes,
    182         observed_nodes,
    183         backdoor_adjustment,
    184         optimize_backdoor,
    185         estimand_type,
    186         costs,
    187         conditional_node_names,
    188     )
    189 elif estimand_type == EstimandType.NONPARAMETRIC_NDE:
    190     return identify_nde_effect(
    191         graph, action_nodes, outcome_nodes, observed_nodes, backdoor_adjustment, estimand_type
    192     )

File ~\AppData\Local\anaconda3\Lib\site-packages\dowhy\causal_identifier\auto_identifier.py:231, in identify_ate_effect(graph, action_nodes, outcome_nodes, observed_nodes, backdoor_adjustment, optimize_backdoor, estimand_type, costs, conditional_node_names)
    228 if backdoor_adjustment not in EFFICIENT_METHODS:
    229     # First, checking if there are any valid backdoor adjustment sets
    230     if optimize_backdoor == False:
--> 231         backdoor_sets = identify_backdoor(graph, action_nodes, outcome_nodes, observed_nodes, backdoor_adjustment)
    232     else:
    233         from dowhy.causal_identifier.backdoor import Backdoor

File ~\AppData\Local\anaconda3\Lib\site-packages\dowhy\causal_identifier\auto_identifier.py:514, in identify_backdoor(graph, action_nodes, outcome_nodes, observed_nodes, backdoor_adjustment, include_unobserved, dseparation_algo, direct_effect)
    512 # First, checking if empty set is a valid backdoor set
    513 empty_set = set()
--> 514 check = check_valid_backdoor_set(
    515     graph,
    516     action_nodes,
    517     outcome_nodes,
    518     empty_set,
    519     backdoor_paths=backdoor_paths,
    520     new_graph=bdoor_graph,
    521     dseparation_algo=dseparation_algo,
    522 )
    523 if check["is_dseparated"]:
    524     backdoor_sets.append({"backdoor_set": empty_set})

File ~\AppData\Local\anaconda3\Lib\site-packages\dowhy\graph.py:98, in check_valid_backdoor_set(graph, nodes1, nodes2, nodes3, backdoor_paths, new_graph, dseparation_algo)
     94     if new_graph is None:
     95         # Assume that nodes1 is the treatment
     96         new_graph = do_surgery(graph, nodes1, remove_outgoing_edges=True)
---> 98     dseparated = nx.algorithms.d_separated(new_graph, set(nodes1), set(nodes2), set(nodes3))
     99 elif dseparation_algo == "naive":
    100     # ignores new_graph parameter, always uses self._graph
    101     if backdoor_paths is None:

File <class 'networkx.utils.decorators.argmap'> compilation 4:4, in argmap_d_separated_1(G, x, y, z)
      2 import collections
      3 import gzip
----> 4 import inspect
      5 import itertools
      6 import re

File ~\AppData\Local\anaconda3\Lib\site-packages\networkx\algorithms\d_separation.py:176, in d_separated(G, x, y, z)
    131 """
    132 Return whether node sets x and y are d-separated by z.
    133 
   (...)
    172 https://en.wikipedia.org/wiki/Bayesian_network#d-separation
    173 """
    175 if not nx.is_directed_acyclic_graph(G):
--> 176     raise nx.NetworkXError("graph should be directed acyclic")
    178 union_xyz = x.union(y).union(z)
    180 if any(n not in G.nodes for n in union_xyz):
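
A note on the error itself: having some nodes with no incoming edges and a node with no descendants does not make a graph acyclic, since a cycle can sit anywhere between them. One possible cause, which this thread does not confirm, is that the PC algorithm leaves some edges unoriented. If such an edge is stored in both directions in the adjacency matrix, it becomes a two-node cycle once loaded as a directed graph. That would also be consistent with the unexpected edge between Node 2 and Node 3. The sketch below checks a plain adjacency matrix for such pairs and for cycles in general. The matrix `adj` and the helper names `bidirected_pairs` and `is_acyclic` are hypothetical, chosen only for illustration.

```python
def bidirected_pairs(adj):
    """Return (i, j) pairs with i < j where both i -> j and j -> i are present."""
    n = len(adj)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if adj[i][j] and adj[j][i]]


def is_acyclic(adj):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed
    in topological order (repeatedly removing nodes with in-degree 0)."""
    n = len(adj)
    indegree = [sum(1 for i in range(n) if adj[i][j]) for j in range(n)]
    ready = [j for j in range(n) if indegree[j] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in range(n):
            if adj[u][v]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return removed == n


# Hypothetical 4-node example: adj[i][j] = 1 means i -> j.
# Nodes 1, 2 and 3 point to node 0; the 2 - 3 edge is unoriented,
# so it is stored in both directions.
adj = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
]

print(bidirected_pairs(adj))  # [(2, 3)]
print(is_acyclic(adj))        # False
```

If the first call returns any pairs, each one has to be oriented (or dropped) using domain knowledge before the graph can pass a DAG check.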

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 14 days with no activity.

amit-sharma commented 1 month ago

@benTC74 thanks for raising this issue. Can you share the text representation of your graph (learned_graph), the one that you input to DoWhy? I can try to reproduce the error.
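
For anyone hitting the same error, the text representation requested above could be produced roughly as follows. This is only a sketch: the four-edge `learned_graph` is a hypothetical stand-in for the graph from the original post, and `describe_graph` is an illustrative helper, not part of DoWhy or NetworkX. It uses `nx.generate_gml` (as in the original code), `nx.is_directed_acyclic_graph` (the check that raises in the traceback), and `nx.find_cycle` to point at one offending cycle.

```python
import networkx as nx

# Hypothetical stand-in for learned_graph: the 2 - 3 edge is present in both directions.
learned_graph = nx.DiGraph([(1, 0), (2, 0), (2, 3), (3, 2)])


def describe_graph(graph):
    """Return the GML text, whether the graph is a DAG, and one cycle (empty list if none)."""
    gml_text = "\n".join(nx.generate_gml(graph))
    is_dag = nx.is_directed_acyclic_graph(graph)
    cycle = [] if is_dag else list(nx.find_cycle(graph))
    return gml_text, is_dag, cycle


gml_text, is_dag, cycle = describe_graph(learned_graph)
print(gml_text)              # the same string that is passed to CausalModel(graph=...)
print("is DAG:", is_dag)
print("one cycle:", cycle)   # edges forming a cycle, if any
```

Posting the printed GML together with the reported cycle gives maintainers enough to reproduce the problem without the original data.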

github-actions[bot] commented 1 week ago

This issue was closed because it has been inactive for 7 days since being marked as stale.