py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Facing Errors in [dowhy_causal_discovery_example.ipynb] #491

Closed elakhatibi closed 1 year ago

elakhatibi commented 2 years ago

Hello,

I ran into two issues when running this example, one caused by the 'SID' package being removed from R and the other by an incompatible graph format: https://github.com/py-why/dowhy/blob/master/docs/source/example_notebooks/dowhy_causal_discovery_example.ipynb

I would greatly appreciate it if you would please help me.

The errors are as follows:

[First Error]

Methods: LiNGAM and PC
SHD_CPDAG = 25.000000
SHD = 24.000000
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.

ImportError                               Traceback (most recent call last)
Input In [49], in <cell line: 8>()
     13 print("SHD_CPDAG = %f"%(SHD_CPDAG(graph1, graph2)))
     14 print("SHD = %f"%(SHD(graph1, graph2, double_for_anticausal=False)))
---> 15 print("SID_CPDAG = [%f, %f]"%(SID_CPDAG(graph1, graph2)))
     16 SID
     17 print("SID = %f"%(SID(graph1, graph2)))

File ~/opt/miniconda3/lib/python3.9/site-packages/cdt/metrics.py:331, in SID_CPDAG(target, pred)
    306 """Compute the Strutural Intervention Distance. The target graph
    307 can be a CPDAG. A lower and upper bounds will be returned, they
    308 correspond respectively to the best and worst DAG in the equivalence class
    (...)
    328 Jonas Peters, Peter Bühlmann: https://arxiv.org/abs/1306.1043
    329 """
    330 if not RPackages.SID:
--> 331     raise ImportError("SID R package is not available. Please check your installation.")
    333 true_labels = retrieve_adjacency_matrix(target)
    334 predictions = retrieve_adjacency_matrix(pred, target.nodes()
    335                                         if isinstance(target, nx.DiGraph) else None)

ImportError: SID R package is not available. Please check your installation.

[Code Cell]

import itertools
from numpy.random import randint
from cdt.metrics import SHD, SHD_CPDAG, SID, SID_CPDAG

# Find combinations of pair of methods to compare
combinations = list(itertools.combinations(graphs_nx, 2))

for pair in combinations:
    print("***")
    graph1 = graphs_nx[pair[0]]
    graph2 = graphs_nx[pair[1]]
    print("Methods: %s and %s"%(pair[0], pair[1]))
    print("SHD_CPDAG = %f"%(SHD_CPDAG(graph1, graph2)))
    print("SHD = %f"%(SHD(graph1, graph2, double_for_anticausal=False)))
    print("SID_CPDAG = [%f, %f]"%(SID_CPDAG(graph1, graph2)))
    print("SID = %f"%(SID(graph1, graph2)))
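To make the printed SHD values easier to read, here is a minimal pure-Python sketch of the structural Hamming distance between two 0/1 adjacency matrices. It is our own illustration, not cdt's code; it assumes, following cdt's documentation of `double_for_anticausal`, that setting it to `False` counts a reversed edge as one error instead of two. The matrices `truth` and `guess` are made-up examples.

```python
from typing import List

Matrix = List[List[int]]


def shd(true_adj: Matrix, pred_adj: Matrix, double_for_anticausal: bool = True) -> int:
    """Structural Hamming distance between two DAG adjacency matrices.

    adj[i][j] == 1 means an edge i -> j. Diagonals are assumed to be zero.
    """
    n = len(true_adj)
    if double_for_anticausal:
        # Every differing entry counts, so a reversed edge costs 2
        # (one missing i -> j plus one extra j -> i).
        return sum(
            true_adj[i][j] != pred_adj[i][j] for i in range(n) for j in range(n)
        )
    # Otherwise compare each unordered node pair once: any mismatch in the
    # pair (missing, extra, or reversed edge) costs exactly 1.
    return sum(
        (true_adj[i][j], true_adj[j][i]) != (pred_adj[i][j], pred_adj[j][i])
        for i in range(n)
        for j in range(i + 1, n)
    )


truth = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # A -> B -> C
guess = [[0, 0, 0], [1, 0, 1], [0, 0, 0]]  # B -> A, B -> C (A-B reversed)
print(shd(truth, guess))                               # 2
print(shd(truth, guess, double_for_anticausal=False))  # 1
```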

[Second Error]

Causal Discovery Method : LiNGAM

ValueError                                Traceback (most recent call last)
Input In [44], in <cell line: 1>()
      8 graph_dot = str_to_dot(graph.source)
     10 # Define Causal Model
---> 11 model=CausalModel(
     12         data = data_sachs,
     13         treatment='PIP2',
     14         outcome='PKC',
     15         graph=graph_dot)
     17 # Identification
     18 identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)

File ~/opt/miniconda3/lib/python3.9/site-packages/dowhy/causal_model.py:109, in CausalModel.__init__(self, data, treatment, outcome, graph, common_causes, instruments, effect_modifiers, estimand_type, proceed_when_unidentifiable, missing_nodes_as_confounders, identify_vars, **kwargs)
    102     self._graph = CausalGraph(
    103         self._treatment,
    104         self._outcome,
    105         effect_modifier_names = self._effect_modifiers,
    106         observed_node_names=self._data.columns.tolist()
    107     )
    108 else:
--> 109     self.init_graph(graph=graph, identify_vars=identify_vars)
    111 self._other_variables = kwargs
    112 self.summary()

File ~/opt/miniconda3/lib/python3.9/site-packages/dowhy/causal_model.py:120, in CausalModel.init_graph(self, graph, identify_vars)
    115 '''
    116 Initialize self._graph using graph provided by the user.
    117
    118 '''
    119 # Create causal graph object
--> 120 self._graph = CausalGraph(
    121     self._treatment,
    122     self._outcome,
    123     graph,
    124     effect_modifier_names=self._effect_modifiers,
    125     observed_node_names=self._data.columns.tolist(),
    126     missing_nodes_as_confounders = self._missing_nodes_as_confounders
    127 )
    129 if identify_vars:
    130     self._common_causes = self._graph.get_common_causes(self._treatment, self._outcome)

File ~/opt/miniconda3/lib/python3.9/site-packages/dowhy/causal_graph.py:88, in CausalGraph.__init__(self, treatment_name, outcome_name, graph, common_cause_names, instrument_names, effect_modifier_names, mediator_names, observed_node_names, missing_nodes_as_confounders)
     86     self.logger.error("Error: Please provide graph (as string or text file) in dot or gml format.")
     87     self.logger.error("Error: Incorrect graph format")
---> 88     raise ValueError
     89 if missing_nodes_as_confounders:
     90     self._graph = self.add_missing_nodes_as_common_causes(observed_node_names)

ValueError:

[Code Cell]

for method, graph in graphs.items():
    if method != "LiNGAM":
        continue
    print('\n*****\n')
    print("Causal Discovery Method : %s"%(method))

    # Obtain valid dot format
    graph_dot = str_to_dot(graph.source)

    # Define Causal Model
    model=CausalModel(
            data = data_mpg,
            treatment='mpg',
            outcome='weight',
            graph=graph_dot)

    # Identification
    identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
    print(identified_estimand)

    # Estimation
    estimate = model.estimate_effect(identified_estimand,
                                    method_name="backdoor.linear_regression",
                                    control_value=0,
                                    treatment_value=1,
                                    confidence_intervals=True,
                                    test_significance=True)
    print("Causal Estimate is " + str(estimate.value))

Looking forward to hearing from you.

Thanks a lot for your precious time and your kind consideration.

Best Regards, Elahe

amit-sharma commented 2 years ago

For the first error, you will need to install the SID package in R. You can open an R terminal or RStudio, run install.packages("SID"), and then restart the notebook.
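Before restarting the notebook, it can help to confirm from Python that R can actually load the package (the traceback shows cdt gating `SID_CPDAG` on an `RPackages.SID` check). The sketch below is a hypothetical helper, not part of cdt or DoWhy; it assumes the `Rscript` executable is on `PATH` and reports the package as unavailable if it is not.

```python
import shutil
import subprocess


def r_package_available(package: str) -> bool:
    """Return True if the local Rscript can load `package` (hypothetical helper)."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        # No R installation on PATH: treat the package as unavailable.
        return False
    # requireNamespace() returns TRUE/FALSE without attaching the package;
    # we map that to the process exit status (0 = available).
    expr = (
        f'quit(save = "no", status = '
        f'if (requireNamespace("{package}", quietly = TRUE)) 0 else 1)'
    )
    result = subprocess.run([rscript, "-e", expr], capture_output=True)
    return result.returncode == 0


print("SID available:", r_package_available("SID"))
```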

Let me know if that fixes it.

elakhatibi commented 2 years ago

Hi Amit,

Thanks a lot for your response.

I have just tried this command, and it works:

remotes::install_github("https://github.com/cran/SID")

I think SID has been removed from CRAN, since install.packages("SID") does not work.
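If you want the workaround above to be reproducible from a notebook cell, a minimal sketch is to wrap the same R expression in an `Rscript` call. The `install_sid` helper is hypothetical; installing `remotes` first from the `cloud.r-project.org` mirror is our own addition (the thread only shows the `install_github` call), and by default the helper only builds the command without running it.

```python
import subprocess
from typing import List

# The install_github call is the one reported to work in this thread; SID is
# no longer on CRAN, but its read-only GitHub mirror (cran/SID) can be used.
# Installing `remotes` first is an assumption added for fresh R setups.
R_EXPR = (
    'if (!requireNamespace("remotes", quietly = TRUE)) '
    'install.packages("remotes", repos = "https://cloud.r-project.org"); '
    'remotes::install_github("https://github.com/cran/SID")'
)


def install_sid(dry_run: bool = True) -> List[str]:
    """Build the Rscript call that installs SID; run it only if dry_run is False."""
    cmd = ["Rscript", "-e", R_EXPR]
    if not dry_run:
        # check=True raises CalledProcessError if the R installation fails.
        subprocess.run(cmd, check=True)
    return cmd


print(" ".join(install_sid()))  # dry run: only show the command
```

Call `install_sid(dry_run=False)` to actually run it, then restart the notebook kernel so cdt re-checks for the package.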

I would greatly appreciate it if you could also guide me on the second error:

ValueError                                Traceback (most recent call last)
Input In [44], in <cell line: 1>()
      8 graph_dot = str_to_dot(graph.source)
     10 # Define Causal Model
---> 11 model=CausalModel(
     12         data = data_sachs,
     13         treatment='PIP2',
     14         outcome='PKC',
     15         graph=graph_dot)

Thanks a lot

pvizan-artefact commented 2 years ago

Hi. I have been struggling with the same bug. I suspect the error comes from the str_to_dot function. I can run the whole notebook if I change the function definition to the following:

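def str_to_dot(string):
    '''
    Converts input string from graphviz library to valid DOT graph format.
    '''
    # Newlines become ';' statement separators and tab indentation is dropped;
    # then the empty statements left right after '{' and after '}' are removed.
    graph = string.replace('\n', ';').replace('\t','').replace("{;", "{").replace("};", "}")
    return graph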

All the outputs I get match the ones in the docs if I use the modified function. Let me know if it fixes it for you.
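A quick way to see what the modified function returns is to feed it a small hand-written string. The input below is an assumption: we take it to resemble `graphviz.Digraph(...).source` (tab-indented edge lines, newline-terminated). The function is repeated so the snippet runs on its own.

```python
def str_to_dot(string):
    '''
    Converts input string from graphviz library to valid DOT graph format.
    '''
    graph = string.replace('\n', ';').replace('\t','').replace("{;", "{").replace("};", "}")
    return graph


# Assumed input shape, similar to graphviz.Digraph(...).source
sample_source = 'digraph {\n\tA -> B\n\tB -> C\n}\n'
print(str_to_dot(sample_source))  # digraph {A -> B;B -> C;}
```

The result is a single-line DOT string; the trailing `;` before `}` is valid DOT.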

elakhatibi commented 2 years ago

Hi,

Thanks so much. Yes, it works for me as well.

Thanks a lot for your help.

Best, Ela

petergtz commented 1 year ago

Closing as issue was resolved.