mckinsey / causalnex

A Python library that helps data scientists to infer causation rather than observing correlation.
http://causalnex.readthedocs.io/
Other
2.21k stars 256 forks

Kernel Crash and errors running Tutorial with custom data #175

Open instabaines opened 1 year ago

instabaines commented 1 year ago

Description

I am following the 'A first CausalNex tutorial' notebook with a custom dataset in JupyterLab. I ran into several issues and was able to solve some of them.

Expected Result

Expected a similar output to the tutorial

Actual Result

Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here for more info. View Jupyter log for further details.

Your Environment

ElisabethSesterHussQB commented 1 year ago

Hi, thanks for reaching out! We took a look at your issue by running the code you provided. One thing we noticed is that your data may not be well suited to structure learning: it is very random, so all the learned edges have very small weights. When removing edges with remove_edges_below_threshold, we needed a threshold of about 0.05 to remove enough edges to make the graph acyclic. You should also call get_largest_subgraph, because causalnex does not currently support disconnected components. The expected flow in causalnex would then be the following:

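from causalnex.structure.notears import from_pandas
from causalnex.network import BayesianNetwork

sm = from_pandas(data)  # learn the structure; `data` stands for your numeric DataFrame
# remove edges that are wrong, either manually or by applying a threshold
sm.remove_edges_below_threshold(0.05)
sm = sm.get_largest_subgraph()  # returns a new structure; keep the result
# discretise data
discretised_data = ...  # placeholder: your discretised copy of `data`
bn = BayesianNetwork(sm)
bn = bn.fit_node_states(discretised_data)
bn = bn.fit_cpds(discretised_data)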
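To see why the thresholding and largest-subgraph steps matter before building the Bayesian network, here is a minimal plain-Python sketch of the idea. It does not use causalnex and is not how the library implements these methods: the function names only mirror the causalnex ones for readability, and the edge weights are hypothetical values made up for illustration.

```python
from collections import defaultdict, deque


def remove_edges_below_threshold(edges, threshold):
    """Keep only edges whose absolute weight is at least `threshold`."""
    return {(u, v): w for (u, v), w in edges.items() if abs(w) >= threshold}


def largest_weakly_connected_subgraph(edges):
    """Return the edges of the largest component, ignoring edge direction."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search over the undirected view of the graph.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node] - component:
                component.add(nxt)
                queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return {(u, v): w for (u, v), w in edges.items() if u in best}


def is_acyclic(edges):
    """Kahn's algorithm: a DAG lets every node be removed in topological order."""
    indegree = defaultdict(int)
    children = defaultdict(list)
    nodes = set()
    for u, v in edges:
        indegree[v] += 1
        children[u].append(v)
        nodes.update((u, v))
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(nodes)


# Hypothetical learned weights: the weak c -> a edge closes a cycle,
# and d -> e forms a separate component.
learned = {("a", "b"): 0.80, ("b", "c"): 0.40, ("c", "a"): 0.02, ("d", "e"): 0.30}
pruned = remove_edges_below_threshold(learned, 0.05)
largest = largest_weakly_connected_subgraph(pruned)
```

In this toy graph the 0.05 threshold drops the weak back-edge, which makes the graph acyclic, and the largest-subgraph step then discards the disconnected d -> e pair. With very noisy data, many edges sit near the threshold, so it is worth checking acyclicity after pruning rather than assuming it.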

Unfortunately, we weren’t able to reproduce your second error. We are happy to take a closer look if you are still facing the same issues. Any additional code you can share with us would also help us find the exact cause of the error.