mckinsey / causalnex

A Python library that helps data scientists to infer causation rather than observing correlation.
http://causalnex.readthedocs.io/
Other
2.2k stars 256 forks source link

Is causalnex really causal? #6

Closed zemlyansky closed 4 years ago

zemlyansky commented 4 years ago

In the lib repo you write:

A Python library that helps data scientists to infer causation rather than observing correlation

In docs:

In this package we are mostly interested in the case where BNs are causal. Hence, the edge between nodes should be seen as cause -> effect relationship.

Also:

Bayesian Network consists of a DAG, a causal graph where nodes represents random variables

A bayesian network can equivalently encode dependency between variables as a -> b -> c and a <- b <- c. https://en.wikipedia.org/wiki/Bayesian_network#Causal_networks. What helps causalnex find causal relationships and does it really do that? Thanks

benhorsburgh commented 4 years ago

Hi @zemlyansky thank you for your question.

In short: causalnex does not find causal relationships. causalnex allows data scientists to collaborate with domain experts, to define a causal structure, and then draw causal insights from it.


You are correct that Bayesian Networks can equivalently encode dependencies as you describe. However, one of these equivalencies may be causal, and others will not. For example, if we replace a, b, c with age, height, and reach, then we can say that:

However, we cannot equivalently say:

So, to your question "What helps causalnex find causal relationships and does it really do that?". No, Causalnex does not find causal relationships. However, causalnex does "help data scientists to infer causation", "in the case where BNs are causal".

How does it do this? By encouraging the workflow of StructureModel -> BayesianNetwork -> InferenceEngine.

  1. Use Machine-Learning to create a seed StructureModel, which is not causal.
  2. Review the StructureModel with domain experts, who can amend the structure to reflect known causal relationships (cross referencing literature, expertise, etc).
  3. Iterate until a known causal structure is defined.
  4. Create a BayesianNetwork from the causal StructureModel, and use the InferenceEngine to create insights.

The important step here is the collaboration between data scientists and domain experts. Ideally, a domain expert could look at our variables, and tell us the structure. In practice this is too time consuming for such a valuable resource, and so structure learning algorithms can help make this iterative process significantly faster.

benhorsburgh commented 4 years ago

It is our intention to use GitHub Issues to track development related issues for causalnex. As such, I will close this Issue. However, please do get in contact if you would like to discuss this in more detail.

Ben