mckinsey / causalnex

A Python library that helps data scientists to infer causation rather than observing correlation.
http://causalnex.readthedocs.io/

Unsuitability of Notears for causal inference #195

Closed arainboldt closed 1 year ago

arainboldt commented 1 year ago

Causalnex fills an important gap in the Python ecosystem. Thank you all for your work on the package and for keeping it open source.

Description

Given this paper: https://arxiv.org/abs/2104.05441v2

And this paper: https://proceedings.neurips.cc/paper/2021/file/e987eff4a7c7b7e580d659feb6f60c1a-Paper.pdf

It seems odd that no other structure learning methods have been incorporated into the package.

Context

There are many structure learning methods designed specifically with causal inference in mind. It's worth exploring how these methods could be implemented in the causalnex framework.

ElisabethSesterHussQB commented 1 year ago

Thank you for raising this issue! This is a very good point, and we have been working on addressing it as well. Identifying causality in real data is a complex problem and there are many approaches to tackle it. One approach we considered, which is fundamentally different from the one we have in causalnex, is Kernel Conditional Deviance for Causal Inference (KCDC). Here, the authors propose a fully nonparametric causal discovery method based on purely observational data, interpreting larger structural variability of the conditional distributions as an indication that causality does not hold.

Choosing NOTEARS, and with it the underlying constrained optimization approach, was a design choice when building causalnex. This approach allows us to incorporate external knowledge into our model in the form of additional constraints. However, we are aware of the limitations of NOTEARS raised by the papers you posted. Therefore, we have implemented two new algorithms that also follow a constrained optimization approach, to give the user more options. These will be part of the next release:

  1. DAGGNN: A deep learning approach that trains an autoencoder parametrised by a graph neural network, based on this paper
  2. NOFEARS: Extends NOTEARS by implementing a Karush-Kuhn-Tucker (KKT) search algorithm on top of the current implementation, as proposed in this paper

With this release, we hope to give our users more options and we will keep our eyes open for more ways to enhance causalnex in the future. Please do keep making suggestions that we can look into!
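
To make the constrained optimization idea above concrete, here is a minimal sketch of the acyclicity score that NOTEARS constrains to zero, h(W) = tr(exp(W ∘ W)) − d, together with one way prior knowledge could be imposed as a hard constraint. It is an illustration only, not the causalnex implementation: the names `acyclicity` and `mask_tabu_edges` are hypothetical, and the matrix exponential is replaced by its power series truncated after d terms, which is still zero exactly when the graph has no directed cycle.

```python
import numpy as np


def acyclicity(W: np.ndarray) -> float:
    """NOTEARS-style acyclicity score for a weighted adjacency matrix W.

    Assumption: instead of a full matrix exponential, this sums the first d
    terms of its power series, sum_{k=1..d} tr(A^k) / k! with A = W * W.
    tr(A^k) is positive only if a directed cycle of length k exists, and any
    cycle in a d-node graph has length at most d, so the score is zero
    exactly for DAGs.
    """
    d = W.shape[0]
    A = W * W            # element-wise square: all entries non-negative
    term = np.eye(d)     # holds A^k / k! as k increases
    h = 0.0
    for k in range(1, d + 1):
        term = term @ A / k
        h += np.trace(term)
    return float(h)


def mask_tabu_edges(W: np.ndarray, tabu_edges) -> np.ndarray:
    """Hypothetical helper: zero out edges (i, j) ruled out by prior knowledge."""
    W = W.copy()
    for i, j in tabu_edges:
        W[i, j] = 0.0
    return W


# Chain 0 -> 1 -> 2 is a DAG, so the score is zero.
W_dag = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, -2.0],
                  [0.0, 0.0, 0.0]])

# Adding 2 -> 0 closes a cycle, so the score becomes positive.
W_cyclic = W_dag.copy()
W_cyclic[2, 0] = 0.5

print(acyclicity(W_dag))
print(acyclicity(W_cyclic))

# Forbidding the edge 2 -> 0 as external knowledge removes the cycle again.
print(acyclicity(mask_tabu_edges(W_cyclic, [(2, 0)])))
```

In an actual structure learner the score would be driven to zero during optimization rather than checked after the fact; the sketch only shows why a smooth function of the weights can stand in for the combinatorial "is a DAG" condition, and why extra edge constraints fit naturally into the same formulation.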

arainboldt commented 1 year ago

@ElisabethSesterHussQB thanks for the very thorough reply and the references! I'm very curious to read through them. The repo is really useful and fun, and I look forward to seeing it grow. Happy to know that you'll be expanding the available algos for structure learning. Cheers!

GabrielAzevedoFerreiraQB commented 1 year ago

Thanks a lot for the support, @arainboldt!