py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Refutation test: example of random common cause _failing_ #1248

Closed: kiwidamien closed this issue 1 month ago

kiwidamien commented 2 months ago

I understand the principle of a random common cause as described in this CausalWizard doc with the example of height and lung cancer.

Here we have a variable (height) that we don't think should be part of the DAG, but we can test its inclusion as a way of validating the DAG.

In contrast, in DoWhy's implementation it seems that all that is being done is adding a randomly generated variable, w_random, and declaring it a backdoor variable (so that it is controlled for when using a backdoor estimator).
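As described here, the check amounts to re-estimating the effect after adding an independent noise column to the adjustment set. The sketch below illustrates that idea without any dependencies; it is not DoWhy's code. It assumes a linear data-generating process with made-up coefficients, uses a hand-rolled OLS as a stand-in for DoWhy's `backdoor.linear_regression` estimator, and makes a single draw of `w_random` for brevity. In DoWhy itself the check is requested with `model.refute_estimate(identified_estimand, estimate, method_name="random_common_cause")`.

```python
import random


def ols(columns, y):
    """OLS coefficients [intercept, b1, b2, ...] for y ~ 1 + columns."""
    n, k = len(y), len(columns) + 1
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Normal equations (X'X) b = X'y, solved by Gaussian elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda q: abs(a[q][c]))  # partial pivoting
        a[c], a[p], b[c], b[p] = a[p], a[c], b[p], b[c]
        for q in range(c + 1, k):
            f = a[q][c] / a[c][c]
            for j in range(c, k):
                a[q][j] -= f * a[c][j]
            b[q] -= f * b[c]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, k))) / a[i][i]
    return coef


rng = random.Random(0)
n = 3000
x = [rng.gauss(0, 1) for _ in range(n)]          # common cause declared in the DAG
t = [0.8 * xi + rng.gauss(0, 1) for xi in x]     # treatment
y = [2.0 * ti + 1.5 * xi + rng.gauss(0, 1)       # true effect of t is 2.0
     for ti, xi in zip(t, x)]

# Backdoor adjustment for the declared common cause x.
estimate = ols([t, x], y)[1]

# "Random common cause": independent noise added to the adjustment set.
w_random = [rng.gauss(0, 1) for _ in range(n)]
new_effect = ols([t, x, w_random], y)[1]

print(f"estimate={estimate:.3f}, after adding w_random={new_effect:.3f}")
```

Because `w_random` is independent of everything else, the two estimates agree up to sampling noise, which is what a passing random-common-cause check looks like.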

From what I have studied of causal inference, I believe that adding w_random in this way does nothing to refute the original DAG. It does provide a check on the validity and sensitivity of the estimator, which is a very different thing. The first link, to the CausalWizard doc on smoking, uses RCC as a DAG validation method. It can do that there because the feature (height) could plausibly have been a common cause, and we want to see what leaving it out would do. I don't think you can refute the DAG itself by controlling for uncorrelated noise.
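The claim that controlling for uncorrelated noise cannot refute the DAG can be checked with a deliberately misspecified model. In the sketch below (hand-rolled OLS, hypothetical variable names and coefficients, linear data assumed), the data contain a confounder `u` that the analyst's DAG omits, so the backdoor estimate is biased. Adding `w_random` still leaves that biased estimate essentially unchanged, so a random-common-cause style check would pass.

```python
import random


def ols(columns, y):
    """OLS coefficients [intercept, b1, b2, ...] for y ~ 1 + columns."""
    n, k = len(y), len(columns) + 1
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Normal equations (X'X) b = X'y, solved by Gaussian elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda q: abs(a[q][c]))  # partial pivoting
        a[c], a[p], b[c], b[p] = a[p], a[c], b[p], b[c]
        for q in range(c + 1, k):
            f = a[q][c] / a[c][c]
            for j in range(c, k):
                a[q][j] -= f * a[c][j]
            b[q] -= f * b[c]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, k))) / a[i][i]
    return coef


rng = random.Random(1)
n = 3000
x = [rng.gauss(0, 1) for _ in range(n)]   # confounder included in the DAG
u = [rng.gauss(0, 1) for _ in range(n)]   # confounder missing from the DAG
t = [0.8 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
y = [2.0 * ti + 1.5 * xi + 2.0 * ui + rng.gauss(0, 1)   # true effect is 2.0
     for ti, xi, ui in zip(t, x, u)]

# Misspecified DAG: only x is in the backdoor set, so the estimate is biased.
estimate = ols([t, x], y)[1]

# Adding independent noise does not reveal the missing confounder.
w_random = [rng.gauss(0, 1) for _ in range(n)]
new_effect = ols([t, x, w_random], y)[1]

print(f"true=2.0, estimate={estimate:.3f}, after adding w_random={new_effect:.3f}")
```

With these coefficients the omitted-variable bias is about +1.0, yet the "refutation" sees no change, because `w_random` carries no information about `u`.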

If this is true, I think the documentation would benefit greatly from separating out:

1) validation of estimators, and
2) validation of DAGs,

as my guess is that most people are using a package with estimators that are already validated (rather than implementing their own), and are looking for a pre-packaged way of refuting their modeling assumptions.

If I am incorrect and the DAG can be refuted by a random common cause, the documentation would greatly benefit from an example in which one data-generating process is used, an incorrect DAG/model is constructed, and the RCC test is able to refute it. (I would actually argue that each of the refutation methods would benefit from an example of a failing test.)
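For comparison, a check that targets the DAG's own testable implications can produce such a failing example. The sketch below uses a different technique, a partial-correlation (Fisher-z) conditional-independence test; it is neither `random_common_cause` nor a DoWhy API. Under an assumed linear-Gaussian setup with hypothetical coefficients, the incorrect DAG x → t → y implies that x and y are independent given t, and the test rejects that implication when the data actually contain a direct x → y edge.

```python
import math
import random


def residuals(v, given):
    """Residuals of v after a simple linear regression on `given`."""
    mv, mg = sum(v) / len(v), sum(given) / len(given)
    sxy = sum((g - mg) * (a - mv) for g, a in zip(given, v))
    sxx = sum((g - mg) ** 2 for g in given)
    return [(a - mv) - (sxy / sxx) * (g - mg) for g, a in zip(given, v)]


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def fisher_z(x, y, given):
    """Fisher-z statistic for H0: x independent of y given one variable."""
    r = corr(residuals(x, given), residuals(y, given))  # partial correlation
    return math.atanh(r) * math.sqrt(len(x) - 1 - 3)    # sqrt(n - |S| - 3), |S| = 1


def simulate(direct_x_to_y, n=3000, seed=0):
    """Linear data from x -> t -> y, optionally with an extra x -> y edge."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    t = [0.8 * xi + rng.gauss(0, 1) for xi in x]
    beta = 1.5 if direct_x_to_y else 0.0
    y = [2.0 * ti + beta * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]
    return x, t, y


# Data really have x -> y, so the wrong DAG's implication should be rejected.
x, t, y = simulate(direct_x_to_y=True)
z_wrong_dag = fisher_z(x, y, t)

# Data generated from x -> t -> y, so the implication should hold.
x0, t0, y0 = simulate(direct_x_to_y=False, seed=1)
z_consistent = fisher_z(x0, y0, t0)

# |z| > 1.96 rejects the implied independence at the 5% level.
print(f"|z| vs wrong DAG: {abs(z_wrong_dag):.1f}, |z| when DAG holds: {abs(z_consistent):.1f}")
```

The correct DAG here (x → t, x → y, t → y) implies no independence among these three variables, so this particular test has nothing to reject for it.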

I am happy to help with the work and the construction of examples. It was while trying to build such an example, and thinking it through, that I came to believe the Random Common Cause, as implemented in DoWhy, is not a DAG refutation tool.

amit-sharma commented 2 months ago

This is a great point, @kiwidamien. The current implementation of random_common_cause only checks the validity of an estimator. To validate the DAG, we would need to add a candidate variable rather than a completely simulated variable.
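A minimal sketch of the candidate-variable version described here, assuming linear data, hypothetical variable names and coefficients, and a hand-rolled OLS standing in for a DoWhy estimator. Instead of simulating noise, a measured candidate variable is added to the adjustment set. If the candidate is a genuine common cause that the DAG left out (`u`), the estimate moves, which flags the DAG. A measured variable that is not a common cause (`h`, similar in spirit to the height example) leaves it essentially unchanged.

```python
import random


def ols(columns, y):
    """OLS coefficients [intercept, b1, b2, ...] for y ~ 1 + columns."""
    n, k = len(y), len(columns) + 1
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Normal equations (X'X) b = X'y, solved by Gaussian elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda q: abs(a[q][c]))  # partial pivoting
        a[c], a[p], b[c], b[p] = a[p], a[c], b[p], b[c]
        for q in range(c + 1, k):
            f = a[q][c] / a[c][c]
            for j in range(c, k):
                a[q][j] -= f * a[c][j]
            b[q] -= f * b[c]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, k))) / a[i][i]
    return coef


rng = random.Random(2)
n = 3000
x = [rng.gauss(0, 1) for _ in range(n)]          # common cause already in the DAG
u = [rng.gauss(0, 1) for _ in range(n)]          # measured, but omitted from the DAG
h = [0.5 * xi + rng.gauss(0, 1) for xi in x]     # measured, not a cause of t or y
t = [0.8 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
y = [2.0 * ti + 1.5 * xi + 2.0 * ui + rng.gauss(0, 1)   # true effect is 2.0
     for ti, xi, ui in zip(t, x, u)]

base = ols([t, x], y)[1]        # adjustment set implied by the DAG
with_h = ols([t, x, h], y)[1]   # candidate that is not a common cause
with_u = ols([t, x, u], y)[1]   # candidate that is a missing common cause

for name, est in [("DAG", base), ("+h", with_h), ("+u", with_u)]:
    print(f"{name:>4}: {est:.3f}")
```

Unlike `w_random`, a real candidate can be correlated with both treatment and outcome, which is what gives the comparison its power to challenge the DAG.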

I like the idea of separating the docs into 1) validation of estimators and 2) validation of DAGs. Would you be willing to help out with that?

kiwidamien commented 2 months ago

I'd be happy to help with separating the docs out into validation of estimators and validation of DAGs!

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 14 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 7 days since being marked as stale.