Closed kiwidamien closed 1 month ago
This is a great point, @kiwidamien. The current implementation of random_common_cause only checks the validity of an estimator. To validate the DAG, we would need to add a candidate variable rather than a completely simulated variable.
I like the idea of separating the docs for 1) validation of estimators; 2) validation of DAGs. Would you be willing to help out with that?
I'd be happy to help with separating the docs out into validation of estimators and validation of DAGs!
This issue is stale because it has been open for 14 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.
I understand the principle of a random common cause as described in this CausalWizard doc with the example of height and lung cancer.
Here we have a variable (height) that we don't think should be part of the DAG, but we can test its inclusion as a method of validating the DAG.
In contrast, in DoWhy's implementation, it seems that all that is being done is adding a randomly generated variable, w_random, and declaring it a backdoor variable (thus controlling for it when using a backdoor estimator).
From studying causal inference, I believe that this does nothing to refute the original DAG. It does provide a check on the validity and the sensitivity of the estimator. This is very different: the first link, from the CausalWizard docs on smoking, uses RCC as a DAG validation method (which it can, because in that case the feature could have been a common cause, and we want to see what leaving it out would do). I don't think you can refute the DAG itself by controlling for uncorrelated noise.
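To make the point concrete, here is a minimal sketch in plain Python. It is not DoWhy code: the linear data-generating process, the coefficients, and the helper ols are all assumptions made up for illustration. A real common cause u is left out of the model, so the estimate is biased. Adding w_random as an extra control barely moves the estimate, while adding the candidate variable u does.

```python
import random


def ols(columns, y):
    """Least-squares coefficients for y ~ 1 + columns, via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(0)
n = 5000

# Hypothetical data-generating process: u causes both t and y.
u = [rng.gauss(0, 1) for _ in range(n)]
t = [ui + rng.gauss(0, 1) for ui in u]
y = [2.0 * ti + 3.0 * ui + rng.gauss(0, 1) for ti, ui in zip(t, u)]  # true effect of t is 2

# Pure noise, unrelated to t, y, and u (plays the role of w_random).
w_random = [rng.gauss(0, 1) for _ in range(n)]

naive = ols([t], y)[1]                 # incorrect model: u omitted
with_noise = ols([t, w_random], y)[1]  # same model, controlling for noise
adjusted = ols([t, u], y)[1]           # candidate common cause included

print(f"u omitted:             {naive:.3f}")
print(f"u omitted + w_random:  {with_noise:.3f}")
print(f"u included:            {adjusted:.3f}")
```

In this setup the omitted-variable bias is 3 * Cov(t, u) / Var(t) = 1.5, so the first two estimates should both land near 3.5 and the last near the true value of 2. The noise control leaves the biased estimate essentially unchanged, so a check of this kind would pass even though the model is wrong.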
If this is true, I think the documentation would benefit greatly from separating out the validation of estimators from the validation of DAGs, as my guess is that most people are using a package with estimators that are already validated (rather than implementing their own) and are looking for a pre-packaged way of refuting their modeling assumptions.
If I am incorrect, and the DAG can be refuted with a Random Common Cause, the documentation would greatly benefit from an example where one data-generating process is used, an incorrect DAG/model is constructed, and the RCC test is able to refute it. (I would actually claim that an example of a failing test would benefit each of the refutation methods.)
I am happy to help with the work / construction of examples. It was trying to generate an example and thinking it through that led me to think the Random Common Cause, as implemented in DoWhy, is not a DAG refutation tool.