py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Clarification on implications of refutation tests #285

Open vicbeldo98 opened 3 years ago

vicbeldo98 commented 3 years ago

Hi! I have some doubts about what implications can be deduced from the refutation tests. Imagine I have a model where the placebo treatment, bootstrap validation, random common cause, and subset validation refuters all fail. Is the only thing I can conclude that my model is wrong somewhere, or is there more information implied?

Thank you in advance!

amit-sharma commented 3 years ago

That's a great question. In general, these tests simply detect that there is an error, but do not provide an obvious way of correcting it.

They do provide some hints, though. In particular, there are two kinds of refutations: step-level refutations and end-to-end refutations. A step-level refutation tells you that something is wrong in a particular step (Model, Identify, Estimate), whereas an end-to-end refutation tells you that there is an error, but nothing more. They are analogous to unit tests and integration tests in software engineering.
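To make the idea concrete, here is a hand-rolled sketch of what a random-common-cause check does, using plain NumPy rather than DoWhy's own refuter (the data, coefficients, and helper function are all made up for illustration): add an independently drawn "common cause" to the model and verify that the effect estimate barely moves.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4000

# Toy data: W confounds treatment T and outcome Y; the true effect of T is 3.0.
W = rng.normal(size=n)
T = W + rng.normal(size=n)
Y = 3.0 * T + 2.0 * W + rng.normal(size=n)

def effect_of_T(y, covariates):
    """OLS coefficient on the first covariate column (the treatment)."""
    X = np.column_stack([np.ones(len(y))] + covariates)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

original = effect_of_T(Y, [T, W])

# Refutation: add an independently drawn 'common cause'. Since it is
# unrelated to treatment and outcome, the estimate should barely change.
fake_cause = rng.normal(size=n)
refuted = effect_of_T(Y, [T, W, fake_cause])

print(abs(original - refuted))  # a large shift would signal a problem
```

A large shift after adding pure noise would indicate that the estimator is fragile, pointing back at the modeling or estimation step.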

Placebo treatment refuter and dummy outcome refuter are examples of end-to-end refutations. Here we have no idea why the test failed. But these tests, being end-to-end, can capture errors that step-level tests may miss.
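To illustrate the placebo idea end to end, here is a minimal NumPy sketch of the logic behind a placebo-treatment refutation, not DoWhy's implementation (the data-generating process and helper are invented for this example): replace the real treatment with a permuted, causally inert copy, and check that the re-estimated effect drops to roughly zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data: confounder W drives both treatment T and outcome Y.
W = rng.normal(size=n)
T = 0.8 * W + rng.normal(size=n)
Y = 2.0 * T + 1.5 * W + rng.normal(size=n)

def ols_effect(t, y, w):
    """Coefficient on t from an OLS regression of y on [1, t, w]."""
    X = np.column_stack([np.ones_like(t), t, w])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

estimate = ols_effect(T, Y, W)  # should recover the true effect, ~2.0

# Placebo refutation: permuting T breaks any causal link to Y, so a
# sound analysis should now report an effect near zero.
T_placebo = rng.permutation(T)
placebo_estimate = ols_effect(T_placebo, Y, W)

print(estimate, placebo_estimate)
```

If the placebo estimate were far from zero, it would mean some part of the end-to-end pipeline is manufacturing an effect out of nothing, without saying which part.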

Hope this was helpful.