py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Clarify the differences among refute methods #1168

Closed: xwbxxx closed this issue 3 months ago

xwbxxx commented 5 months ago

Ask your question

In the refute module, DoWhy provides two pairs of methods that look similar, and I'm really confused about them.

The first pair is model.refute_estimate(method_name="random_common_cause") vs. model.refute_estimate(method_name="add_unobserved_common_cause"), since both make use of a common cause.

The second pair is model.refute_estimate(method_name="data_subset_refuter") vs. model.refute_estimate(method_name="bootstrap_refuter"), since both resample the data.

Although the documentation and notebooks show how to call these refuters, and the documentation and paper compare the methods only briefly, it is really hard for a beginner to understand their differences. For example, the notebook "Iterating over multiple refutation tests" only shows the output of the refuters, and the other sensitivity analysis examples mainly focus on model.refute_estimate(method_name="add_unobserved_common_cause").

I would appreciate it if you could update the documentation to clarify their differences. It would help a lot.

amit-sharma commented 5 months ago

Thanks for raising this. We will update the documentation in the next few weeks. Meanwhile, here's the answer.

For more info, you can refer to https://arxiv.org/abs/2011.04216.

xwbxxx commented 5 months ago

Thank you for your reply! That helps a lot!

> random common cause: adds a randomly generated common cause. Estimated effect should not change.

That means the random_common_cause refuter only changes the graph structure by adding a new node as a confounder, while the values of treatment and outcome remain unchanged, right?

So what about add_unobserved_common_cause? Do you change the samples of treatment and outcome by adding (coefficient × common_cause) to the original treatment and outcome (e.g. Treatment' = Treatment + α·common_cause, Outcome' = Outcome + α·common_cause)?

github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 14 days with no activity.

amit-sharma commented 4 months ago

> That means the random_common_cause refuter only changes the graph structure by adding a new node as a confounder, while the values of treatment and outcome remain unchanged, right?

Yes.

> So what about add_unobserved_common_cause? Do you change the samples of treatment and outcome by adding (coefficient × common_cause) to the original treatment and outcome (e.g. Treatment' = Treatment + α·common_cause, Outcome' = Outcome + α·common_cause)?

Even here, in principle, only the graph is changed: we add an unobserved confounder that is correlated with both treatment and outcome. To implement such a change, the default method does modify the treatment and outcome values as you describe above. There are other methods (sensitivity analysis methods) under this same function that do not change treatment/outcome and instead follow a different approach.

The key difference between the two refutation methods is that in 1) random_common_cause, the added confounder is not correlated with either treatment or outcome; whereas in 2) add_unobserved_common_cause, the missing confounder is assumed to cause both treatment and outcome.
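
For reference, here is a minimal sketch of how the two refuters are typically invoked. It assumes a model, identified_estimand, and a previously computed estimate already exist (as in the reproduction script later in this thread); the confounder-effect and effect-strength parameter names follow DoWhy's sensitivity-analysis notebooks and may vary across versions, so treat them as an assumption rather than a guaranteed API.

# Sketch only: assumes `model`, `identified_estimand`, and `estimate` are already defined.

# 1) random_common_cause: adds an independent, randomly generated node as a confounder.
#    Because it is uncorrelated with treatment and outcome, the estimate should barely move.
res_random = model.refute_estimate(identified_estimand, estimate,
                                   method_name="random_common_cause")
print(res_random)

# 2) add_unobserved_common_cause: simulates a missing confounder that affects both
#    treatment and outcome, to see how sensitive the estimate is to unobserved confounding.
#    Parameter names below are taken from the DoWhy sensitivity-analysis notebooks
#    and may differ slightly by version.
res_unobserved = model.refute_estimate(identified_estimand, estimate,
                                       method_name="add_unobserved_common_cause",
                                       confounders_effect_on_treatment="linear",
                                       confounders_effect_on_outcome="linear",
                                       effect_strength_on_treatment=0.01,
                                       effect_strength_on_outcome=0.02)
print(res_unobserved)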

xwbxxx commented 4 months ago

Thanks a lot, now I thoroughly understand their similarities and differences.

However, when I try to interpret the results, I'm confused once again. Here are my results from the bootstrap and data-subset refuters:

[screenshot: results of the bootstrap_refuter and data_subset_refuter]

Considering the new effect, both are close to the estimated effect, which supports the robustness of the estimator (at least to some extent). But their p-values are quite different.

So how should I interpret these results based on the p-values?

amit-sharma commented 4 months ago

Yeah, the new effect is almost the same in both cases, so the estimator is okay. Not sure why you are getting a p-value of 0 for the bootstrap refuter. That would usually indicate that the estimator failed the test. Can you share some code to reproduce the issue?

xwbxxx commented 4 months ago

Here is my code. My DoWhy version is 0.11. The p-value of 0 occurred when I tried to refute the backdoor.linear_regression estimator.

from dowhy import CausalModel
import dowhy.datasets
import pandas as pd
import numpy as np

# Config dict to set the logging level
import logging.config

DEFAULT_LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'loggers': {
        '': {
            'level': 'WARN',
        },
    }
}

logging.config.dictConfig(DEFAULT_LOGGING)

# Value of the coefficient [BETA]
BETA = 10
# Number of Common Causes
NUM_COMMON_CAUSES = 2
# Number of Instruments
NUM_INSTRUMENTS = 1
# Number of Samples
NUM_SAMPLES = 200000
# Treatment is Binary
TREATMENT_IS_BINARY = False
data = dowhy.datasets.linear_dataset(beta=BETA,
                                     num_common_causes=NUM_COMMON_CAUSES,
                                     num_instruments=NUM_INSTRUMENTS,
                                     num_samples=NUM_SAMPLES,
                                     treatment_is_binary=TREATMENT_IS_BINARY)

model = CausalModel(
    data=data['df'],
    treatment=data['treatment_name'],
    outcome=data['outcome_name'],
    graph=data['gml_graph'],
    instruments=data['instrument_names']
)

model.view_model()
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
print("----------------------------------------------------")

def backdoor_linear():
    causal_estimate_bd = model.estimate_effect(identified_estimand,
                                               method_name="backdoor.linear_regression",
                                               target_units="ate")

    print("Causal effect of backdoor: ", causal_estimate_bd.value)
    print("-----------------")

    # random_common_cause ================================================================
    random_common_cause = model.refute_estimate(identified_estimand, causal_estimate_bd,
                                                method_name="random_common_cause")
    print(random_common_cause)
    print("-----------------")

    # placebo_treatment ================================================================
    placebo_treatment = model.refute_estimate(identified_estimand, causal_estimate_bd,
                                              method_name="placebo_treatment_refuter")
    print(placebo_treatment)
    print("-----------------")

    # dummy_outcome ================================================================
    dummy_outcome = model.refute_estimate(identified_estimand, causal_estimate_bd,
                                          method_name="dummy_outcome_refuter")
    print(dummy_outcome[0])
    print("-----------------")

    # data_subset ================================================================
    res_subset = model.refute_estimate(identified_estimand, causal_estimate_bd,
                                       method_name="data_subset_refuter",
                                       subset_fraction=0.8)
    print(res_subset)
    print("-----------------")

    # bootstrap ================================================================
    bootstrap = model.refute_estimate(identified_estimand, causal_estimate_bd,
                                      method_name="bootstrap_refuter")
    print(bootstrap)
    print("----------------------------------------------------")

def instrumental_variable():
    causal_estimate_iv = model.estimate_effect(identified_estimand, method_name="iv.instrumental_variable")
    print("Causal effect of instrument variable: ", causal_estimate_iv.value)

    # placebo_treatment ================================================================
    placebo_treatment = model.refute_estimate(identified_estimand, causal_estimate_iv,
                                              placebo_type="permute",
                                              method_name="placebo_treatment_refuter")
    print(placebo_treatment)
    print("-----------------")

    # causal_estimate_iv_2 = model.estimate_effect(identified_estimand,
    #                                              method_name="iv.instrumental_variable",
    #                                              method_params={'iv_instrument_name': 'Z0'})
    # placebo_treatment_2 = model.refute_estimate(identified_estimand, causal_estimate_iv_2,
    #                                             placebo_type="permute",
    #                                             method_name="placebo_treatment_refuter")
    # print(placebo_treatment_2)
    # print("-----------------")
    # random_common_cause ================================================================
    random_common_cause = model.refute_estimate(identified_estimand, causal_estimate_iv,
                                                method_name="random_common_cause")
    print(random_common_cause)
    print("-----------------")

    # dummy_outcome ================================================================
    dummy_outcome = model.refute_estimate(identified_estimand, causal_estimate_iv,
                                          method_name="dummy_outcome_refuter")
    print(dummy_outcome[0])
    print("-----------------")

    # data_subset ================================================================
    res_subset = model.refute_estimate(identified_estimand, causal_estimate_iv,
                                       method_name="data_subset_refuter",
                                       subset_fraction=0.8)
    print(res_subset)
    print("-----------------")

    # bootstrap ==================================================================
    bootstrap = model.refute_estimate(identified_estimand, causal_estimate_iv,
                                      method_name="bootstrap_refuter")
    print(bootstrap)

backdoor_linear()
instrumental_variable()

xwbxxx commented 4 months ago

I think I found the reason for the p-value of 0 even when the new effect and the estimated effect are similar.

from typing import List

import numpy as np

def perform_bootstrap_test(estimate, simulations: List):
    # This calculates a two-sided percentile p-value
    # See footnotes in https://journals.sagepub.com/doi/full/10.1177/2515245920911881
    half_p_value = np.mean([(x > estimate.value) + 0.5 * (x == estimate.value) for x in simulations])
    return 2 * min(half_p_value, 1 - half_p_value)

The p-value is computed by the function above: it reflects where the original estimate (the expected effect) falls within the distribution of simulations (the new effects produced by the refuter). If the distribution of simulations is very narrow, or in the worst case degenerate (every simulation takes the same value), the original estimate falls entirely outside that distribution and the p-value becomes 0.
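
To make this concrete, here is a small hypothetical illustration (not DoWhy code) that mirrors the two-sided percentile calculation above with plain floats. It shows that when every simulated effect lies on the same side of the original estimate, the p-value collapses to 0 even though the values are numerically very close.

import numpy as np

# Hypothetical helper mirroring the two-sided percentile p-value above,
# using plain floats instead of DoWhy estimate objects.
def two_sided_percentile_p(estimate_value, simulations):
    half_p = np.mean([(x > estimate_value) + 0.5 * (x == estimate_value) for x in simulations])
    return 2 * min(half_p, 1 - half_p)

estimate_value = 10.000
narrow_sims = [10.002, 10.003, 10.002, 10.004]  # all slightly above the estimate
wide_sims = [9.90, 10.05, 9.98, 10.10]          # the estimate sits inside the spread

print(two_sided_percentile_p(estimate_value, narrow_sims))  # 0.0 -> estimate lies outside the narrow distribution
print(two_sided_percentile_p(estimate_value, wide_sims))    # 1.0 -> estimate sits in the middle of the distribution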

drawlinson commented 4 months ago

@xwbxxx I've just written a very detailed guide to the refuter methods here https://causalwizard.app/inference/article/bootstrap-refuters-dowhy which might be helpful for you.

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 14 days with no activity.

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.