pymc-labs / CausalPy

A Python package for causal inference in quasi-experimental settings
https://causalpy.readthedocs.io
Apache License 2.0

Bayesian Power Analysis #276

Open cetagostini opened 1 year ago

cetagostini commented 1 year ago

Hello causalpy Community,

I would like to propose a new feature that could significantly improve the utility of causalpy for experimenters: a method for power analysis within the Bayesian framework.

Why this addition?

Let me explain the rationale behind this proposed addition. When performing quasi-experiments, it is often hard to determine whether the model can detect the effects we expect to observe. Therefore, we need to add an extra validation step to the experiment to check that the model is well calibrated.

Proposed Workflow

Here's how it works. Say you are planning an experiment for December using causalpy. Before the experiment, you need to decide on the factors (regressors) to feed into the model to generate the counterfactual. Different combinations of factors can yield models with varying sensitivities to the changes we expect to observe. Here, sensitivity means the model's ability to detect the anticipated changes. You might encounter models with identical R² values but differing sensitivities.

So, how do we select the best-suited model? The one that offers the highest sensitivity to our specific changes? This is where analyzing the power of our experiment becomes crucial for model selection.

To illustrate, let's assume your intervention is scheduled for December 10. In the preceding week, you would use causalpy to create a model based on interrupted time series methodologies. This model would then make predictions for a period before the launch of your experiment (say, the last week of November). If your model is well-calibrated, with accurately estimated factors, the mean of your predictions should align closely with the actual outcomes.
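To make the placebo step concrete, here is a minimal sketch. The call is modelled on CausalPy's interrupted time series examples, but the exact signature may differ between versions, and the data file, dataframe columns, and regressor names (x1, x2) are purely illustrative assumptions:

import causalpy as cp
import pandas as pd

# Hypothetical dataset: a datetime-indexed DataFrame with the outcome `y`
# and the chosen regressors (here the illustrative x1 and x2)
df = pd.read_csv("my_experiment_data.csv", index_col=0, parse_dates=True)

placebo_date = pd.to_datetime("2023-11-24")  # illustrative: a week before the real launch
launch_date = pd.to_datetime("2023-12-10")   # illustrative: the actual intervention date

result = cp.pymc_experiments.InterruptedTimeSeries(
    df[df.index < launch_date],       # only pre-intervention data
    treatment_time=placebo_date,      # pretend the "treatment" happened here
    formula="y ~ 1 + x1 + x2",
    model=cp.pymc_models.LinearRegression(),
)
# If the model is well calibrated, the estimated "effect" after the placebo
# date should be centred on zero.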

By predicting over a period where no change is expected, we can leverage the posterior to estimate the magnitude of effect necessary for it to be considered significant. In other words, we are determining the absolute change needed for the target value to fall outside the posterior.
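As a rough illustration (a minimal sketch, assuming we already have flattened posterior predictive draws for the placebo period and the observed outcome for that period; the function name and arguments below are hypothetical), the change needed to escape a chosen credible interval can be read off the quantiles of those draws:

import numpy as np

def minimum_detectable_effect(posterior_pred_draws, observed, ci=0.94):
    """Smallest absolute lift that would push the observed placebo-period
    outcome outside the central `ci` interval of the posterior predictive."""
    lower = np.quantile(posterior_pred_draws, (1 - ci) / 2)
    upper = np.quantile(posterior_pred_draws, 1 - (1 - ci) / 2)
    # Positive lift needed to exceed the upper bound, and drop needed
    # to fall below the lower bound
    return {"positive_lift": upper - observed, "negative_lift": observed - lower}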

This approach gives users flexibility in deciding their tolerance for higher false positive rates, which might be acceptable in some contexts. Most importantly, the proposed addition empowers users to make informed decisions about proceeding with an experiment. They can assess whether an experiment is worth conducting based on the effect size needed for the model to detect it. This could lead to the realization that some experiments may not be viable due to the inherent uncertainty in the model: if the impact needed is too high, do we really believe we can achieve it?

Now, how can we make model selection less subjective, particularly considering the inherent uncertainty in each model?

Estimating the Probability of a New Value in Our Posterior Distribution (Pseudo P-Value)

Think of our posterior distribution as a range of possible values that we might see, with the mean value representing the most probable outcome. In this way, we can evaluate the probability of a new value being part of this distribution by measuring how far it deviates from the mean value.

If a value is precisely at the mean, it has a probability of 1 of falling in our posterior. As the value moves away from the mean towards either extreme of the distribution, the probability decreases and approaches zero. This process allows us to determine how 'typical' or 'atypical' a new observation is, based on our model's estimated posterior. It is an effective tool for interpreting and comprehending the model's predictions, particularly when dealing with data that display significant variance.

How does it work?

Given a distribution D (assumed to be the posterior) and a value x for which we want to determine the probability of occurrence within D, the function can be described in the following steps:

  1. Define the parameters:
    • Let mu and sigma be the mean and standard deviation of D, respectively.
    • Let Phi(x; mu, sigma) denote the cumulative distribution function (CDF) of the normal distribution with mean mu and standard deviation sigma, evaluated at x.
  2. Calculate the bounds:
    • Let L = min(D) and U = max(D) be the minimum and maximum values of D, respectively.
    • Compute the CDF values at these bounds: Phi(L; mu, sigma) and 1 - Phi(U; mu, sigma).
  3. Calculate the probability:
    • Compute Phi(x; mu, sigma), the CDF value at x.
    • If Phi(x; mu, sigma) <= 0.5, the probability P is: P = 2 * (Phi(x; mu, sigma) - Phi(L; mu, sigma)) / (1 - Phi(L; mu, sigma) - (1 - Phi(U; mu, sigma)))
    • Otherwise, if Phi(x; mu, sigma) > 0.5: P = 2 * (1 - Phi(x; mu, sigma) + Phi(L; mu, sigma)) / (1 - Phi(L; mu, sigma) - (1 - Phi(U; mu, sigma)))
  4. Output:
    • Return the probability P, rounded to two decimal places.

Example implementation:

import numpy as np
from scipy.stats import norm


def probability_of_X_in_distribution(data, x):
    """
    Calculate the probability of a given value being in the distribution defined by the given data,
    without explicitly forcing the return to zero when outside the bounds.

    Args:
    - data: a list or array-like object containing the samples that define the distribution
    - x: a numeric value for which to calculate the probability of being in the distribution

    Returns:
    - prob: a numeric value representing the probability of x being in the distribution
    """
    # Empirical bounds and moments of the samples
    lower_bound, upper_bound = min(data), max(data)
    mean, std = np.mean(data), np.std(data)

    # Normal CDF mass below the lower bound and above the upper bound
    cdf_lower = norm.cdf(lower_bound, mean, std)
    cdf_upper = 1 - norm.cdf(upper_bound, mean, std)

    # CDF value of the candidate point
    cdf_x = norm.cdf(x, mean, std)

    # Two-sided tail-area-style probability, rescaled to the observed bounds
    if cdf_x <= 0.5:
        probability = 2 * (cdf_x - cdf_lower) / (1 - cdf_lower - cdf_upper)
    else:
        probability = 2 * (1 - cdf_x + cdf_lower) / (1 - cdf_lower - cdf_upper)

    return round(probability, 2)
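As a quick usage illustration (reusing the function and imports above, with synthetic normal draws standing in for a real posterior; the 0.05 threshold and the candidate lifts are arbitrary choices, not part of the proposal), the function can be applied directly to posterior samples and swept over hypothetical lifts:

rng = np.random.default_rng(42)
posterior_draws = rng.normal(loc=100.0, scale=5.0, size=4000)  # stand-in posterior samples

print(probability_of_X_in_distribution(posterior_draws, 100.0))  # close to 1: value at the mean
print(probability_of_X_in_distribution(posterior_draws, 112.0))  # close to 0: value far in the tail

# Sweep candidate lifts to find the smallest one that would be flagged
# at a chosen (arbitrary) threshold of 0.05
for lift in np.arange(0.0, 25.0, 0.5):
    if probability_of_X_in_distribution(posterior_draws, 100.0 + lift) < 0.05:
        print(f"Smallest detectable lift at the 0.05 threshold: {lift}")
        break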


This function is similar to how Google's CausalImpact estimates the "posterior tail area probability".

Final thoughts

By incorporating this method, we can improve the decision-making process in model selection and help users to:

  1. Better understand the uncertainty in their models and decide whether running an experiment makes sense.
  2. Compare models or methods and select the one with the most power for the planned change, rather than relying solely on R².
  3. Manage uncertainty in a less subjective way.

I am looking forward to the community's feedback and insights on this!

drbenvincent commented 1 year ago

Thanks for this @cetagostini, very thorough indeed. I've not got experience running this exact thing, so I don't have any immediate suggestions for improvements.

But I (and I think many others) would be very happy to see a PR on this. If it included a concise but well explained example notebook for the docs, then that would be a fantastic addition!

drbenvincent commented 5 months ago

Did you want to keep this issue, or close it @cetagostini ?