Open cetagostini opened 11 months ago
Thanks for this @cetagostini, very thorough indeed. I've not got experience running this exact thing, so I don't have any immediate suggestions for improvements.
But I (and I think many others) would be very happy to see a PR on this. If it included a concise but well-explained example notebook for the docs, then that would be a fantastic addition!
Did you want to keep this issue, or close it, @cetagostini?
Hello `causalpy` community,

I would like to propose a new feature that can significantly improve the utility of `causalpy` for experimenters: adding a method for power analysis within the Bayesian framework.

Why this addition?
Let me explain the rationale behind this proposed addition. When performing quasi-experiments, it is often hard to determine whether the model can detect the effects we expect to observe. Therefore, we need an extra step in the experiment design to check that the model is well-calibrated.
Proposed Workflow
Here's how it works. Say you are planning an experiment for December using `causalpy`. Before the experiment, you need to decide on the factors (regressors) to feed into the model that generates the counterfactual. Different combinations of factors can yield models with varying sensitivity to the changes we expect to observe, where sensitivity means the model's ability to distinguish the anticipated change from its own predictive uncertainty. You might encounter models with identical R² values but different sensitivities.

So, how do we select the best-suited model, the one that offers the highest sensitivity to our specific changes? This is where analyzing the power of our experiment becomes crucial for model selection.
To illustrate, let's assume your intervention is scheduled for December 10. In the preceding week, you would use `causalpy` to create a model based on interrupted time series methodology. This model would then make predictions for a period before the launch of your experiment (say, the last week of November). If your model is well-calibrated, with accurately estimated factors, the mean of its predictions should align closely with the actual outcomes.

By predicting over a period where no change is expected, we can leverage the posterior predictive distribution to estimate the magnitude of effect necessary for it to be considered significant. In other words, we are determining the absolute change needed for the target value to fall outside the posterior.
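To make this concrete, here is a minimal sketch of the "minimum detectable effect" step, assuming you have already extracted posterior-predictive draws for the placebo window (for example, from a fitted interrupted-time-series experiment). The function name, the draws-array layout, and the 94% interval are illustrative assumptions, not an existing `causalpy` API.

```python
import numpy as np


def minimum_detectable_effect(pred_draws, observed, ci_prob=0.94):
    """
    Smallest absolute lift on the cumulative outcome that would push the
    observed value outside the model's credible interval for a placebo
    window where no effect is expected.

    pred_draws : (n_draws, n_periods) posterior-predictive draws for the
                 placebo window (layout is an assumption).
    observed   : (n_periods,) actual outcomes over the same window.
    """
    cum_draws = pred_draws.sum(axis=1)          # cumulative outcome per posterior draw
    cum_observed = np.asarray(observed).sum()   # cumulative actual outcome

    # Upper bound of the equal-tailed credible interval on the cumulative prediction
    upper = np.quantile(cum_draws, 1 - (1 - ci_prob) / 2)

    # Lift needed for the observed cumulative value to escape the interval
    # (positive effects; mirror with the lower quantile for negative effects)
    return max(0.0, upper - cum_observed)


# Toy usage with synthetic draws for a 7-day placebo window
rng = np.random.default_rng(42)
draws = rng.normal(loc=100, scale=10, size=(1_000, 7))
actuals = rng.normal(loc=100, scale=10, size=7)
print(f"Minimum detectable cumulative lift: {minimum_detectable_effect(draws, actuals):.1f}")
```

Comparing this number across candidate sets of factors would give a less subjective way of choosing the model with the highest sensitivity, and a direct answer to the question "is the required impact realistic?"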
This approach gives users flexibility in deciding their tolerance for higher false-positive rates, which might be acceptable in some contexts. Most importantly, the proposed addition empowers users to make informed decisions about proceeding with an experiment. They can assess whether an experiment is worth conducting based on the magnitude of impact required for the model to detect a perceptible effect. This could lead to the realization that some experiments may not be viable given the inherent uncertainty in the model: if the impact needed is too high, do we really believe we can achieve it?
Now, how can we add another layer of support to make model selection less subjective, particularly given the inherent uncertainty in each model?
Estimating the Probability of a New Value in Our Posterior Distribution (Pseudo P-Value)
Think of our posterior distribution as a range of possible values that we might see, with the mean value representing the most probable outcome. In this way, we can evaluate the probability of a new value being part of this distribution by measuring how far it deviates from the mean value.
If a value lies exactly at the mean, it has a probability of 1 of belonging to our posterior. As the value moves away from the mean towards either extreme of the distribution, the probability decreases and approaches zero. This process allows us to determine how 'typical' or 'atypical' a new observation is, based on our model's estimated posterior. It is an effective tool for interpreting and understanding the model's predictions, particularly when dealing with data that show significant variance.
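As a small numerical illustration of this idea (illustrative numbers, and ignoring for now the truncation to the range of `D` that the full method below applies): with a roughly normal posterior of mean 100 and standard deviation 10, the two-sided tail probability behaves exactly as described.

```python
from scipy.stats import norm

mu, sigma = 100.0, 10.0  # illustrative posterior mean and standard deviation

# Two-sided "typicality" of a new value: 1 at the mean, approaching 0 in the tails
for x in (100, 110, 130):
    p = 2 * min(norm.cdf(x, mu, sigma), 1 - norm.cdf(x, mu, sigma))
    print(f"x = {x}: pseudo p-value ~ {p:.3f}")
# x = 100 -> 1.000, x = 110 -> 0.317, x = 130 -> 0.003
```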
How does it work?
Given a distribution `D`, which is assumed to be the posterior, and a value `x` for which we want to determine the probability of occurrence within the distribution defined by `D`, the function can be described using the following steps:

Define the Parameters:
- Let `mu` and `sigma` be the mean and standard deviation of `D`, respectively.
- Let `Phi(x; mu, sigma)` represent the cumulative distribution function (CDF) of the normal distribution for a given `x`, mean `mu`, and standard deviation `sigma`.

Calculate the Bounds:
- Let `L = min(D)` and `U = max(D)` be the minimum and maximum values of `D`, respectively.
- Calculate `Phi(L; mu, sigma)` and `1 - Phi(U; mu, sigma)`.

Probability Calculation:
- Calculate `Phi(x; mu, sigma)`, the CDF value for `x`.
- If `Phi(x; mu, sigma) <= 0.5`, the probability `P` is calculated as: `P = 2 * ((Phi(x; mu, sigma) - Phi(L; mu, sigma)) / (1 - Phi(L; mu, sigma) - (1 - Phi(U; mu, sigma))))`
- If `Phi(x; mu, sigma) > 0.5`, calculate `P` as: `P = 2 * ((1 - Phi(x; mu, sigma) + Phi(L; mu, sigma)) / (1 - Phi(L; mu, sigma) - (1 - Phi(U; mu, sigma))))`

Output:
- Return `P`, rounded to two decimal places.

Explanation:
- The function estimates the probability of `x` being part of the distribution defined by the dataset `D`.
- `Phi(x; mu, sigma)` gives the probability that a normally distributed random variable is less than or equal to `x`.
- `P` is adjusted to account for the truncated range of the dataset (from `L` to `U`). It essentially normalizes the probability by considering only the portion of the normal distribution curve that lies within the range of `D`.
- If `x` falls outside the range of `D`, the CDF values naturally lead to a probability close to zero.

This function is similar to how Google's CausalImpact estimates the "posterior tail area probability".
Final thoughts
By incorporating this method, we can improve the decision-making process in model selection and guide users towards better-informed experiment decisions.
PS: Based on what I've seen, we usually use normal likelihoods in our models, but this can be adapted to other distribution types.
PS: Happy to open a PR if needed.
I am looking forward to the community's feedback and insights on this!