pymc-labs / CausalPy

A Python package for causal inference in quasi-experimental settings
https://causalpy.readthedocs.io
Apache License 2.0

Creating Power Analysis Through Posterior ROPE Estimation #368

Open cetagostini opened 3 months ago

cetagostini commented 3 months ago

Hey team, here is the new PR (follow-up of #292) to create an experiment power estimation based on a decided ROPE from our posterior distribution!

Context

Let's assume your intervention is scheduled for December 10. In the preceding week, you would use CausalPy to create a causal model based on interrupted time series methodologies. This model would then make predictions for a period before the launch of your experiment (say, the last week of November). If your model is well calibrated, with accurately estimated factors, the mean of your predictions should align closely with the actual outcomes (the difference between reality and the posterior should be a distribution with mean zero and a certain sigma).
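To make that concrete, here is a minimal sketch of the placebo-style fit, assuming the `cp.pymc_experiments.InterruptedTimeSeries` API and the `y ~ 1 + t + C(month)` formula used in the CausalPy docs; the data, dates, and column names are illustrative, not part of this PR:

```python
import causalpy as cp
import numpy as np
import pandas as pd

# Hypothetical daily series for 2023: a trend plus monthly seasonality.
rng = np.random.default_rng(42)
dates = pd.date_range("2023-01-01", "2023-11-30", freq="D")
df = pd.DataFrame({"t": np.arange(len(dates)), "month": dates.month}, index=dates)
df["y"] = 10 + 0.02 * df["t"] + np.sin(df["month"]) + rng.normal(0, 0.5, len(df))

# The real intervention is planned for December 10, so we fit the model with a
# placebo "treatment time" in a window where we know nothing changed
# (the last week of November).
placebo_time = pd.to_datetime("2023-11-24")

result = cp.pymc_experiments.InterruptedTimeSeries(
    df,
    placebo_time,
    formula="y ~ 1 + t + C(month)",
    model=cp.pymc_models.LinearRegression(),
)

# If the model is well calibrated, the posterior impact over the placebo
# window should be a distribution centred on zero.
```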

By making predictions over a period where no change is anticipated, we can use the posterior to estimate our potential mean or cumulative values on a regular basis. We can then establish a threshold area, or region of practical equivalence (ROPE), to gauge the level of effect required for it to be deemed significant. In essence, we are determining the precise change necessary for the target value to deviate from the posterior. Applying this procedure, the MDE will be a value outside of the given ROPE, which is specified by our alpha.
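A rough numpy sketch of that ROPE/MDE logic; the draw array and `alpha` value here are illustrative stand-ins, not names from this PR:

```python
import numpy as np

# Hypothetical posterior draws of the mean (or cumulative) impact over the
# placebo window, where the true effect is zero.
rng = np.random.default_rng(0)
placebo_impact_draws = rng.normal(0.0, 1.0, size=4000)

alpha = 0.05  # significance level chosen for the experiment

# ROPE: the central (1 - alpha) interval of the no-effect posterior.
rope_low, rope_high = np.quantile(placebo_impact_draws, [alpha / 2, 1 - alpha / 2])

# The MDE is then the smallest effect that escapes the ROPE: any observed
# impact above rope_high (or below rope_low) would be flagged at this alpha.
print(f"ROPE: [{rope_low:.2f}, {rope_high:.2f}] -> MDE (positive side): {rope_high:.2f}")
```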

This estimation allows for an assessment of the model's sensitivity to changes and the experiment's feasibility.

Pre-Experimentation setup

By applying this method before the experiment period, we will be able to determine which model setup best reduces our MDE and increases the power. Using this method we can answer questions like:

Assessment

We could end up with several results outside the ROPE area. How do we determine which is more extreme than another?

Think of our posterior distribution as a range of possible values that we might see, with the mean value representing the most probable outcome. In this way, we can evaluate the probability of a new value being part of this distribution by measuring how far it deviates from the mean value and from the respective ROPE.

If a value is precisely at the mean, it has a probability of 1 of falling within our posterior. As the value moves away from the mean towards either extreme of the distribution, the probability decreases and approaches zero. This process allows us to determine how 'typical' or 'atypical' a new observation is, based on our model's estimated posterior.

In simple terms, we are checking whether the true effect size falls within the estimated posterior credible interval or not.
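One way to turn that into a number, sketched with plain numpy, is a two-sided posterior tail-area probability; this is my reading of the description rather than the exact function in the PR:

```python
import numpy as np

def posterior_tail_probability(draws: np.ndarray, observed: float) -> float:
    """Two-sided tail-area probability of `observed` under the posterior draws.

    Close to 1 when the observed value sits near the centre of the posterior,
    approaching 0 as it moves into either tail.
    """
    p_greater = np.mean(draws >= observed)
    p_smaller = np.mean(draws <= observed)
    return float(2 * min(p_greater, p_smaller))

# Example: posterior of the placebo-window impact vs. two candidate effects.
rng = np.random.default_rng(1)
draws = rng.normal(0.0, 1.0, size=4000)
print(posterior_tail_probability(draws, 0.0))  # ~1, a very "typical" value
print(posterior_tail_probability(draws, 2.5))  # small, an "atypical" value
```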

A few examples

(example plots attached as images)

This function is similar to how Google's CausalImpact estimates the "posterior tail area probability".


πŸ“š Documentation preview πŸ“š: https://causalpy--368.org.readthedocs.build/en/368/

cetagostini commented 3 months ago

pre-commit.ci autofix

cetagostini commented 3 months ago

@NathanielF @drbenvincent

Next Steps:

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 95.58011% with 8 lines in your changes missing coverage. Please review.

Project coverage is 86.54%. Comparing base (67181c6) to head (c3b108d).

| Files | Patch % | Lines |
| --- | --- | --- |
| causalpy/pymc_rope.py | 92.92% | 8 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #368      +/-   ##
==========================================
+ Coverage   85.60%   86.54%   +0.93%
==========================================
  Files          22       24       +2
  Lines        1716     1895     +179
==========================================
+ Hits         1469     1640     +171
- Misses        247      255       +8
```

:umbrella: View full report in Codecov by Sentry.

drbenvincent commented 3 months ago

Cool. I think the ROPE and MDE framing makes much more sense. I'm going to play devil's advocate here in order to really distill this down into its raw essence πŸ’Ž

Why is this needed?

Let's say Bob already uses CausalPy's Bayesian synthetic control methods. He runs a synthetic control model after the experiment is done, gets a credible interval on the causal impact, and can use this to make a judgement about whether the intervention had a meaningful effect.

(screenshot: Screenshot 2024-06-24 at 14 15 30)

What would you say to Bob about why that approach is insufficient and why he should care about the ROPE?

At the moment, what you have under the Pre-Experimentation setup section of the issue doesn't really seem to match up with the high-level goals and doesn't do this work justice. I'd be tempted to delete that and focus on the 'why' and 'what you get'. That would help focus the docs/examples on the core points and also make people much more excited.

What do we get (if anything) beyond a validation period approach?

In the currently-WIP PR (https://github.com/pymc-labs/CausalPy/pull/367) we can do synthetic control after the intervention has happened, but do parameter estimation on a smaller training period before a validation period. For example:

(screenshot: Screenshot 2024-06-24 at 14 33 20)

What you could do, for example, is build a ROPE from the intervention period on the causal impact space (not the raw outcome space) and use that to define when your observed causal impacts are 'meaningful'. Something like this:

(screenshot: Screenshot 2024-06-24 at 14 36 21)
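A rough numpy sketch of one reading of that suggestion: build the ROPE on the causal-impact scale from a window where no effect is expected, then flag the post-intervention days whose impact escapes it (array names and the alpha value are illustrative, not part of #367):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior draws of the causal impact, shape (n_draws, n_days):
no_effect_impact = rng.normal(0.0, 1.0, size=(4000, 30))  # window with no true effect
post_impact = rng.normal(1.5, 1.0, size=(4000, 14))       # post-intervention window

alpha = 0.05

# ROPE built on the causal-impact scale from the no-effect window.
rope_low, rope_high = np.quantile(no_effect_impact, [alpha / 2, 1 - alpha / 2])

# Flag the post-intervention days whose posterior-mean impact escapes the ROPE.
post_mean = post_impact.mean(axis=0)
meaningful = (post_mean < rope_low) | (post_mean > rope_high)
print(f"ROPE: [{rope_low:.2f}, {rope_high:.2f}]; meaningful days: {meaningful.sum()}/{post_mean.size}")
```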


Let's set aside the goals we have for this in terms of multi-unit synthetic control, and focus just on traditional, vanilla synthetic control. Can we really distill the core essence of why ROPE and MDE are important, and what the proposed method can do better than what we do already, or than what we could do once the intervention-period PR is merged? Is there anything really crucial about running the analysis before we have the post-intervention data?

Hope this helps rather than frustrates!