Closed NathanielF closed 1 month ago
Attention: Patch coverage is 95.65217%, with 11 lines in your changes missing coverage. Please review. Project coverage is 80.09%. Comparing base (3dc2ffe) to head (0a54532). Report is 14 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
causalpy/pymc_experiments.py | 95.31% | 9 Missing :warning: |
causalpy/data_validation.py | 81.81% | 2 Missing :warning: |
Sorry, I've not had time to look at this yet I'm afraid - a combination of work + illness. I might also not get time to look next week because of a deadline on a client project. Looking forward to when I can dive into this 👍
Because this will be a new feature addition, and probably trigger a minor version bump (semantic versioning), I'll ask for at least one other review.
No worries.
FYI, I'll review this next week @NathanielF -- made space for it
View / edit / reply to this conversation on ReviewNB
AlexAndorra commented on 2024-04-08T16:53:44Z ----------------------------------------------------------------
Link to Hernan's book
NathanielF commented on 2024-04-14T19:12:04Z ----------------------------------------------------------------
Done
AlexAndorra commented on 2024-04-08T16:53:45Z ----------------------------------------------------------------
Typo: "That is to say, the condition of strong ignorability holds if the treatment status T is independent of the propensity p(X), conditional on the X"
NathanielF commented on 2024-04-14T19:12:13Z ----------------------------------------------------------------
Fixed
AlexAndorra commented on 2024-04-08T16:53:45Z ----------------------------------------------------------------
You're using the raw weighting scheme in the function call, not the robust one, contrary to what you're saying in the text. Is that a typo?
NathanielF commented on 2024-04-14T19:27:48Z ----------------------------------------------------------------
That was a typo.
AlexAndorra commented on 2024-04-08T16:53:46Z ----------------------------------------------------------------
NathanielF commented on 2024-04-14T19:27:56Z ----------------------------------------------------------------
fixed
AlexAndorra commented on 2024-04-08T16:53:47Z ----------------------------------------------------------------
What's the y-axis on the first plot?
NathanielF commented on 2024-04-14T19:31:40Z ----------------------------------------------------------------
Count of observations. These are still histograms, just layered histograms for both the observed propensity scores and the reweighted propensity scores under different draws from the posterior of the propensity score distribution.
NathanielF commented on 2024-04-14T19:31:52Z ----------------------------------------------------------------
Added ylabel to the plot
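The layered-histogram idea being discussed can be sketched generically in matplotlib. This is not the notebook's plotting code; the beta-distributed scores here are made-up stand-ins for the observed propensity scores and a handful of posterior draws:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical stand-ins: observed propensity scores plus
# posterior draws of the propensity score distribution.
observed_ps = rng.beta(2, 5, size=500)
posterior_draws = rng.beta(2, 5, size=(20, 500))

fig, ax = plt.subplots()
bins = np.linspace(0, 1, 31)
for draw in posterior_draws:
    # Each posterior draw becomes its own translucent histogram layer.
    ax.hist(draw, bins=bins, histtype="step", alpha=0.2, color="C1")
ax.hist(observed_ps, bins=bins, alpha=0.6, color="C0", label="observed")
ax.set_xlabel("propensity score")
ax.set_ylabel("Count of observations")  # the y-axis asked about above
ax.legend()
```

The overlaid `histtype="step"` outlines keep the posterior layers readable without hiding the filled histogram of observed scores.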
AlexAndorra commented on 2024-04-08T16:53:47Z ----------------------------------------------------------------
NathanielF commented on 2024-04-14T19:32:01Z ----------------------------------------------------------------
Split this out.
AlexAndorra commented on 2024-04-08T16:53:48Z ----------------------------------------------------------------
NathanielF commented on 2024-04-14T19:32:23Z ----------------------------------------------------------------
Added some more explanation.
AlexAndorra commented on 2024-04-08T16:53:49Z ----------------------------------------------------------------
robust is what you used to fit the model, right? Then how reliable is the doubly-robust estimation? Do you have to re-fit the model?
NathanielF commented on 2024-04-14T19:33:29Z ----------------------------------------------------------------
You don't have to re-fit the model. The weighting is a post-processing step so you can apply different weighting schemes after the model is fit using a kwarg. I've added a note to clarify this.
AlexAndorra commented on 2024-04-08T16:53:50Z ----------------------------------------------------------------
NathanielF commented on 2024-04-14T20:02:07Z ----------------------------------------------------------------
I've clarified the differences and noted that the methods need not align; divergence between them would be indicative of a miscalibrated propensity model.
Thanks Alex. Will try and get to it this weekend
warning when locally building the docs...
Inverse Propensity Score Weighting
""""""""""""""""""""""""""""""""
/Users/benjamv/git/CausalPy/docs/source/index.rst:126: WARNING: Title underline too short.
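The warning points at a reStructuredText heading whose underline is shorter than the title text. Extending the underline to at least the title's length fixes it, e.g.:

```rst
Inverse Propensity Score Weighting
""""""""""""""""""""""""""""""""""
```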
Would be good to update from main
just to keep everything fresh
doctests pass locally ✅
Tests pass locally ✅
drbenvincent commented on 2024-04-11T10:18:38Z ----------------------------------------------------------------
Could (very briefly) describe what the NHEFS data set is about OR just be vague about it and talk about a 'real world dataset' which you go on to describe in the 'NHEFS Data' section.
Would be good to either have a short explanation of what propensity scores are, or a link to a glossary item. Also the relationship between the propensity and the weights, and why it's the "inverse" propensity. Just to help make this more accessible for readers unfamiliar with the topic.
Some explanation about what a weighting scheme is would be good - it's great to have a reference out, but I think it would be stronger if the post was a bit more self-encapsulated / didactic.
Would be good to have an itemised list with (at least) 1-sentence explainers about each of the reweighting schemes.
NathanielF commented on 2024-04-14T20:05:42Z ----------------------------------------------------------------
Updated with more clarity about the method and the intent.
In this notebook we will briefly demonstrate how to use propensity score weighting schemes to recover treatment effects in the analysis of observational data. We will first showcase the method with a simulated data example drawn from Lucy D’Agostino McGowan’s excellent blog on inverse propensity score weighting. Then we shall apply the same techniques to the NHEFS data set discussed in Miguel Hernan and Robins’ Causal Inference: What If book. This data set measures the effect of quitting smoking between 1971 and 1982. At each of these two points in time the participant’s weight was recorded, and we seek to estimate the effect of quitting in the intervening years on the weight recorded in 1982.
We will use inverse propensity score weighting techniques to estimate the average treatment effect. There are a range of weighting techniques available: we have implemented `raw`, `robust`, `doubly robust` and `overlap` weighting schemes, all of which aim to estimate the average treatment effect. The idea of a propensity score (very broadly) is to derive a one-number summary of an individual’s probability of adopting a particular treatment. This score is typically calculated by fitting a predictive logit model on all of an individual’s observed attributes, predicting whether or not those attributes drive the individual towards the treatment status. In the case of the NHEFS data we want a model to measure the propensity for each individual to quit smoking.
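As a rough sketch of what these schemes do with a fitted propensity score (the exact definitions used in CausalPy may differ; the function name here is illustrative, and the `doubly robust` scheme is omitted because it additionally requires an outcome model):

```python
import numpy as np

def ipw_weights(t, e, scheme="raw"):
    """Illustrative inverse-propensity weights for the ATE.

    t: 0/1 treatment indicator; e: estimated propensity scores.
    """
    if scheme == "raw":
        # Horvitz-Thompson style: 1/e for treated, 1/(1-e) for control.
        return t / e + (1 - t) / (1 - e)
    if scheme == "robust":
        # Hajek style: normalise the raw weights within each group so
        # a few extreme propensities cannot dominate the estimate.
        w = t / e + (1 - t) / (1 - e)
        w_treated = np.where(t == 1, w / w[t == 1].sum(), 0.0)
        w_control = np.where(t == 0, w / w[t == 0].sum(), 0.0)
        return w_treated + w_control
    if scheme == "overlap":
        # Emphasise the region where treated and control groups overlap.
        return t * (1 - e) + (1 - t) * e
    raise ValueError(f"unknown scheme: {scheme}")
```

The weights are a pure function of the treatment indicator and the propensity scores, which is why (as noted below) switching schemes is a post-processing step rather than a re-fit.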
The reason we want this propensity score is that with observed data we often have a kind of imbalance in our covariate profiles across treatment groups, meaning our data might be unrepresentative in some crucial aspect. This prevents us cleanly reading off treatment effects by looking at simple group differences. These “imbalances” can be driven by selection effects into the treatment status, so if we want to estimate the average treatment effect in the population as a whole we need to be wary that our sample might not give us generalisable insight into the treatment differences. Using propensity scores as a measure of the propensity to adopt the treatment status in the population, we can cleverly weight the observed data to privilege observations of “rare” occurrence in each group. For example, if smoking is the treatment status and regular running is generally not common among the group of smokers, then on the occasion we see a smoker marathon runner we should heavily weight their outcome measure to overcome their low prevalence in the treated group but real presence in the unmeasured population. Inverse propensity weighting tries to define weighting schemes that are inversely proportional to an individual’s propensity score, so as to better recover an estimate which mitigates (somewhat) the risk of selection effect bias. For more details and illustration of these themes see the PyMC examples write up on Non-Parametric Bayesian methods [Forde, 2024].

drbenvincent commented on 2024-05-02T09:14:12Z ----------------------------------------------------------------
nice!
drbenvincent commented on 2024-04-11T10:18:38Z ----------------------------------------------------------------
Could it be a good idea to add a data visualisation in here, possibly sns.pairplot with hue=trt ? Not essential, but just a thought
Maybe declare TREATMENT_EFFECT = 2 and use that in the data generation code. Just to make it really obvious
NathanielF commented on 2024-04-14T20:06:06Z ----------------------------------------------------------------
Added in TREATMENT_EFFECT var
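A minimal stand-in for that simulation (not the notebook's actual data-generating code; the coefficients and sample size here are made up) shows why declaring `TREATMENT_EFFECT` explicitly is useful: the naive group difference misses it under confounding, while inverse propensity weighting recovers it:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
TREATMENT_EFFECT = 2.0  # the effect we hope to recover

# A confounder that raises both treatment uptake and the outcome.
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-1.5 * x))   # true propensity score
t = rng.binomial(1, e)           # treatment assignment
y = TREATMENT_EFFECT * t + 3 * x + rng.normal(size=n)

# Naive difference in means is biased upward by the confounder.
naive = y[t == 1].mean() - y[t == 0].mean()

# Raw inverse propensity weights for the ATE: weighted group means.
w = t / e + (1 - t) / (1 - e)
ipw = np.average(y, weights=w * t) - np.average(y, weights=w * (1 - t))
```

Because the simulation knows the true propensity `e`, no propensity model is fit here; in the notebook the scores come from the fitted logit model.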
drbenvincent commented on 2024-04-11T10:18:39Z ----------------------------------------------------------------
Needs a brief explanation. What's the y-axis?
NathanielF commented on 2024-04-14T20:06:19Z ----------------------------------------------------------------
Added some explanation.
drbenvincent commented on 2024-04-11T10:18:40Z ----------------------------------------------------------------
This is great. But I think this can be expanded upon to give a slightly more didactic explanation/introduction into the logic of inverse propensity approach, maybe with some links.
NathanielF commented on 2024-04-14T20:06:42Z ----------------------------------------------------------------
Again, I've taken a generally more explanatory approach this time.
drbenvincent commented on 2024-04-11T10:18:40Z ----------------------------------------------------------------
Needs some explanation about what we are looking at here
NathanielF commented on 2024-04-14T20:07:03Z ----------------------------------------------------------------
Added a note that these are the propensities we will seek to use.
drbenvincent commented on 2024-04-11T10:18:41Z ----------------------------------------------------------------
Could be a bit more explicit about what the left and right panels are showing.
NathanielF commented on 2024-04-14T20:07:21Z ----------------------------------------------------------------
Added more explicit flagging.
drbenvincent commented on 2024-04-11T10:18:42Z ----------------------------------------------------------------
Final sentence needs a full stop.
NathanielF commented on 2024-04-14T20:08:02Z ----------------------------------------------------------------
Added full stop.
drbenvincent commented on 2024-04-11T10:18:43Z ----------------------------------------------------------------
In addition to Alex's comments, I might suggest labelling alphabetically, and including those (e.g. (a), (b), (c) in the subfigure titles. No, ignore this suggestion.
drbenvincent commented on 2024-04-11T10:18:44Z ----------------------------------------------------------------
Could potentially add a markdown horizontal line "---" to emphasise that we are changing gears here. Otherwise maybe just add something like "Having warmed up with simulated data, let's look at some real data..."
NathanielF commented on 2024-04-14T20:08:22Z ----------------------------------------------------------------
Added markdown separator.
drbenvincent commented on 2024-04-11T10:18:44Z ----------------------------------------------------------------
For the link to your pymc-example... you could grab the bibtex and make that into a proper reference. There are some examples of doing that in some of the other example notebooks.
NathanielF commented on 2024-04-14T20:08:33Z ----------------------------------------------------------------
Added bibtex
There are a number of paragraphs missing full stops at the very end
In relation to this issue: https://github.com/pymc-labs/CausalPy/issues/303
I'm opening the PR which includes functionality for fitting a propensity score model and analysing the experimental outcomes under different re-weighting schemes.
I've added the relevant classes to the experiments and models modules. I've also demonstrated their use in an example notebook with a parameter recovery exercise and an application to real data.
I've added two plotting functions to the experiment class to both analyse covariate balance and plot the overlap of the propensity scores and the uncertainty in the estimation of causal effects.