Closed NathanielF closed 1 month ago
Attention: Patch coverage is 95.65217%, with 11 lines in your changes missing coverage. Please review. Project coverage is 80.09%. Comparing base (3dc2ffe) to head (0a54532). Report is 14 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
causalpy/pymc_experiments.py | 95.31% | 9 Missing :warning: |
causalpy/data_validation.py | 81.81% | 2 Missing :warning: |
Sorry, I've not had time to look at this yet I'm afraid - a combination of work + illness. I might also not get time to look next week because of a deadline on a client project. Looking forward to when I can dive into this 👍
Because this will be a new feature addition, and probably trigger a minor version bump (semantic versioning), I'll ask for at least one other review.
No worries.
FYI, I'll review this next week @NathanielF -- made space for it
View / edit / reply to this conversation on ReviewNB
AlexAndorra commented on 2024-04-08T16:53:44Z ----------------------------------------------------------------
Link to Hernan's book
NathanielF commented on 2024-04-14T19:12:04Z ----------------------------------------------------------------
Done
AlexAndorra commented on 2024-04-08T16:53:45Z ----------------------------------------------------------------
Typo: "That is to say, the condition of strong ignorability holds if the treatment status T is independent of the propensity p(X), conditional on the X"
NathanielF commented on 2024-04-14T19:12:13Z ----------------------------------------------------------------
Fixed
AlexAndorra commented on 2024-04-08T16:53:45Z ----------------------------------------------------------------
You're using the raw weighting scheme in the function call, not the robust one, contrary to what you're saying in the text. Is that a typo?
NathanielF commented on 2024-04-14T19:27:48Z ----------------------------------------------------------------
That was a typo.
AlexAndorra commented on 2024-04-08T16:53:46Z ----------------------------------------------------------------
NathanielF commented on 2024-04-14T19:27:56Z ----------------------------------------------------------------
fixed
AlexAndorra commented on 2024-04-08T16:53:47Z ----------------------------------------------------------------
What's the y-axis on the first plot?
NathanielF commented on 2024-04-14T19:31:40Z ----------------------------------------------------------------
Count of observations. These are still histograms, just layered histograms for both the observed propensity scores and the reweighted propensity scores under different draws from the posterior of the propensity score distribution.
NathanielF commented on 2024-04-14T19:31:52Z ----------------------------------------------------------------
Added ylabel to the plot
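The layered-histogram idea being discussed can be sketched generically in matplotlib. This is not the notebook's plotting code; the beta-distributed scores here are made-up stand-ins for the observed propensity scores and a handful of posterior draws:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical stand-ins: observed propensity scores plus
# posterior draws of the propensity score distribution.
observed_ps = rng.beta(2, 5, size=500)
posterior_draws = rng.beta(2, 5, size=(20, 500))

fig, ax = plt.subplots()
bins = np.linspace(0, 1, 31)
for draw in posterior_draws:
    # Each posterior draw becomes its own translucent histogram layer.
    ax.hist(draw, bins=bins, histtype="step", alpha=0.2, color="C1")
ax.hist(observed_ps, bins=bins, alpha=0.6, color="C0", label="observed")
ax.set_xlabel("propensity score")
ax.set_ylabel("Count of observations")  # the y-axis asked about above
ax.legend()
```

The overlaid `histtype="step"` outlines keep the posterior layers readable without hiding the filled histogram of observed scores.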
AlexAndorra commented on 2024-04-08T16:53:47Z ----------------------------------------------------------------
NathanielF commented on 2024-04-14T19:32:01Z ----------------------------------------------------------------
Split this out.
AlexAndorra commented on 2024-04-08T16:53:48Z ----------------------------------------------------------------
NathanielF commented on 2024-04-14T19:32:23Z ----------------------------------------------------------------
Added some more explanation.
AlexAndorra commented on 2024-04-08T16:53:49Z ----------------------------------------------------------------
robust is what you used to fit the model, right? Then how reliable is the doubly-robust estimation? Do you have to re-fit the model?
NathanielF commented on 2024-04-14T19:33:29Z ----------------------------------------------------------------
You don't have to re-fit the model. The weighting is a post-processing step so you can apply different weighting schemes after the model is fit using a kwarg. I've added a note to clarify this.
AlexAndorra commented on 2024-04-08T16:53:50Z ----------------------------------------------------------------
NathanielF commented on 2024-04-14T20:02:07Z ----------------------------------------------------------------
I've clarified the differences and noted that the methods need not align; divergence between them would be indicative of a miscalibrated propensity model.
Thanks Alex. Will try and get to it this weekend
warning when locally building the docs...
Inverse Propensity Score Weighting
""""""""""""""""""""""""""""""""
/Users/benjamv/git/CausalPy/docs/source/index.rst:126: WARNING: Title underline too short.
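The warning points at a reStructuredText heading whose underline is shorter than the title text. Extending the underline to at least the title's length fixes it, e.g.:

```rst
Inverse Propensity Score Weighting
""""""""""""""""""""""""""""""""""
```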
Would be good to update from main
just to keep everything fresh
doctests pass locally ✅
Tests pass locally ✅
drbenvincent commented on 2024-04-11T10:18:38Z ----------------------------------------------------------------
Could (very briefly) describe what the NHEFS data set is about OR just be vague about it and talk about a 'real world dataset' which you go on to describe in the 'NHEFS Data' section.
Would be good to either have a short explanation of what propensity scores are, or a link to a glossary item. Also the relationship between the propensity and the weights, and why it's the "inverse" propensity. Just to help make this more accessible for readers unfamiliar with the topic.
Some explanation about what a weighting scheme is would be good - it's great to have a reference out, but I think it would be stronger if the post was a bit more self-encapsulated / didactic.
Would be good to have an itemised list with (at least) 1-sentence explainers about each of the reweighting schemes.
NathanielF commented on 2024-04-14T20:05:42Z ----------------------------------------------------------------
Updated with more clarity about the method and the intent.
In this notebook we will briefly demonstrate how to use propensity score weighting schemes to recover treatment effects in the analysis of observational data. We will first showcase the method with a simulated data example drawn from Lucy D’Agostino McGowan’s excellent blog on inverse propensity score weighting. Then we shall apply the same techniques to the NHEFS data set discussed in Miguel Hernan and Robins’ Causal Inference: What If book. This data set measures the effect of quitting smoking between 1971 and 1982. At each of these two points in time the participant’s weight was recorded, and we seek to estimate the effect of quitting in the intervening years on the weight recorded in 1982.
We will use inverse propensity score weighting techniques to estimate the average treatment effect. There are a range of weighting techniques available: we have implemented `raw`, `robust`, `doubly robust` and `overlap` weighting schemes, all of which aim to estimate the average treatment effect. The idea of a propensity score (very broadly) is to derive a one-number summary of an individual’s probability of adopting a particular treatment. This score is typically calculated by fitting a predictive logit model on all of an individual’s observed attributes, predicting whether or not those attributes drive the individual towards the treatment status. In the case of the NHEFS data we want a model to measure the propensity for each individual to quit smoking.
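As a rough sketch of what these schemes do with a fitted propensity score (the exact definitions used in CausalPy may differ; the function name here is illustrative, and the `doubly robust` scheme is omitted because it additionally requires an outcome model):

```python
import numpy as np

def ipw_weights(t, e, scheme="raw"):
    """Illustrative inverse-propensity weights for the ATE.

    t: 0/1 treatment indicator; e: estimated propensity scores.
    """
    if scheme == "raw":
        # Horvitz-Thompson style: 1/e for treated, 1/(1-e) for control.
        return t / e + (1 - t) / (1 - e)
    if scheme == "robust":
        # Hajek style: normalise the raw weights within each group so
        # a few extreme propensities cannot dominate the estimate.
        w = t / e + (1 - t) / (1 - e)
        w_treated = np.where(t == 1, w / w[t == 1].sum(), 0.0)
        w_control = np.where(t == 0, w / w[t == 0].sum(), 0.0)
        return w_treated + w_control
    if scheme == "overlap":
        # Emphasise the region where treated and control groups overlap.
        return t * (1 - e) + (1 - t) * e
    raise ValueError(f"unknown scheme: {scheme}")
```

The weights are a pure function of the treatment indicator and the propensity scores, which is why (as noted below) switching schemes is a post-processing step rather than a re-fit.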
The reason we want this propensity score is that with observed data we often have a kind of imbalance in our covariate profiles across treatment groups, meaning our data might be unrepresentative in some crucial aspect. This prevents us cleanly reading off treatment effects by looking at simple group differences. These “imbalances” can be driven by selection effects into the treatment status, so if we want to estimate the average treatment effect in the population as a whole we need to be wary that our sample might not give us generalisable insight into the treatment differences. Using propensity scores as a measure of the propensity to adopt the treatment status in the population, we can cleverly weight the observed data to privilege observations of “rare” occurrence in each group. For example, if smoking is the treatment status and regular running is generally not common among the group of smokers, then on the occasion we see a smoker marathon runner we should heavily weight their outcome measure to overcome their low prevalence in the treated group but real presence in the unmeasured population. Inverse propensity weighting tries to define weighting schemes that are inversely proportional to an individual’s propensity score, so as to better recover an estimate which mitigates (somewhat) the risk of selection effect bias. For more details and illustration of these themes see the PyMC examples write up on Non-Parametric Bayesian methods [Forde, 2024].

drbenvincent commented on 2024-05-02T09:14:12Z ----------------------------------------------------------------
nice!
drbenvincent commented on 2024-04-11T10:18:38Z ----------------------------------------------------------------
Could it be a good idea to add a data visualisation in here, possibly sns.pairplot with hue=trt ? Not essential, but just a thought
Maybe declare TREATMENT_EFFECT = 2 and use that in the data generation code. Just to make it really obvious
NathanielF commented on 2024-04-14T20:06:06Z ----------------------------------------------------------------
Added in TREATMENT_EFFECT var
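A minimal stand-in for that simulation (not the notebook's actual data-generating code; the coefficients and sample size here are made up) shows why declaring `TREATMENT_EFFECT` explicitly is useful: the naive group difference misses it under confounding, while inverse propensity weighting recovers it:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
TREATMENT_EFFECT = 2.0  # the effect we hope to recover

# A confounder that raises both treatment uptake and the outcome.
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-1.5 * x))   # true propensity score
t = rng.binomial(1, e)           # treatment assignment
y = TREATMENT_EFFECT * t + 3 * x + rng.normal(size=n)

# Naive difference in means is biased upward by the confounder.
naive = y[t == 1].mean() - y[t == 0].mean()

# Raw inverse propensity weights for the ATE: weighted group means.
w = t / e + (1 - t) / (1 - e)
ipw = np.average(y, weights=w * t) - np.average(y, weights=w * (1 - t))
```

Because the simulation knows the true propensity `e`, no propensity model is fit here; in the notebook the scores come from the fitted logit model.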
drbenvincent commented on 2024-04-11T10:18:39Z ----------------------------------------------------------------
Needs a brief explanation. What's the y-axis?
NathanielF commented on 2024-04-14T20:06:19Z ----------------------------------------------------------------
Added some explanation.
drbenvincent commented on 2024-04-11T10:18:40Z ----------------------------------------------------------------
This is great. But I think this can be expanded upon to give a slightly more didactic explanation/introduction into the logic of inverse propensity approach, maybe with some links.
NathanielF commented on 2024-04-14T20:06:42Z ----------------------------------------------------------------
Again, I've taken a generally more explanatory approach this time.
drbenvincent commented on 2024-04-11T10:18:40Z ----------------------------------------------------------------
Needs some explanation about what we are looking at here
NathanielF commented on 2024-04-14T20:07:03Z ----------------------------------------------------------------
Added a note that these are the propensities we will seek to use.
drbenvincent commented on 2024-04-11T10:18:41Z ----------------------------------------------------------------
Could be a bit more explicit about what the left and right panels are showing.
NathanielF commented on 2024-04-14T20:07:21Z ----------------------------------------------------------------
Added more explicit flagging.
drbenvincent commented on 2024-04-11T10:18:42Z ----------------------------------------------------------------
Final sentence needs a full stop.
NathanielF commented on 2024-04-14T20:08:02Z ----------------------------------------------------------------
Added full stop.
drbenvincent commented on 2024-04-11T10:18:43Z ----------------------------------------------------------------
In addition to Alex's comments, I might suggest labelling alphabetically, and including those (e.g. (a), (b), (c) in the subfigure titles. No, ignore this suggestion.
drbenvincent commented on 2024-04-11T10:18:44Z ----------------------------------------------------------------
Could potentially add a markdown horizontal line "---" to emphasise that we are changing gears here. Otherwise maybe just add something like "Having warmed up with simulated data, let's look at some real data..."
NathanielF commented on 2024-04-14T20:08:22Z ----------------------------------------------------------------
Added markdown separator.
drbenvincent commented on 2024-04-11T10:18:44Z ----------------------------------------------------------------
For the link to your pymc-example... you could grab the bibtex and make that into a proper reference. There are some examples of doing that in some of the other example notebooks.
NathanielF commented on 2024-04-14T20:08:33Z ----------------------------------------------------------------
Added bibtex
There are a number of paragraphs missing full stops at the very end
In relation to this issue: https://github.com/pymc-labs/CausalPy/issues/303
I'm opening the PR which includes functionality for fitting a propensity score model and analysing the experimental outcomes under different re-weighting schemes.
I've added the relevant classes to the experiments and models modules. I've also demonstrated their use in an example notebook with a parameter recovery exercise and an application to real data.
I've added two plotting functions to the experiment class to both analyse covariate balance and plot the overlap of the propensity scores and the uncertainty in the estimation of causal effects.