cthoyt opened 2 months ago
The problem we are trying to solve is to identify the causal effect of an intervention on an outcome when we have two datasets sampled from the same underlying population. The first is an observational dataset, in which people chose for themselves whether to receive the intervention. It contains measurements of the intervention, surrogate, and outcome variables. The second is an experimental dataset, in which people were randomized to receive (or not receive) the intervention. It contains measurements of only the intervention and surrogate variables, not the outcome variables. The goal is to estimate the causal effect of the intervention on the outcome by combining the information in the observational and experimental data.
Because the observational and experimental datasets were sampled from the same underlying population, the surrogate outcome problem is not technically a transportability problem. However, we can transform it into an equivalent transportability problem, which lets us determine when the causal effect in a surrogate outcome problem is identifiable.
A real-world example: we want to know the causal effect of a vaccine on protection from infection, but all we have are observational measurements of people who chose whether to get the vaccine: their vaccination status, their antibody levels, and whether they subsequently got infected. We also have a dataset from the same population in which we randomized who got the vaccine but measured only their antibody levels, not whether they got infected. The question is: can we estimate vaccine efficacy by combining these two datasets?
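To make the vaccine question concrete, here is a minimal Python sketch under assumptions that are not stated above: all variables are binary, and the causal graph is vaccine (X) → antibodies (Z) → infection (Y), where the vaccine affects infection only through antibodies and a latent confounder affects only the choice to vaccinate and antibody levels. Under that graph, do-calculus gives P(Y=1 | do(X=x)) = Σ_z P(Z=z | do(X=x)) P(Y=1 | Z=z). The first factor comes from the randomized data and the second from the observational data. Neither dataset suffices alone: the confounder makes P(Z | do(X)) unrecoverable from the observational data, and the experiment never measures Y. The function `surrogate_estimate` and the toy data are hypothetical illustrations, not the API of the "transport" module.

```python
from collections import Counter

# Hypothetical toy data (made-up numbers, for illustration only).
# Observational rows: (x, z, y) = (vaccinated, high antibodies, infected).
OBSERVATIONAL = [
    (1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1),
    (0, 0, 1), (0, 0, 1), (0, 0, 0), (0, 1, 0),
]
# Experimental rows: (x, z); X was randomized and Y was never measured.
EXPERIMENTAL = [
    (1, 1), (1, 1), (1, 1), (1, 0),
    (0, 0), (0, 0), (0, 0), (0, 1),
]


def surrogate_estimate(observational, experimental, x):
    """Estimate P(Y=1 | do(X=x)) = sum_z P_exp(Z=z | X=x) * P_obs(Y=1 | Z=z).

    Assumes the graph X -> Z -> Y with a latent confounder of X and Z only,
    binary Y, and both datasets drawn from the same population.
    """
    # P_exp(Z=z | X=x): randomization makes this equal to P(Z=z | do(X=x)).
    z_counts = Counter(z for xi, z in experimental if xi == x)
    n_arm = sum(z_counts.values())
    if n_arm == 0:
        raise ValueError(f"no experimental rows with X={x}")

    estimate = 0.0
    for z, count in z_counts.items():
        # P_obs(Y=1 | Z=z): valid under the assumed graph, which has no
        # back-door path from Z to Y.
        ys = [y for _, zi, y in observational if zi == z]
        if not ys:
            raise ValueError(f"no observational rows with Z={z}")
        estimate += (count / n_arm) * (sum(ys) / len(ys))
    return estimate


if __name__ == "__main__":
    treated = surrogate_estimate(OBSERVATIONAL, EXPERIMENTAL, x=1)
    control = surrogate_estimate(OBSERVATIONAL, EXPERIMENTAL, x=0)
    print(f"P(Y=1 | do(X=1)) = {treated:.3f}")  # 0.375
    print(f"P(Y=1 | do(X=0)) = {control:.3f}")  # 0.625
    print(f"risk difference  = {treated - control:.3f}")  # -0.250
```

This formula holds only for the assumed graph. For other graphs a valid combination of the two datasets may not exist, which is exactly what the transportability reduction described above is meant to decide.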
Follow-up to #149:
Add high-level documentation to the "transport" module