:package: :game_die: R/txshift: Efficient Estimation of the Causal Effects of Stochastic Interventions, with Corrections for Outcome-Dependent Sampling
Setting 1:
- $W_1 \sim \text{Binom}(1/2)$
- $\Delta \mid W_1 = w_1 \sim \text{Binom}(\text{plogis}(w_1))$
- $A \mid W_1 = w_1 \sim \text{Normal}(2 w_1, 1)$
- $\Delta A = \Delta \cdot A$, i.e., $A$ is observed only when $\Delta = 1$
- $O = (W_1, \Delta A)$

Simulate increasingly large data sets (with increasingly many bins) and fit `condensier` in two ways:
- No weights: given $\Delta = 1$, fit `condensier` with `A ~ W_1`.
  - Sane estimates: for $W_1 = 0$ the fit should look like $\text{Normal}(0, 1)$; for $W_1 = 1$ it should look like $\text{Normal}(2, 1)$.
- Weights: given $\Delta = 1$, fit `condensier` with `A ~ W_1` and `weights = 1/plogis(w_1)` (i.e., the true inverse sampling probabilities $1 / P(\Delta = 1 \mid W_1 = w_1)$).
  - Sane estimates: for $W_1 = 0$ the fit should look like $\text{Normal}(0, 1)$; for $W_1 = 1$ it should look like $\text{Normal}(2, 1)$.

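A minimal Python sketch of the Setting 1 check, assuming numpy. It does not call `condensier` (an R package); as a stand-in it estimates the conditional density of $A$ within each $W_1$ stratum of the $\Delta = 1$ subsample with a (optionally weighted) histogram and reports the largest absolute gap from the true Normal density. The names `simulate_setting1` and `density_errors`, the seed, and the bin ranges are hypothetical illustration choices.

```python
import numpy as np


def plogis(x):
    """Logistic CDF, mirroring R's plogis()."""
    return 1.0 / (1.0 + np.exp(-x))


def normal_pdf(x, mu, sd=1.0):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def simulate_setting1(n, rng):
    """Draw n observations from Setting 1; A is masked (NaN) when Delta == 0."""
    w1 = rng.binomial(1, 0.5, size=n)
    delta = rng.binomial(1, plogis(w1))
    a = rng.normal(2.0 * w1, 1.0)
    return w1, delta, np.where(delta == 1, a, np.nan)


def density_errors(n, n_bins, use_weights, seed=2024):
    """Max |estimated - true| density of A given W_1, for each W_1 stratum.

    A histogram fit within each stratum is a hypothetical stand-in for condensier.
    """
    rng = np.random.default_rng(seed)
    w1, delta, a_obs = simulate_setting1(n, rng)
    errors = {}
    for w in (0, 1):
        keep = (delta == 1) & (w1 == w)
        # true inverse sampling-probability weights 1 / P(Delta = 1 | W_1 = w)
        weights = 1.0 / plogis(w1[keep]) if use_weights else None
        edges = np.linspace(2.0 * w - 4.0, 2.0 * w + 4.0, n_bins + 1)
        dens, _ = np.histogram(a_obs[keep], bins=edges, weights=weights, density=True)
        mids = 0.5 * (edges[:-1] + edges[1:])
        errors[w] = float(np.max(np.abs(dens - normal_pdf(mids, 2.0 * w))))
    return errors


# larger and larger data sets, more and more bins
for n, n_bins in [(1_000, 10), (10_000, 20), (100_000, 40)]:
    print(n, n_bins, density_errors(n, n_bins, False), density_errors(n, n_bins, True))
```

Because this stand-in fits each $W_1$ stratum separately, the true weights are constant within a stratum and cancel when the histogram is normalized, so the weighted and unweighted errors should agree up to floating-point error and both shrink as $n$ grows. A fit that shares information across strata need not behave this way.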
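Setting 2:
- $W_1 \sim \text{Binom}(1/2)$
- $W_2 \sim \text{Binom}(1/2)$
- $\Delta \mid W_1 = w_1, W_2 = w_2 \sim \text{Binom}(\text{plogis}(w_1 + w_2))$
- $A \mid W_1 = w_1, W_2 = w_2 \sim \text{Normal}(2 w_1 w_2, 1)$
- $\Delta A = \Delta \cdot A$, i.e., $A$ is observed only when $\Delta = 1$
- $\Delta W_2 = \Delta \cdot W_2$, i.e., $W_2$ is also observed only when $\Delta = 1$
- $O = (W_1, \Delta W_2, \Delta A)$

Fit `condensier` in two ways:
- No weights: given $\Delta = 1$, fit `condensier` with `A ~ W_1 + W_2`.
  - Sane estimates: for $W_1 = 1, W_2 = 1$ the fit should look like $\text{Normal}(2, 1)$; otherwise $\text{Normal}(0, 1)$.
- Weights: given $\Delta = 1$, fit `condensier` with `A ~ W_1 + W_2` and `weights = 1/plogis(w_1 + w_2)`.
  - Sane estimates: for $W_1 = 1, W_2 = 1$ the fit should look like $\text{Normal}(2, 1)$; otherwise $\text{Normal}(0, 1)$.
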
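The same kind of sketch for Setting 2, again numpy-only, with hypothetical names (`simulate_setting2`, `cell_summaries`) and cell-wise weighted means and SDs of $A$ standing in for `condensier`. Besides the mean and SD in each $(W_1, W_2)$ cell of the $\Delta = 1$ subsample, it reports each cell's (weighted) share of that subsample.

```python
import numpy as np


def plogis(x):
    """Logistic CDF, mirroring R's plogis()."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate_setting2(n, rng):
    """Draw n observations from Setting 2; W_2 and A are NaN when Delta == 0."""
    w1 = rng.binomial(1, 0.5, size=n)
    w2 = rng.binomial(1, 0.5, size=n)
    delta = rng.binomial(1, plogis(w1 + w2))
    a = rng.normal(2.0 * w1 * w2, 1.0)
    observed = delta == 1
    return w1, delta, np.where(observed, w2, np.nan), np.where(observed, a, np.nan)


def cell_summaries(n, use_weights, seed=7):
    """(share, mean of A, SD of A) for each (W_1, W_2) cell among Delta == 1."""
    rng = np.random.default_rng(seed)
    w1, delta, w2_obs, a_obs = simulate_setting2(n, rng)
    sel = delta == 1
    w1_s, w2_s, a_s = w1[sel], w2_obs[sel], a_obs[sel]
    # true inverse sampling-probability weights 1 / P(Delta = 1 | W_1, W_2)
    wts = 1.0 / plogis(w1_s + w2_s) if use_weights else np.ones(a_s.size)
    out = {}
    for c1 in (0, 1):
        for c2 in (0, 1):
            cell = (w1_s == c1) & (w2_s == c2)
            wc = wts[cell]
            mean = np.average(a_s[cell], weights=wc)
            sd = np.sqrt(np.average((a_s[cell] - mean) ** 2, weights=wc))
            out[(c1, c2)] = (wc.sum() / wts.sum(), mean, sd)
    return out


for use_weights in (False, True):
    print("weights" if use_weights else "no weights")
    for cell, (share, mean, sd) in cell_summaries(200_000, use_weights).items():
        print(cell, round(share, 3), round(mean, 3), round(sd, 3))
```

The unweighted shares are tilted toward cells with a high sampling probability, while the weighted shares should sit near 1/4, so the weights correct the distribution of $W$ in the sample. Within each cell, the means and SDs should be close to the targets above and identical with or without weights, since the weights are constant within a cell.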
The reason for checking both ways is as follows. We initially thought we would need weights to obtain a valid density estimate for our problem, which, as above, involves some biased (e.g., case-control) sampling. However, it now seems to me that if $W$ contains all confounders of $A$ and $\Delta$, you could simply estimate the density among the $\Delta = 1$ subjects and still be fine. This is because the observed-data conditional density of $A$ given $\Delta = 1$ and $W$ equals the full-data conditional density of $A$ given $W$ (because $\Delta \perp A \mid W$). Does this seem right to you? Or am I crazy?
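The argument in the paragraph above can be written out with Bayes' rule, assuming positivity, $P(\Delta = 1 \mid W = w) > 0$, so that the division is defined:

```latex
\begin{aligned}
p(a \mid \Delta = 1, W = w)
  &= \frac{P(\Delta = 1 \mid A = a, W = w)\, p(a \mid W = w)}{P(\Delta = 1 \mid W = w)} \\
  &= \frac{P(\Delta = 1 \mid W = w)\, p(a \mid W = w)}{P(\Delta = 1 \mid W = w)}
     \qquad \text{(since } \Delta \perp A \mid W \text{)} \\
  &= p(a \mid W = w).
\end{aligned}
```

In both settings $A$ and $\Delta$ are drawn independently given $W$, so the conditional independence holds by construction; in Setting 2, $W = (W_1, W_2)$, and $W_2$ is itself observed only when $\Delta = 1$, which is why the conditioning is done within the $\Delta = 1$ subsample.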
In any case, Nima is going to check the fits in both cases, and we'll see. My guess right now is that there will be finite-sample differences, but asymptotically both approaches will end up estimating the same quantity.