Closed Dpananos closed 1 week ago
If I use `comparison = "lnratioavg"` and `transform = exp`, I get an answer closer to what I expect.
# However, now the lift is very far off from what it was
avg_comparisons(
fit,
variables = 'trt',
comparison = 'lnratioavg',
transform = exp,
newdata = datagrid(trt = c('treat', 'control'), grid_type = 'counterfactual')
)
Term Contrast Estimate Pr(>|z|) S 2.5 % 97.5 %
trt ln(mean(treat) / mean(control)) 1.41 <0.001 Inf 1.4 1.43
Columns: term, contrast, estimate, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted
Type: response
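For readers following along, a short note on how this estimate relates to the lift. The symbols $p_t$ and $p_c$ are introduced here (they are not in the original post) for the average predicted risk under treatment and under control. Exponentiating the log ratio gives the risk ratio, which differs from the lift by exactly one:

```latex
\text{lift}
  = \frac{p_t - p_c}{p_c}
  = \frac{p_t}{p_c} - 1
  = \exp\!\left(\ln\frac{p_t}{p_c}\right) - 1
```

If the 1.41 reported above is read as the risk ratio $p_t / p_c$, the implied lift is roughly $1.41 - 1 = 0.41$.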
These approaches also give the right answer
> datagrid(newdata=exp_d, trt = c('treat', 'control'), grid_type = 'counterfactual') %>%
+ modelr::add_predictions(fit, type='response') %>%
+ group_by(trt) %>%
+ summarise(mean(pred))
# A tibble: 2 × 2
trt `mean(pred)`
<chr> <dbl>
1 control 0.0886
2 treat 0.125
> datagrid(model=fit, trt = c('treat', 'control'), grid_type = 'counterfactual') %>%
+ modelr::add_predictions(fit, type='response') %>%
+ group_by(trt) %>%
+ summarise(mean(pred))
# A tibble: 2 × 2
trt `mean(pred)`
<chr> <dbl>
1 control 0.0886
2 treat 0.125
I've simulated some data for a talk I'm giving. The data is from an experiment with a binary exposure and a binary outcome. My goal is to estimate the "lift" (risk difference between treatment and control, divided by risk in control) using marginaleffects.
However, I'm not getting the answer I expect and was hoping to determine if it is a user error or not.
Below, I set up some data and create my sample. I provide the sampling weights `weights` in my dataframe `d`. Using these weights, I can construct the expected outcome under treatment or control using `weighted.avg` as follows.
Using `weighted_avg`, the "lift" should be around 40%.
Fitting a logistic regression with treatment as the only covariate, I can also recover the lift.
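The reprex code for these two steps did not survive in this copy of the thread, so here is a minimal sketch of what they could look like. Everything in it is an assumption: the toy counts are invented so that the lift comes out at 40%, the outcome column `y` is a hypothetical name, and base R's `weighted.mean` stands in for the `weighted.avg` / `weighted_avg` mentioned above, whose definition is not shown.

```r
# Toy aggregated data: one row per (arm, outcome) cell.
# Values are invented for illustration; `y` is a hypothetical column name.
d <- data.frame(
  trt     = c("control", "control", "treat", "treat"),
  y       = c(1, 0, 1, 0),
  weights = c(10, 90, 14, 86)
)

# Weighted risk in each arm (stand-in for the thread's weighted average).
risk_by_arm <- sapply(
  split(d, d$trt),
  function(g) weighted.mean(g$y, g$weights)
)

# Lift = (risk in treatment - risk in control) / risk in control.
lift_weighted <- (risk_by_arm[["treat"]] - risk_by_arm[["control"]]) /
  risk_by_arm[["control"]]

# Logistic regression with treatment as the only covariate, using the weights.
fit0 <- glm(y ~ trt, family = binomial, weights = weights, data = d)
p_hat <- predict(
  fit0,
  newdata = data.frame(trt = c("control", "treat")),
  type = "response"
)
lift_model <- (p_hat[[2]] - p_hat[[1]]) / p_hat[[1]]

print(c(weighted = lift_weighted, model = lift_model))
```

Because a treatment-only logistic regression is saturated in `trt`, its fitted probabilities equal the weighted arm-level risks, so both routes should agree on the lift.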
But as soon as I add interactions into my model, the lift is completely wrong.
Indeed, the `avg_predictions` don't match what the actual average predictions should be.
Created on 2024-06-29 with reprex v2.1.0
While the `avg_predictions` depend on the estimated probability in each stratum, it looks like `fit` accurately estimates the risk in the strata defined by `device` and `lang`, so I don't think that is the issue.
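One way to run both checks by hand, without marginaleffects, is sketched below in base R. It is only an illustration under stated assumptions: the data are simulated here with invented effect sizes, the outcome column `y` is a hypothetical name, and the model is assumed to be fully interacted (`trt * device * lang`), which may not match the model in the original reprex. It also ignores sampling weights.

```r
# Simulated data: values and effect sizes are invented for illustration.
set.seed(1)
n <- 5000
d <- data.frame(
  trt    = sample(c("control", "treat"), n, replace = TRUE),
  device = sample(c("desktop", "mobile"), n, replace = TRUE),
  lang   = sample(c("en", "fr"), n, replace = TRUE)
)
p <- plogis(
  -2.3 + 0.4 * (d$trt == "treat") +
    0.3 * (d$device == "mobile") - 0.2 * (d$lang == "fr")
)
d$y <- rbinom(n, 1, p)

# Fully interacted (saturated) logistic regression: an assumption of this sketch.
fit <- glm(y ~ trt * device * lang, family = binomial, data = d)

# Check 1: compare observed and fitted risk within each stratum.
d$pred <- predict(fit, type = "response")
cells <- aggregate(cbind(y, pred) ~ trt + device + lang, data = d, FUN = mean)

# Check 2: counterfactual average predictions, setting everyone to one arm.
risk_under <- function(level) {
  nd <- d
  nd$trt <- level
  mean(predict(fit, newdata = nd, type = "response"))
}
risk_control <- risk_under("control")
risk_treat   <- risk_under("treat")
lift <- (risk_treat - risk_control) / risk_control

print(cells)
print(c(control = risk_control, treat = risk_treat, lift = lift))
```

If the stratum-level risks in `cells` agree but the counterfactual averages differ from what a package reports, the discrepancy is more likely in how the predictions are averaged (for example, which rows or weights enter the mean) than in the model fit itself.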