Hello,
I have a data generating process where it seems like lmtp gives an incorrect effect estimate.
For $t = 1, \dots, \tau$, generate exposure variables as
$$A_t \sim \mathrm{Categorical}((0, 1, 2, 3), (0.25, 0.25, 0.25, 0.25)).$$
That is, at each timepoint $A_t$ takes one of the values $(0, 1, 2, 3)$ with equal probability.
The outcome is given by
$$Y \sim \mathrm{Normal}(A_1, 0.01^2).$$
The modified treatment policy is to shift the exposure variable up by 1, within the support of the data:
$$d(a) = I[a < 3] \cdot (a + 1) + 3 \cdot I[a = 3].$$
I would expect the true value of the parameter under this policy to be $0.25 \cdot 1 + 0.25 \cdot 2 + 0.5 \cdot 3 = 2.25$. However, lmtp gives an estimate of $3$ (reprex included below).
I suspect this is linked to how lmtp estimates the sequential regressions. The relevant code in the tmle.R file is:
Here, the shifted predictions are being generated using a dataset in which every exposure variable has had the modified treatment policy applied to it.
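A quick way to see where a value of exactly $3$ could come from, assuming (this is my reading, not something confirmed against lmtp's internals) that shifting every exposure at each of the $\tau = 3$ regression steps effectively applies $d$ to $A_1$ three times:

```r
# Policy from above: shift up by 1, capped at the top of the support.
d <- function(a) ifelse(a == 3, 3, a + 1)

support <- 0:3

# Intended target: d applied once to A_1, averaged over the uniform support.
mean(d(support))
#> [1] 2.25

# Hypothesized behavior: d compounded once per timepoint (tau = 3).
mean(d(d(d(support))))
#> [1] 3
```

After three applications every value in the support has been pushed to $3$, which matches the estimate reported above.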
In the LMTP paper, Equation (2) gives the following for the identification result:
$$m_t : (a_t, h_t) \mapsto E[m_{t+1}(A_{t+1}^d, H_{t+1}) \mid A_t = a_t, H_t = h_t]$$
and the final parameter value is then given by
$$\theta = E[m_1(A_1^d, L_1)].$$
Taken together, this suggests that the shifted predictions should be based on a dataset in which only one exposure variable is shifted at a time: the regression is evaluated at $A_t = a_t^d, H_t = h_t$, where $h_t$ includes all prior exposures without the modified policy applied. For the purposes of the reprex given below, this can be achieved in an ad-hoc way by replacing those lines in tmle.R with:
```r
# Create a new shifted dataset where only the exposure at time t is intervened on
new_shifted <- natural
new_shifted$train[[paste0("A_", t)]] <- shifted$train[[paste0("A_", t)]]
new_shifted$valid[[paste0("A_", t)]] <- shifted$valid[[paste0("A_", t)]]

m_shifted_train[jt & rt, t] <- bound(SL_predict(fit, new_shifted$train[jt & rt, vars]), 1e-05)
m_shifted_valid[jv & rv, t] <- bound(SL_predict(fit, new_shifted$valid[jv & rv, vars]), 1e-05)
```
With this modification, in the example below lmtp gives the expected effect estimate.
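The difference between the two prediction schemes can also be reproduced outside of lmtp with a hand-rolled sequential regression. This is only a sketch under stated assumptions: it uses a plain plug-in estimator with `lm` fits on categorical exposures instead of TMLE and SuperLearner, it omits the $L_t$ covariates, and `seq_reg`, `as_factors`, and `one_at_a_time` are names made up for the illustration, not lmtp functions.

```r
set.seed(1)
n <- 5000
tau <- 3
a_cols <- paste0("A_", 1:tau)

# Policy from the post: shift up by 1, capped at 3.
d <- function(a) ifelse(a == 3, 3, a + 1)

# Natural (observed) exposures and outcome; Y depends on A_1 only.
natural <- as.data.frame(
  setNames(replicate(tau, sample(0:3, n, replace = TRUE), simplify = FALSE), a_cols)
)
y <- rnorm(n, mean = natural$A_1, sd = 0.01)

# Every exposure column with the policy applied.
shifted <- as.data.frame(lapply(natural, d))

# Treat exposures as categorical so each fit can capture any function of a single A_t.
as_factors <- function(df) as.data.frame(lapply(df, factor, levels = 0:3))

# Hypothetical helper: backward sequential regression from t = tau down to t = 1.
seq_reg <- function(one_at_a_time) {
  pseudo <- y
  for (t in tau:1) {
    vars <- a_cols[1:t]
    train <- as_factors(natural[vars])
    train$pseudo <- pseudo
    fit <- lm(pseudo ~ ., data = train)

    if (one_at_a_time) {
      # Natural history, with only the exposure at time t shifted.
      newdata <- natural[vars]
      newdata[[a_cols[t]]] <- shifted[[a_cols[t]]]
    } else {
      # Every exposure up to time t shifted.
      newdata <- shifted[vars]
    }
    pseudo <- predict(fit, newdata = as_factors(newdata))
  }
  mean(pseudo)
}

est_all_shifted <- seq_reg(one_at_a_time = FALSE)
est_one_shifted <- seq_reg(one_at_a_time = TRUE)
c(all_shifted = est_all_shifted, one_at_a_time = est_one_shifted)
```

In this toy setting the first estimate lands at roughly $3$ and the second at roughly $2.25$, which mirrors the numbers in this post. It does not by itself establish what lmtp does internally.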
Thanks!
Herb
reprex
```r
library(lmtp)
#> Warning: package 'lmtp' was built under R version 4.2.3

simulate_data <- function(seed, N, tau, tstar = 1) {
  set.seed(seed)

  data <- data.frame(id = 1:N)

  for (t in 1:tau) {
    # Time-varying confounder
    Lt <- paste0("L_", t)
    data[[Lt]] <- rbinom(N, 1, 0.5)

    # Treatment
    At <- paste0("A_", t)
    data[[At]] <- sample(0:3, size = N, replace = TRUE)
  }

  # Outcome
  mu <- data[[paste0("A_", tstar)]]
  data$Y <- rnorm(N, mu, 0.01)

  data
}

# Expected true effect
mean(c(1, 2, 3, 3))
#> [1] 2.25

tau <- 3
N <- 1e3
tstar <- 1

dat <- simulate_data(153, N, tau, tstar)

a <- paste0("A_", 1:tau)
w <- Map(\(t) paste0("L_", t), 1:tau)
y <- "Y"

learners <- c("SL.mean", "SL.glm", "SL.ranger")

policy <- \(data, trt) {
  ifelse(data[[trt]] == 3, 3, data[[trt]] + 1)
}

fit <- lmtp_tmle(
  data = dat,
  trt = a,
  time_vary = w,
  outcome = y,
  shift = policy,
  mtp = FALSE,
  learners_outcome = learners,
  learners_trt = learners,
  outcome_type = "continuous",
  folds = 5
)
#> Loading required package: nnls
#> Warning: package 'nnls' was built under R version 4.2.3
#> Loading required namespace: ranger

fit
#> LMTP Estimator: TMLE
```
Created on 2024-02-28 with reprex v2.1.0
Please include your R session info: