Closed jburos closed 7 years ago
Looking through the code, my guess is that the above-mentioned issue would apply to any binomial biomarker whether or not it is context of multiple biomarkers. The error message would of course only apply in the first case.
After the above commit, the following works, albeit not without a warning:
mv4 <- stan_jm(
formulaLong = list(
logBili ~ year + (1 | id),
albumin_high ~ sex + year + (year | id)),
dataLong = pbcLong %>% dplyr::mutate(albumin_high = albumin > 3.5),
formulaEvent = Surv(futimeYears, death) ~ sex + trt,
dataEvent = pbcSurv,
assoc = list("etavalue", "shared_b(1)"),
family = list(gaussian(), binomial()),
time_var = "year",
chains = 1, cores = 1, seed = 12345, iter = 100,
init = 'random'
)
Warning messages:
Warning messages:
1: In mapply(check_if_rescaled, user_priorLong, y_has_predictors, adjusted_priorLong_scale) :
longer argument not a multiple of length of shorter
2: In mapply(check_if_rescaled, user_priorLong, y_has_predictors, adjusted_priorLong_scale) :
longer argument not a multiple of length of shorter
However, using model-based inits yields a new error message:
mv4 <- stan_jm(
formulaLong = list(
logBili ~ year + (1 | id),
albumin_high ~ sex + year + (year | id)),
dataLong = pbcLong %>% dplyr::mutate(albumin_high = albumin > 3.5),
formulaEvent = Surv(futimeYears, death) ~ sex + trt,
dataEvent = pbcSurv,
assoc = list("etavalue", "shared_b(1)"),
family = list(gaussian(), binomial()),
time_var = "year",
chains = 1, cores = 1, seed = 12345, iter = 100
)
Yields:
Multivariate joint model specified
Please note the warmup phase may be much slower than later iterations!
SAMPLING FOR MODEL 'jm' NOW (CHAIN 1).
[1] "Initialization from source failed."
[2] "mismatch in dimension declared and found in context; processing stage=initialization; variable name=aux_unscaled; position=0; dims declared=(1); dims found=(2)"
[3] "Error in sampler$call_sampler(args_list[[i]]) : "
[4] "In addition: Warning messages:"
[5] "1: In mapply(check_if_rescaled, user_priorLong, y_has_predictors, adjusted_priorLong_scale) :"
[6] " longer argument not a multiple of length of shorter"
[7] "2: In mapply(check_if_rescaled, user_priorLong, y_has_predictors, adjusted_priorLong_scale) :"
[8] " longer argument not a multiple of length of shorter"
error occurred during calling the sampler; sampling not done
Error in check_stanfit(stanfit) :
Invalid stanfit object produced please report bug
As an aside, I'm curious to know why you re-order the data for binomial observations within the handle_glmod
function, here:
# Reorder y, X, Z if bernoulli (zeros first)
if (is.binomial(family$family) && all(y %in% 0:1)) {
ord <- order(y)
y <- y [ord]
trials <- trials[ord]
xtemp <- xtemp [ord, , drop = FALSE]
Ztlist <- lapply(Ztlist, function(x) x[, ord, drop = FALSE])
flist <- lapply(flist, function(x) x[ord])
N01 <- sapply(0:1, function(x) sum(y == x))
} else {
ord <- NULL
N01 <- rep(0L, 2) # dud entry if not bernoulli
}
I think the commit referenced above should resolve most of these issues.
As well as the change made here.
The reason for reordering the bernoulli observations is so that I can follow Ben and Jonah's approach for incrementing the target in bernoulli models -- see their approach here. From memory, they create completely separate data blocks (i.e. design matrices, etc) for observations corresponding to 0's and 1's. I basically tried to copy that idea, but it's a little messier in the stan_jm
case though because bernoulli outcomes might be combined with gaussian or other outcomes, so I can't just pass two separate data blocks (one for 0's and one for 1's) -- since jm.stan
relies on one big block diagonal design matrix for all longitudinal submodels. So, instead of providing separate data blocks for the 0's and 1's I just sort the observations based on 0's and 1's and then use some indexing to identify which rows of eta
to use when incrementing the target later on in Stan, as seen here and here. This may be kinda clunky? I dunno.
Summary:
Seeing the following error when I fit a model with more than one biomarker, where at least one of the biomarkers is
binomial()
.However, the number of grouping levels in the input data is the same for each biomarker.
Description:
In this case, I see the following when running
stan_jm
in debug mode:Here, the first biomarker is
gaussian()
where the second isbinomial()
.It appears that the
_NEW_patient_id
record is only being added for the first biomarker, and not the second. This may apply to binomial-only biomarker case, without yielding the same error message. Not yet confirmed.Reproducible Steps:
RStanARM Version:
First identified using version installed from jburos/rstanarm.
Confirmed using latest
master
branch fromsambrilleman/rstanarm
:R Version:
Operating System:
Linux / Ubuntu