[BUG] Indexing error when infusion events end close to start of another event

aryaamgen commented 2 years ago

Description

This bug tends to happen (but not always) when an infusion event (evid=1) has a rate that make it such that the infusion ends very close to the time of the next event. For example if amt=0.5 and rate=0.5 and the infusion starts at time=6 then it will end at time=7 and the bug will occur when you have another event that occurs close to time=7.

This only happens when I compile with MPI.

Specifically, the error I get is about an index being out of range somewhere:

Chain 1 test: stan/lib/stan_math/lib/eigen_3.3.9/Eigen/src/Core/DenseCoeffsBase.h:425: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator()(Eigen::Index) [with Derived = Eigen::Matrix<stan::math::var_value<double>, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = stan::math::var_value<double>; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.
Warning: Chain 1 finished unexpectedly!

Example

I tried to simplify the model and provide a few cases to best illustrate the bug. For some of these I don't get the error and for some I do. Below is the Stan model and the R code to run the different cases.

Simple Example Stan Model

functions{
  vector pk(real t, vector x, array[] real parms, array[] real rate, array[] int dummy){

    // Parameters
    real CL = parms[1];
    real Q = parms[2];
    real V1 = parms[3];
    real V2 = parms[4];
    real ka = parms[5];

    real lambda1 = parms[6];
    real lambda2 = parms[7];

    // Re-parametrization
    real k10 = CL / V1;
    real k12 = Q / V1;
    real k21 = Q / V2;

    // Return object (derivative)
    vector[3] y;

    // PK component of the ODE system
    y[1] = -ka*x[1];
    y[2] = ka*x[1] - (k10 + k12)*x[2] + k21*x[3];
    y[3] = k12*x[2] - k21*x[3];

    return y;
  }
}

data {
  // Count data
  int<lower = 1> nt;  // number of events
  int<lower = 1> nSbj; // number of subjects
  array[nSbj] int<lower = 1> len; // Number of events per subject

  // NONMEM data
  array[nt] real<lower = 0> amt;
  array[nt] int<lower = 1> cmt;
  array[nt] int<lower = 0> evid;
  array[nt] real<lower = 0> time;
  array[nt] real<lower = 0> ii;
  array[nt] int<lower = 0> addl;
  array[nt] int<lower = 0> ss;
  array[nt] real<lower = 0> rate;
}

transformed data{
  int nCmt = 3;
  int nTheta = 7;
  real rtol = 1e-3;
  real atol = 1e-3;
  int max_step = 10000;
}

parameters {
  real<lower = 0> sigma;
}

transformed parameters{
  array[nSbj, nTheta] real theta;
  matrix[nCmt, nt] x;

  // Set individual parameters
  for(j in 1:nSbj) {
    theta[j] = {1.03, 1.27, 3.57, 1.61, 0.0, 0.0, 0.0};
  }

  x = pmx_solve_group_bdf(pk, nCmt,
                           len,
                           time, amt, rate, ii, evid, cmt, addl, ss,
                           theta,
                           rtol, atol, max_step);
}

model {

}

Simple R Script illustrating

library(tidyverse)
cmdstanr::set_cmdstan_path("/data/home/apourzan/Torsten/cmdstan")
library(cmdstanr)

# Compile
model <- cmdstan_model("Stan/test.stan")

# Case 1
# End of first infusion gets close to start of second
dat_stan <-
  list(nt = 3,
       nSbj = 1,
       len = 3,
       amt = c(0.4, 0.4, 0.0),
       cmt = c(2, 2, 2),
       evid = c(1, 1, 0),
       time = c(18.55555555555555358183,
                19.53125000000000000000,
                33.52430555555555713454),
       ii = c(0, 0, 0),
       addl = c(0, 0, 0),
       ss = c(0, 0, 0),
       rate = c(0.4099644128113879015807,
                0.4000000000000000222045,
                0.0000000000000000000000))

dat_stan$time[2] - (dat_stan$time[1] + dat_stan$amt[1]/dat_stan$rate[1])

fit <-
  model$sample(
    data = dat_stan,
    seed = 2022,
    chains = 1,
    iter_warmup = 1e0,
    iter_sampling = 1e0,
    parallel_chains = 1
  )

# Case 2
# If we round the time, then end of first doesn't get AS close to start of second and it works
dat_stan$time[1] <- 18.55556

dat_stan$time[2] - (dat_stan$time[1] + dat_stan$amt[1]/dat_stan$rate[1])

fit <-
  model$sample(
    data = dat_stan,
    seed = 2022,
    chains = 1,
    iter_warmup = 1e0,
    iter_sampling = 1e0,
    parallel_chains = 1
  )

# Case 3
# If we take out the third event it also works despite the same problem of end of infusion being close to start of next. Maybe only breaks if there's an observation event?
dat_stan <-
  list(nt = 2,
       nSbj = 1,
       len = 2,
       amt = c(0.4, 0.4),
       cmt = c(2, 2),
       evid = c(1, 1),
       time = c(18.55555555555555358183,
                19.53125000000000000000),
       ii = c(0, 0),
       addl = c(0, 0),
       ss = c(0, 0),
       rate = c(0.4099644128113879015807,
                0.4000000000000000222045))

dat_stan$time[2] - (dat_stan$time[1] + dat_stan$amt[1]/dat_stan$rate[1])

fit <-
  model$sample(
    data = dat_stan,
    seed = 2022,
    chains = 1,
    iter_warmup = 1e0,
    iter_sampling = 1e0,
    parallel_chains = 1
  )

# Case 4
# Different times, still 3 events, but end of infusion still gets close to start of next event, yet it doesn't break
dat_stan <-
  list(nt = 3,
       nSbj = 1,
       len = 3,
       amt = c(0.4, 0.4, 0.0),
       cmt = c(2, 2, 2),
       evid = c(1, 1, 0),
       time = c(18.0,
                19.0,
                33.52430555555555713454),
       ii = c(0, 0, 0),
       addl = c(0, 0, 0),
       ss = c(0, 0, 0),
       rate = c(0.4,
                0.4,
                0.0000000000000000000000))

dat_stan$time[2] - (dat_stan$time[1] + dat_stan$amt[1]/dat_stan$rate[1])

fit <-
  model$sample(
    data = dat_stan,
    seed = 2022,
    chains = 1,
    iter_warmup = 1e0,
    iter_sampling = 1e0,
    parallel_chains = 1
  )
# Case 5
# Another case where it breaks and the end time of infusion is exactly at the start time of next event
dat_stan <-
  list(nt = 2,
       nSbj = 1,
       len = 2,
       amt = c(0.8, 0.0),
       cmt = c(2, 2),
       evid = c(1, 0),
       time = c(32.45833333333333570181,
                32.52430555555555713454),
       ii = c(0, 0),
       addl = c(0, 0),
       ss = c(0, 0),
       rate = c(12.12631578947368460319,
                0.00000000000000000000))

dat_stan$time[2] - (dat_stan$time[1] + dat_stan$amt[1]/dat_stan$rate[1])

fit <-
  model$sample(
    data = dat_stan,
    seed = 2022,
    chains = 1,
    iter_warmup = 1e0,
    iter_sampling = 1e0,
    parallel_chains = 1
  )

Expected Output

When it works the chains finish. Otherwise, I get the indexing error above.

Current Version:

Using the latest version of Torsten, v0.90.

yizhang-yiz commented 1 year ago

Thanks for the nice report. I’m able to reproduce the issue. Fix on the way.

aryaamgen commented 1 year ago

Thanks for the nice report. I’m able to reproduce the issue. Fix on the way.

Wow that was quick. Thanks @yizhang-yiz What was going on under the hood?

yizhang-yiz commented 1 year ago

My guess is it has something to do with mpi-dedicated solver but I haven't been able to look deeper. I was planning to fix it last week but obviously need more time.

metrumresearchgroup / Torsten