metrumresearchgroup / mrgsolve

Simulate from ODE-based population PK/PD and QSP models in R
https://mrgsolve.org
GNU General Public License v2.0
131 stars 36 forks source link

mrgsim_q() always include observation record at time 0 #1205

Closed stevechoy closed 4 months ago

stevechoy commented 5 months ago

Hi Kyle,

I'm not sure if this is intended or not, but a record at time 0 is always produced by mrgsim_q() regardless of what stime is supplied. Please see below for a reprex:

input_model_object <- mread("pk1cmt", modlib()) dosing_scheme <- ev(amt = 50, time = 0, cmt = 1, tinf = 1,
total = 28, ii = 12 ) sampling_times <- c(1,3,6,12,24)

sim_output <- mrgsim_q(input_model_object, data = dosing_scheme, stime = sampling_times, output = "df")

head(sim_output) ID time EV1 CENT EV2 CP 1 1 0 0.0000000000 0.00000 0 0.0000000 2 1 1 31.6060279413 18.06794 0 0.9033972 3 1 3 4.2774107032 41.94951 0 2.0974753 4 1 6 0.2129597505 39.75748 0 1.9878738 5 1 12 0.0005278739 29.61858 0 1.4809288 6 1 24 0.0005278773 45.87390 0 2.2936949

I would have expected that the output only contains times at c(1,3,6,12,24) ? The documentation I find on mrsim_q() didn't indicate there are any options related to this.

Regards Steve

kylebaron commented 5 months ago

Hi @stevechoy -

That record at time==0 is the dose; I think you'll always get that with mrgsim_q(). There's also qsim(), which let's you pass obsonly. Anyway, you get the same answer if you use mrgsim(). I put some timings in my reprex so you can see what the differential is. If you're simulating alot you might feel some difference; if it's just once or a handful of times, you probably can't tell and might as well use mrgsim() for the extra flexibility.

Please let me know if this answers the question.

Kyle

library(mrgsolve)
#> 
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(microbenchmark)

input_model_object <- mread("pk1cmt", modlib())
#> Building pk1cmt ...
#> done.
dosing_scheme <- ev(amt = 50,
                    time = 0,
                    cmt = 1,
                    tinf = 1,
                    total = 28,
                    ii = 12
)
sampling_times <- c(1,3,6,12,24)

sim_output <- mrgsim_q(input_model_object,
                       data = dosing_scheme,
                       stime = sampling_times,
                       output = "df")

sim_output
#>   ID time          EV1     CENT EV2        CP
#> 1  1    0 0.000000e+00  0.00000   0 0.0000000
#> 2  1    1 3.160603e+01 18.06794   0 0.9033972
#> 3  1    3 4.277411e+00 41.94951   0 2.0974753
#> 4  1    6 2.129598e-01 39.75748   0 1.9878738
#> 5  1   12 5.278739e-04 29.61858   0 1.4809288
#> 6  1   24 5.278773e-04 45.87390   0 2.2936949

out2 <- mrgsim(input_model_object, dosing_scheme, 
               tgrid = sampling_times, output = "df")

identical(sim_output, out2)
#> [1] TRUE

microbenchmark(
 mrgsim_q =  mrgsim_q(input_model_object,
           data = dosing_scheme,
           stime = sampling_times,
           output = "df"), 
 mrgsim = mrgsim(input_model_object, dosing_scheme, 
                 tgrid = sampling_times, output = "df"),
 qsim = qsim(input_model_object, dosing_scheme, tgrid = sampling_times, 
             output = "df"),
times = 3000  
)
#> Warning in microbenchmark(mrgsim_q = mrgsim_q(input_model_object, data =
#> dosing_scheme, : less accurate nanosecond times to avoid potential integer
#> overflows
#> Unit: microseconds
#>      expr     min       lq     mean   median      uq      max neval
#>  mrgsim_q 130.216 155.6770 171.3567 161.1300 167.567 1962.055  3000
#>    mrgsim 229.764 258.9765 293.4557 267.5455 280.194 6852.740  3000
#>      qsim 149.322 175.3160 195.7433 181.2200 189.297 5656.196  3000

Created on 2024-06-27 with reprex v2.0.2

stevechoy commented 5 months ago

Hi Kyle,

Thanks for the quick reply. What I'm confused about is why only the dose at time 0 gets slotted in, when according to the ii setting, there should also be a dose at time = 12 and time = 24, but they do not appear in the output. So the behavior seems to be inconsistent to me (or maybe I'm not understanding how it works).

For context, I'm trying to squeeze in some performance gains for multiple subject simulations (e.g. n=1000). So a single simulation is run with a dataset containing covariates etc. The resulting output dataframe(s) takes up quite a lot of memory, which I suspect can contribute to the slowness also (if there's a built-in argument to round the decimal places it would also be welcome).

The differences between the various functions (mrgsim_q, mrgsim, qsim) seem to largely disappear when simulating multiple subjects, but since I can't get parallelization to work reliably on Shiny (any tips on this area will be greatly appreciated btw), I'll take what I can get :)

Please see example below, modifying your reprex to include multiple subjects:

input_model_object <- mread("pk1cmt", modlib())

dosing_scheme <- ev(amt = 50, time = 0, cmt = 1, tinf = 1, total = 28, ii = 12 )

ext_db <- tibble(ID = 1:100, WT = rnorm(100, mean = 70, sd = 5) )

Joining ext_db with ev_df

ev_df <- dosing_scheme %>% mrgsolve::ev_rep((1:nrow(ext_db)))

sampling_times <- seq(1, 336, by = 1)

sim_output <- mrgsim_q(input_model_object, data = ev_df, stime = sampling_times, output = "df")

out2 <- mrgsim(input_model_object, ev_df, tgrid = sampling_times, output = "df")

qsim2 = qsim(input_model_object, ev_df, tgrid = sampling_times, output = "df")

identical(sim_output, out2) identical(out2, qsim2)

microbenchmark( mrgsim_q = mrgsim_q(input_model_object, data = ev_df, stime = sampling_times, output = "df"), mrgsim = mrgsim(input_model_object, ev_df, tgrid = sampling_times, output = "df"), qsim2 = qsim(input_model_object, ev_df, tgrid = sampling_times, output = "df"), times = 100
)

Unit: milliseconds

expr min lq mean median uq max neval cld

mrgsim_q 155.8756 159.1534 166.8108 160.9114 165.7185 235.0647 100 a

mrgsim 157.6986 160.1458 167.7716 162.1961 166.1342 284.0970 100 a

qsim2 156.0837 158.5120 164.5692 160.4949 162.9437 337.5806 100 a

Steve

kylebaron commented 5 months ago

@stevechoy - right; the performance gain will definitely start going away when you have lots of subjects; also if you have difficult ODEs; mrgsim_q() is mainly for very fast turnaround problem ... it cuts the overhead of the run which seems big when the problem is very small and fast; but the overhead is almost nothing when the problem is larger.

You won't see the addl doses in the output records b/c they aren't in the input data. You do see the dose at time=0 because that is in the input data. The behavior is similar to NONMEM in that respect. If you do want to see the additional doses, use realize_addl() to put those doses explicitly in the data set.

I can look at your code and see if anything we can do to improve.

library(mrgsolve)
#> 
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(microbenchmark)

input_model_object <- mread("pk1cmt", modlib())
#> Building pk1cmt ...
#> done.
dosing_scheme <- ev(amt = 50,
                    time = 0,
                    cmt = 1,
                    tinf = 1,
                    total = 28,
                    ii = 12
) %>% realize_addl()
sampling_times <- c(1,3,6,12,24)

sim_output <- mrgsim_q(input_model_object,
                       data = dosing_scheme,
                       stime = sampling_times,
                       output = "df")

sim_output
#>    ID time          EV1     CENT EV2        CP
#> 1   1    0 0.000000e+00  0.00000   0 0.0000000
#> 2   1    1 3.160603e+01 18.06794   0 0.9033972
#> 3   1    3 4.277411e+00 41.94951   0 2.0974753
#> 4   1    6 2.129598e-01 39.75748   0 1.9878738
#> 5   1   12 5.278739e-04 29.61858   0 1.4809288
#> 6   1   12 5.278739e-04 29.61858   0 1.4809288
#> 7   1   24 5.278773e-04 45.87390   0 2.2936949
#> 8   1   24 5.278773e-04 45.87390   0 2.2936949
#> 9   1   36 5.278773e-04 54.79501   0 2.7397505
#> 10  1   48 5.278773e-04 59.69102   0 2.9845509
#> 11  1   60 5.278773e-04 62.37801   0 3.1189003
#> 12  1   72 5.278773e-04 63.85266   0 3.1926328
#> 13  1   84 5.278773e-04 64.66196   0 3.2330980
#> 14  1   96 5.278773e-04 65.10612   0 3.2553058
#> 15  1  108 5.278773e-04 65.34987   0 3.2674937
#> 16  1  120 5.278773e-04 65.48365   0 3.2741826
#> 17  1  132 5.278773e-04 65.55707   0 3.2778535
#> 18  1  144 5.278773e-04 65.59736   0 3.2798682
#> 19  1  156 5.278773e-04 65.61948   0 3.2809738
#> 20  1  168 5.278773e-04 65.63161   0 3.2815806
#> 21  1  180 5.278773e-04 65.63827   0 3.2819136
#> 22  1  192 5.278773e-04 65.64193   0 3.2820964
#> 23  1  204 5.278773e-04 65.64393   0 3.2821967
#> 24  1  216 5.278773e-04 65.64503   0 3.2822517
#> 25  1  228 5.278773e-04 65.64564   0 3.2822820
#> 26  1  240 5.278773e-04 65.64597   0 3.2822985
#> 27  1  252 5.278773e-04 65.64615   0 3.2823076
#> 28  1  264 5.278773e-04 65.64625   0 3.2823126
#> 29  1  276 5.278773e-04 65.64631   0 3.2823154
#> 30  1  288 5.278773e-04 65.64634   0 3.2823169
#> 31  1  300 5.278773e-04 65.64635   0 3.2823177
#> 32  1  312 5.278773e-04 65.64636   0 3.2823182
#> 33  1  324 5.278773e-04 65.64637   0 3.2823184

Created on 2024-06-27 with reprex v2.0.2

kylebaron commented 5 months ago

(if there's a built-in argument to round the decimal places it would also be welcome

There's a digits attribute to the model object; if it is positive, it will limit the number of significant digits in the simulated output

kylebaron commented 5 months ago

If you're running with ODE, you can try to increase rtol. Or, switch to closed form solution (I know you can't do this with most models).

What't the issue with running in parallel in Shiny? You should be able to do it. Are you building the model locally?

mod <- mread("mymodel.mod", soloc = "build")
library(mrgsolve)
#> 
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(microbenchmark)

input_model_object <- mread("pk1cmt", modlib())
#> Building pk1cmt ...
#> done.

input_model_object_closed <- modlib("pk1")
#> Building pk1 ... done.

dosing_scheme <- ev(amt = 50,
                    time = 0,
                    cmt = 1,
                    tinf = 1,
                    total = 28,
                    ii = 12
)

ext_db <- tibble(ID = 1:100,
                 WT = rnorm(100, mean = 70, sd = 5)
)

ev_df <- dosing_scheme %>%
  mrgsolve::ev_rep((1:nrow(ext_db)))

sampling_times <- seq(1, 336, by = 1)

sim_output <- mrgsim_q(input_model_object,
                       data = ev_df,
                       stime = sampling_times,
                       output = "df")

input_model_object2 <- update(
  input_model_object, 
  rtol = 1e-5
)

microbenchmark(
  mrgsim_q = mrgsim_q(input_model_object,
                      data = ev_df,
                      stime = sampling_times,
                      output = "df"),
  rtol = mrgsim_q(input_model_object2,
                      data = ev_df,
                      stime = sampling_times,
                      output = "df"),
  closed = mrgsim_q(input_model_object_closed,
                      data = ev_df,
                      stime = sampling_times,
                      output = "df"),
  times = 100
)
#> Warning in microbenchmark(mrgsim_q = mrgsim_q(input_model_object, data = ev_df,
#> : less accurate nanosecond times to avoid potential integer overflows
#> Unit: milliseconds
#>      expr       min        lq      mean    median        uq       max neval
#>  mrgsim_q 62.455013 63.165256 64.179098 63.848275 65.220648 66.890557   100
#>      rtol 43.700219 44.060671 44.894094 44.629812 45.697390 48.900167   100
#>    closed  4.291716  4.497393  4.881124  4.604033  4.797246  9.054235   100

Created on 2024-06-27 with reprex v2.0.2

kylebaron commented 4 months ago

@stevechoy - just following up on this issue. Anything else to do here? or can we close?

stevechoy commented 4 months ago

@stevechoy - just following up on this issue. Anything else to do here? or can we close?

Sounds good, we can close. Thanks for following up Kyle.