metrumresearchgroup / mrgsolve

Simulate from ODE-based population PK/PD and QSP models in R
https://mrgsolve.org
GNU General Public License v2.0
131 stars 36 forks source link

Different additive errors after consecutive simulations despite setting seed #1017

Closed wonmy076 closed 1 year ago

wonmy076 commented 2 years ago

Hello, I am running consecutive Monte Carlo simulations for the same set of individuals. After each simulation, the ETAs (PPVs) stay the same for each individual, but the additive error changes. As a result, the DVs are different and the IPREDs are the same. This is despite me setting the same seed before each time I run the simulation and in my population code.

This is an example of the outputs I get after simulation 1 and 2.

Simulation 1: image

Simulation 2: image

Am a bit stumped as to why everything else stays the same but the additive error changes... and despite setting the seed. Any help would be greatly appreciated! Thank you.

FelicienLL commented 2 years ago

Hello @wonmy076, Yes it is weird. I also see that ERRPROP does not change either, and IPRE is null. Can I ask you to post a bit of code so that I can reproduce this please, and see what is wrong? Ideally in a form of a reprex like this:

library(mrgsolve)
#> 
#> Attachement du package : 'mrgsolve'
#> L'objet suivant est masqué depuis 'package:stats':
#> 
#>     filter
cod <- "
$OMEGA 0.1 0.1
$SIGMA 0.02 0.1
$CMT CENT 
$MAIN
double ETACL = ETA(1) ;
double ETAV = ETA(2) ;
double CL = 1 * exp(ETACL) ; 
double V = 30 * exp(ETAV) ;
$PKMODEL ncmt = 1, depot = FALSE
$TABLE
double ERRPROP = EPS(1) ;
double ERRADD = EPS(2) ;
double IPRED = CENT / V ;
double DV = IPRED * (1 + ERRPROP) + ERRADD ;
$CAPTURE
ETACL ETAV ERRPROP ERRADD IPRED DV
"
mod <- mcode("mod", cod)
#> Building mod ...
#> done.

set.seed(1234)
mrgsim_df(mod, tgrid = 0)
#>   ID time CENT      ETACL       ETAV   ERRPROP    ERRADD IPRED        DV
#> 1  1    0    0 -0.2764698 0.08752883 0.1975575 0.4987714     0 0.4987714
set.seed(1234)
mrgsim_df(mod, tgrid = 0)
#>   ID time CENT      ETACL       ETAV   ERRPROP    ERRADD IPRED        DV
#> 1  1    0    0 -0.2764698 0.08752883 0.1975575 0.4987714     0 0.4987714

Created on 2022-08-31 by the reprex package (v2.0.1)

Félicien

wonmy076 commented 2 years ago
library(mrgsolve)
#> 
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#> 
#>     filter
code <- '

$INIT    
CMT1       = 0,       
CMT2       = 0,        
AUC        = 0,        

$SET      
atol       = 1e-8, rtol = 1e-8
maxsteps   = 100000

$PARAM   
POPCL      = exp(1.67),       
POPV1      = exp(3.75),        
POPV2      = exp(3.73),        
POPQ       = exp(1.17),       
MAT50_CL   = exp(-0.113),    
GAMMA1     = exp(1.06),    
DEC50_CL   = exp(4.12),       
GAMMA2     = -exp(0.806),    
FSCR       = exp(-0.436),    
CANCER_CL  = 0.293,          
HEEL_V1    = 0.313,          
HEEL_Q     = 0.595,          

// Default Covariate values for Simulation
AGE           = 35,          
GA            = 40,           
WT            = 70,          
SCR           = 0.84,         
HEEL          = 0,            
CANCER        = 0,            
INFDUR        = 1,            

$OMEGA    
@labels PPVCL, PPVV1, PPVV2

0.0749                   // PPV on CL
0.0721                   // PPV on V1
0.672                    // PPV on V2

$SIGMA   
@labels ERRPROP, ERRADD
0.046                  
1.52                   

$MAIN     

double PMA        =  AGE + GA/52;                        
double PMA_REF    =  35 + (40.0/52.0);                   
double WT_NORM    =  WT/70;                             

double FMAT       =  pow(PMA,GAMMA1)/(pow(PMA,GAMMA1)+pow(MAT50_CL,GAMMA1));         
double FMAT_REF   =  pow(PMA_REF,GAMMA1)/(pow(PMA_REF,GAMMA1)+pow(MAT50_CL,GAMMA1)); 
double MAT_CL     = FMAT/FMAT_REF;                                      

double FDEC       =  pow(PMA,GAMMA2)/(pow(PMA,GAMMA2)+pow(DEC50_CL,GAMMA2));        
double FDEC_REF   =  pow(PMA_REF,GAMMA2)/(pow(PMA_REF,GAMMA2)+pow(DEC50_CL,GAMMA2)); 
double DEC_CL     =  FDEC/FDEC_REF;                                                 
double SCR_REF    =  exp(-1.22839 + log10(PMA)*0.67190 + 6.27017 *exp(-3.10940*PMA)); 
double SCR_CL     =  exp(-FSCR*(SCR - SCR_REF));                                     
double CL         =  POPCL * pow((V1/POPV1),0.75) * MAT_CL * DEC_CL * SCR_CL * (1+CANCER*CANCER_CL) * exp(PPVCL);
double V1         =  POPV1 * WT_NORM * (1-HEEL*HEEL_V1) * exp(PPVV1);
double V2         =  POPV2 * WT_NORM * exp(PPVV2);
double Q          =  POPQ * pow((V2/POPV2),0.75) * (1-HEEL*HEEL_Q);

D_CMT1            = INFDUR;

$ODE      
double C1 = CMT1/V1;      
double C2 = CMT2/V2;     

dxdt_CMT1 = - C1*Q + C2*Q - C1*CL;    
dxdt_CMT2 =   C1*Q - C2*Q;             
dxdt_AUC  =   C1;                      

$TABLE    
double IPRE = C1;                            
double DV   = IPRE + IPRE*ERRPROP + ERRADD;  

$CAPTURE
CL V1 V2 Q PPVCL PPVV1 PPVV2 ERRPROP ERRADD IPRE DV WT AGE GA PMA SCR HEEL CANCER INFDUR
'
# compile the model code
mod <- mcode("PopPKModel",code)
#> Building PopPKModel ...
#> done.

set.seed(123456)
mrgsim_df(mod, tgrid = 0)
#>   ID time CMT1 CMT2 AUC CL       V1       V2        Q     PPVCL     PPVV1
#> 1  1    0    0    0   0  0 52.07243 16.82352 1.631636 0.1758639 0.2026356
#>        PPVV2    ERRPROP    ERRADD IPRE        DV WT AGE GA      PMA  SCR HEEL
#> 1 -0.9072221 -0.1146646 -1.433469    0 -1.433469 70  35 40 35.76923 0.84    0
#>   CANCER INFDUR
#> 1      0      1

set.seed(123456)
mrgsim_df(mod, tgrid = 0)
#>   ID time CMT1 CMT2 AUC       CL       V1       V2        Q     PPVCL     PPVV1
#> 1  1    0    0    0   0 7.332455 52.07243 16.82352 1.631636 0.1758639 0.2026356
#>        PPVV2    ERRPROP    ERRADD IPRE        DV WT AGE GA      PMA  SCR HEEL
#> 1 -0.9072221 -0.1146646 -1.433469    0 -1.433469 70  35 40 35.76923 0.84    0
#>   CANCER INFDUR
#> 1      0      1

Created on 2022-08-31 with reprex v2.0.2

This is my first time using replex so hope this works for you. Thanks

FelicienLL commented 2 years ago

Thanks, I can reproduce that: if set.seed() and mrgsim() are called right after model compiling, a clearance of zero is returned. In every other situations, the right values are returned. I don't know if this has the same source of error as in your first post (where ERRADD was changed, not CL). It is not for every model: my previous reprex example did not show this behaviour.

mod <- mcode("PopPKModel",code)
#> Building PopPKModel ...
#> done.
set.seed(123456)
mrgsim_df(mod, tgrid = 0, Req = "CL, PPVCL, ERRPROP, ERRADD")
#>   ID time CL     PPVCL    ERRPROP    ERRADD
#> 1  1    0  0 0.1758639 -0.1146646 -1.433469
set.seed(123456)
mrgsim_df(mod, tgrid = 0, Req = "CL, PPVCL, ERRPROP, ERRADD")
#>   ID time       CL     PPVCL    ERRPROP    ERRADD
#> 1  1    0 7.332455 0.1758639 -0.1146646 -1.433469
set.seed(123456)
mrgsim_df(mod, tgrid = 0, Req = "CL, PPVCL, ERRPROP, ERRADD")
#>   ID time       CL     PPVCL    ERRPROP    ERRADD
#> 1  1    0 7.332455 0.1758639 -0.1146646 -1.433469
mod <- mcode("PopPKModel",code)
#> Building PopPKModel ...
#> (waiting) ...
#> done.
set.seed(123456)
mrgsim_df(mod, tgrid = 0, Req = "CL, PPVCL, ERRPROP, ERRADD")
#>   ID time CL     PPVCL    ERRPROP    ERRADD
#> 1  1    0  0 0.1758639 -0.1146646 -1.433469
set.seed(123456)
mrgsim_df(mod, tgrid = 0, Req = "CL, PPVCL, ERRPROP, ERRADD")
#>   ID time       CL     PPVCL    ERRPROP    ERRADD
#> 1  1    0 7.332455 0.1758639 -0.1146646 -1.433469

Created on 2022-08-31 by the reprex package (v2.0.1)

For now, I cannot say what is wrong there.

kylebaron commented 2 years ago

@wonmy076 -

Can you check these lines?

double CL         =  POPCL * pow((V1/POPV1),0.75) * MAT_CL * DEC_CL * SCR_CL * (1+CANCER*CANCER_CL) * exp(PPVCL);
double V1         =  POPV1 * WT_NORM * (1-HEEL*HEEL_V1) * exp(PPVV1);

If this is what you intended, then you'll have to define V1 before CL.

wonmy076 commented 2 years ago

@wonmy076 -

Can you check these lines?

double CL         =  POPCL * pow((V1/POPV1),0.75) * MAT_CL * DEC_CL * SCR_CL * (1+CANCER*CANCER_CL) * exp(PPVCL);
double V1         =  POPV1 * WT_NORM * (1-HEEL*HEEL_V1) * exp(PPVV1);

If this is what you intended, then you'll have to define V1 before CL.

Thank you. That has fixed the issue of CL being 0 in the first simulation. But it hasn't fixed the issue of additive error being different in each output after each simulation, and DVs being different.

kylebaron commented 2 years ago

Ok; could you please update your example?

kylebaron commented 2 years ago

I'm getting this on 1.0.4; also reproducible with 1.0.6 (recently uploaded to CRAN).

library(mrgsolve)
#> 
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#> 
#>     filter

code <- '

$INIT    
CMT1       = 0,       
CMT2       = 0,        
AUC        = 0,        

$SET      
atol       = 1e-8, rtol = 1e-8
maxsteps   = 100000

$PARAM   
POPCL      = exp(1.67),       
POPV1      = exp(3.75),        
POPV2      = exp(3.73),        
POPQ       = exp(1.17),       
MAT50_CL   = exp(-0.113),    
GAMMA1     = exp(1.06),    
DEC50_CL   = exp(4.12),       
GAMMA2     = -exp(0.806),    
FSCR       = exp(-0.436),    
CANCER_CL  = 0.293,          
HEEL_V1    = 0.313,          
HEEL_Q     = 0.595,          

// Default Covariate values for Simulation
AGE           = 35,          
GA            = 40,           
WT            = 70,          
SCR           = 0.84,         
HEEL          = 0,            
CANCER        = 0,            
INFDUR        = 1,            

$OMEGA    
@labels PPVCL, PPVV1, PPVV2

0.0749                   // PPV on CL
0.0721                   // PPV on V1
0.672                    // PPV on V2

$SIGMA   
@labels ERRPROP, ERRADD
0.046                  
1.52                   

$MAIN     

double PMA        =  AGE + GA/52;                        
double PMA_REF    =  35 + (40.0/52.0);                   
double WT_NORM    =  WT/70;                             

double FMAT       =  pow(PMA,GAMMA1)/(pow(PMA,GAMMA1)+pow(MAT50_CL,GAMMA1));         
double FMAT_REF   =  pow(PMA_REF,GAMMA1)/(pow(PMA_REF,GAMMA1)+pow(MAT50_CL,GAMMA1)); 
double MAT_CL     = FMAT/FMAT_REF;                                      

double FDEC       =  pow(PMA,GAMMA2)/(pow(PMA,GAMMA2)+pow(DEC50_CL,GAMMA2));        
double FDEC_REF   =  pow(PMA_REF,GAMMA2)/(pow(PMA_REF,GAMMA2)+pow(DEC50_CL,GAMMA2)); 
double DEC_CL     =  FDEC/FDEC_REF;                                                 
double SCR_REF    =  exp(-1.22839 + log10(PMA)*0.67190 + 6.27017 *exp(-3.10940*PMA)); 
double SCR_CL     =  exp(-FSCR*(SCR - SCR_REF));                                     

double V1         =  POPV1 * WT_NORM * (1-HEEL*HEEL_V1) * exp(PPVV1);
double V2         =  POPV2 * WT_NORM * exp(PPVV2);
double Q          =  POPQ * pow((V2/POPV2),0.75) * (1-HEEL*HEEL_Q);
double CL         =  POPCL * pow((V1/POPV1),0.75) * MAT_CL * DEC_CL * SCR_CL * (1+CANCER*CANCER_CL) * exp(PPVCL);

D_CMT1            = INFDUR;

$ODE      
double C1 = CMT1/V1;      
double C2 = CMT2/V2;     

dxdt_CMT1 = - C1*Q + C2*Q - C1*CL;    
dxdt_CMT2 =   C1*Q - C2*Q;             
dxdt_AUC  =   C1;                      

$TABLE    
double IPRE = C1;                            
double DV   = IPRE + IPRE*ERRPROP + ERRADD;  

$CAPTURE
CL V1 V2 Q PPVCL PPVV1 PPVV2 ERRPROP ERRADD IPRE DV WT AGE GA PMA SCR HEEL CANCER INFDUR
'

mod <- mcode("PopPKModel",code)
#> Building PopPKModel ...
#> done.

set.seed(123456)
a <- mrgsim_df(mod, tgrid = 0)
a
#>   ID time CMT1 CMT2 AUC       CL       V1       V2        Q     PPVCL     PPVV1
#> 1  1    0    0    0   0 7.332455 52.07243 16.82352 1.631636 0.1758639 0.2026356
#>        PPVV2    ERRPROP    ERRADD IPRE        DV WT AGE GA      PMA  SCR HEEL
#> 1 -0.9072221 -0.1146646 -1.433469    0 -1.433469 70  35 40 35.76923 0.84    0
#>   CANCER INFDUR
#> 1      0      1

set.seed(123456)
b <- mrgsim_df(mod, tgrid = 0)
b
#>   ID time CMT1 CMT2 AUC       CL       V1       V2        Q     PPVCL     PPVV1
#> 1  1    0    0    0   0 7.332455 52.07243 16.82352 1.631636 0.1758639 0.2026356
#>        PPVV2    ERRPROP    ERRADD IPRE        DV WT AGE GA      PMA  SCR HEEL
#> 1 -0.9072221 -0.1146646 -1.433469    0 -1.433469 70  35 40 35.76923 0.84    0
#>   CANCER INFDUR
#> 1      0      1

a$DV
#> [1] -1.433469
b$DV
#> [1] -1.433469

set.seed(1234567)
c <- mrgsim_df(mod, tgrid = 0)
set.seed(1234567)
d <- mrgsim_df(mod, tgrid = 0)

c$DV
#> [1] 1.451737
d$DV
#> [1] 1.451737

identical(a,b)
#> [1] TRUE
identical(c,d)
#> [1] TRUE

Created on 2022-09-01 by the reprex package (v2.0.1)

wonmy076 commented 2 years ago

@kylebaron

Could it be because my input_df2 for simulation 2 has a different number of rows than input_df1 for simulation 1? This is because there are more doses being given in the input_df2 than df1.

kylebaron commented 2 years ago

Hi @wonmy076 -

I didn't see input_df2 or input_df1 in the example you gave. That is something we would have checked if it was there. I assumed that the example you gave reproduced the issue you were seeing.

If you want to go ahead and post an example that actually reproduces the issue you are seeing, that would be the next step. But as you noted, the simulation is only reproducible for identical inputs.

Kyle

rkern commented 2 years ago

Here is an example where I get reproducibility (and correctness) issues. When the total number of output time samples (across all patients) is greater than about 32000 (which makes me suspicious that it's actually 32768 = 2**15), the PROP errors are unreproducible and do not have the requested variance. When the number of patients or time sampling is adjusted to reduce that total number to below that threshold, I usually get reproducible results. That is, when the number of samples is just under the threshold, most runs will reproduce the same correct-variance results in detail, but sometimes the repeated runs will have higher variances and they will be different each time.

library(mrgsolve)
library(ggplot2)
library(dplyr)

set.seed(12345)

# Try 15 to get the expected, reproducible results back
# Try 16 for the semi-reproducible results
n_patients <- 100
# Try 12 to get the expected, reproducible results back
# Try 13 for the semi-reproducible results
n_days <- 85
# Try 8 to get the expected, reproducible results back
delta <- 1

# Super-simple model
code <- ' 
$PARAM CL = 1, V = 25

$PKMODEL cmt = "CENT"

$SIGMA @labels PROP
0.025

$TABLE
capture IPRED = CENT/V;
capture DV = IPRED*(1+PROP);
capture PROP = PROP;
'

mod <- mcode("sigma-prop", code)

data <- expand.ev(ID=1:n_patients, amt=50, ii=24, addl=85)
idata <- 
  data.frame(ID=1:n_patients)

out<-mod %>% 
  data_set(data) %>%
  idata_set(idata) %>% 
  obsonly() %>% 
  mrgsim(start=0,end=24*n_days,delta=delta) %>% 
  as.data.frame()

length(out$PROP)
var(out$PROP)
expected_var <- as.numeric(smat(mod)@data)

gbprop <- out %>% group_by(ID) %>% summarise(varPROP=var(PROP))
g<-ggplot(data=gbprop)+
  geom_line(aes(ID, varPROP))+
  geom_hline(yintercept=expected_var);g
kylebaron commented 2 years ago

Hi @rkern -

Is the code you posted producing the expected output? or not producing expected output?

I'm getting this from your code

library(mrgsolve)
#> 
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

set.seed(12345)

# Try 15 to get the expected, reproducible results back
# Try 16 for the semi-reproducible results
n_patients <- 100
# Try 12 to get the expected, reproducible results back
# Try 13 for the semi-reproducible results
n_days <- 85
# Try 8 to get the expected, reproducible results back
delta <- 1

# Super-simple model
code <- ' 
$PARAM CL = 1, V = 25

$PKMODEL cmt = "CENT"

$SIGMA @labels PROP
0.025

$TABLE
capture IPRED = CENT/V;
capture DV = IPRED*(1+PROP);
capture PROP = PROP;
'

mod <- mcode("sigma-prop", code)
#> Building sigma-prop ...
#> done.

data <- expand.ev(ID=1:n_patients, amt=50, ii=24, addl=85)
idata <- 
  data.frame(ID=1:n_patients)

set.seed(12345)
out1 <-mod %>% 
  data_set(data) %>%
  idata_set(idata) %>% 
  obsonly() %>% 
  mrgsim(start=0,end=24*n_days,delta=delta) %>% 
  as.data.frame()

set.seed(12345)
out2 <-mod %>% 
  data_set(data) %>%
  idata_set(idata) %>% 
  obsonly() %>% 
  mrgsim(start=0,end=24*n_days,delta=delta) %>% 
  as.data.frame()

identical(out1, out2)
#> [1] TRUE

length(out1$PROP)
#> [1] 204100
var(out1$PROP)
#> [1] 0.02498433
var(out2$PROP)
#> [1] 0.02498433
expected_var <- as.numeric(smat(mod)@data)

Try MASS::mvrnorm

set.seed(3392)
mass <- MASS::mvrnorm(nrow(out1), 0, as.matrix(smat(mod))) %>% as.numeric()
length(mass)
#> [1] 204100
var(mass)
#> [1] 0.02487356
var(out1$PROP)
#> [1] 0.02498433
smat(mod)@data
#> $...
#>       [,1]
#> [1,] 0.025

Created on 2022-10-12 with reprex v2.0.2

rkern commented 2 years ago

My script returns the unexpected results for me (R 4.1.2, mrgsolve 1.0.6).

> var(out$PROP)
[1] 2.12869
rkern commented 2 years ago

Ah, MASS:mvrnorm() is giving the wrong results. I have MASS 7.3-55

> set.seed(3392)
> mass <- MASS::mvrnorm(nrow(out), 0, as.matrix(smat(mod))) %>% as.numeric()
> length(mass)
[1] 204100
> var(mass)
[1] 2.470504
> var(out$PROP)
[1] 2.12869
> smat(mod)@data
$...
      [,1]
[1,] 0.025
kylebaron commented 2 years ago

Oh wow ... yeah, that's way different. The mrgsolve implementation for generating EPS follows what they do in MASS and looks like that's consistent with rnorm too on my setup.

Can you send your sessionInfo()? Can't imagine what is going on here.

Kyle

library(mrgsolve)
#> 
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

set.seed(12345)

# Try 15 to get the expected, reproducible results back
# Try 16 for the semi-reproducible results
n_patients <- 100
# Try 12 to get the expected, reproducible results back
# Try 13 for the semi-reproducible results
n_days <- 85
# Try 8 to get the expected, reproducible results back
delta <- 1

# Super-simple model
code <- ' 
$PARAM CL = 1, V = 25

$PKMODEL cmt = "CENT"

$SIGMA @labels PROP
0.025

$TABLE
capture IPRED = CENT/V;
capture DV = IPRED*(1+PROP);
capture PROP = PROP;
'

mod <- mcode("sigma-prop", code)
#> Building sigma-prop ...
#> done.

data <- expand.ev(ID=1:n_patients, amt=50, ii=24, addl=85)
idata <- 
  data.frame(ID=1:n_patients)

set.seed(12345)
out1 <-mod %>% 
  data_set(data) %>%
  idata_set(idata) %>% 
  obsonly() %>% 
  mrgsim(start=0,end=24*n_days,delta=delta) %>% 
  as.data.frame()

set.seed(12345)
out2 <-mod %>% 
  data_set(data) %>%
  idata_set(idata) %>% 
  obsonly() %>% 
  mrgsim(start=0,end=24*n_days,delta=delta) %>% 
  as.data.frame()

identical(out1, out2)
#> [1] TRUE

length(out1$PROP)
#> [1] 204100
var(out1$PROP)
#> [1] 0.02498433
var(out2$PROP)
#> [1] 0.02498433
expected_var <- as.numeric(smat(mod)@data)

Try MASS::mvrnorm

set.seed(3392)
mass <- MASS::mvrnorm(nrow(out1), 0, as.matrix(smat(mod))) %>% as.numeric()

length(mass)
#> [1] 204100

var(mass)
#> [1] 0.02487356
var(out1$PROP)
#> [1] 0.02498433

var(rnorm(nrow(out1), 0, sqrt(as.numeric(as.matrix(smat(mod))))))
#> [1] 0.02499107

smat(mod)@data
#> $...
#>       [,1]
#> [1,] 0.025

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.0.9         ggplot2_3.3.6       mrgsolve_1.0.6.9000
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.9        pillar_1.8.1      compiler_4.1.2    highr_0.9        
#>  [5] R.methodsS3_1.8.2 R.utils_2.12.0    tools_4.1.2       digest_0.6.29    
#>  [9] gtable_0.3.0      evaluate_0.16     lifecycle_1.0.1   tibble_3.1.8     
#> [13] R.cache_0.16.0    pkgconfig_2.0.3   rlang_1.0.5       reprex_2.0.2     
#> [17] DBI_1.1.3         cli_3.3.0         rstudioapi_0.14   yaml_2.3.5       
#> [21] xfun_0.32         fastmap_1.1.0     withr_2.5.0       styler_1.7.0     
#> [25] stringr_1.4.1     knitr_1.40        generics_0.1.3    fs_1.5.2         
#> [29] vctrs_0.4.1       grid_4.1.2        tidyselect_1.1.2  glue_1.6.2       
#> [33] R6_2.5.1          fansi_1.0.3       rmarkdown_2.16    purrr_0.3.4      
#> [37] magrittr_2.0.3    MASS_7.3-58.1     scales_1.2.1      htmltools_0.5.3  
#> [41] assertthat_0.2.1  colorspace_2.0-3  utf8_1.2.2        stringi_1.7.8    
#> [45] munsell_0.5.0     R.oo_1.25.0

Created on 2022-10-12 with reprex v2.0.2

kylebaron commented 2 years ago

packageVersion("MASS") [1] ‘7.3.58.1’

rkern commented 2 years ago
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] MASS_7.3-55    reprex_2.0.2   dplyr_1.0.10   ggplot2_3.3.6  mrgsolve_1.0.6

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9        pillar_1.8.1      compiler_4.1.2    highr_0.9         R.utils_2.12.0    R.methodsS3_1.8.2 tools_4.1.2      
 [8] digest_0.6.29     R.cache_0.16.0    evaluate_0.17     lifecycle_1.0.3   tibble_3.1.8      gtable_0.3.1      pkgconfig_2.0.3  
[15] rlang_1.0.6       cli_3.4.1         rstudioapi_0.14   yaml_2.3.5        xfun_0.33         fastmap_1.1.0     styler_1.7.0     
[22] withr_2.5.0       knitr_1.40        generics_0.1.3    vctrs_0.4.2       fs_1.5.2          grid_4.1.2        getopt_1.20.3    
[29] tidyselect_1.2.0  glue_1.6.2        optparse_1.7.3    R6_2.5.1          processx_3.7.0    fansi_1.0.3       rmarkdown_2.17   
[36] purrr_0.3.5       callr_3.7.2       farver_2.1.1      clipr_0.8.0       magrittr_2.0.3    ps_1.7.1          scales_1.2.1     
[43] htmltools_0.5.3   colorspace_2.0-3  labeling_0.4.2    utf8_1.2.2        munsell_0.5.0     R.oo_1.25.0      

> packageVersion("MASS")
[1] ‘7.3.55’
rkern commented 2 years ago

Interestingly, there is a sharp cutoff at 40960 (10 * 2**12) where MASS:rvnorm() breaks down. 40959 is fine while 40960 is not.

> set.seed(3392)
> mass <- MASS::mvrnorm(40960, 0, as.matrix(0.025)) %>% as.numeric()
> length(mass)
[1] 40960
> var(mass)
[1] 2.430878
> set.seed(3392)
> mass <- MASS::mvrnorm(40959, 0, as.matrix(0.025)) %>% as.numeric()
> length(mass)
[1] 40959
> var(mass)
[1] 0.02495838
rkern commented 2 years ago

Here is my version of RcppArmadillo, FWIW. That might be the difference between our installations.

> packageVersion("RcppArmadillo")
[1] ‘0.11.4.0.1’
kylebaron commented 2 years ago

Yeah ... weird. I know MASS::mvrnorm() doesn't rely on RcppArmadillo at all but still giving weird results.

On my Ubuntu instance, no problems with 40960 either

> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> set.seed(3392)
> mass <- MASS::mvrnorm(40960, 0, as.matrix(0.025)) %>% as.numeric()
> length(mass)
[1] 40960
> var(mass)
[1] 0.02495818
> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Even larger numbers work

> mass <- MASS::mvrnorm(4096000, 0, as.matrix(0.025)) %>% as.numeric()
> length(mass)
[1] 4096000
> var(mass)
[1] 0.02499063
rkern commented 2 years ago

Hmmm, something is going wrong for me in the matrix multiplication in MASS::mvrnorm() that scales the standard-normal vector. I'm not that familiar with the R ecosystem: would the implementation of %*% be related at all to your use of Armadillo's matrix multiplication here?

This does seem like something that is related more to my installation, somehow, than a bug in mrgsolve per se. I'm using the standard R installation from Ubuntu 22.04's apt repository. I'll try the latest release from R's apt repo soon. If anything about my description of the issue makes any lightbulbs go off, I'd be happy to hear it, but that would be above and beyond.

set.seed(12345)

n <- 40960
mu <- 0
Sigma <- as.matrix(0.025)
p <- length(mu)
eS <- eigen(Sigma, symmetric = TRUE)
ev <- eS$values
X <- matrix(rnorm(p * n), n)
scaler <- diag(sqrt(pmax(ev, 0)), p)
scaler
#>           [,1]
#> [1,] 0.1581139
temp <- scaler %*% t(X)
var(t(temp))
#>         [,1]
#> [1,] 2.40582
Y <- drop(mu) + eS$vectors %*% temp
nm <- names(mu)
if(is.null(nm) && !is.null(dn <- dimnames(Sigma))) nm <- dn[[1L]]
dimnames(Y) <- list(nm, NULL)
Z <- t(Y)
var(Z)
#>          [,1]
#> [1,] 233.4444

Created on 2022-10-12 with reprex v2.0.2

kylebaron commented 2 years ago

would the implementation of %*% be related at all to your use of Armadillo's matrix multiplication here?

It's possible that they both tap the same BLAS routines to do that; that's the only thing that I know of where they could possibly intersect. But I don't know enough of the Armadillo internals to know if they tap BLAS or not. Other than that, it's hard to imagine what those two could have in common; but they must be sharing something for both to be going haywire like this.

rkern commented 2 years ago

Hmm, I seem to have some global installation of the Intel MKL in /lib/x86_64-linux-gnu for some reason (some non-packaged software decided it needed to place those there, I am guessing). RcppArmadillo.so, at least, is linking against that as the "system LAPACK" that it found during the build process. Let me see if I can undo that.

❯ ldd ~/R/x86_64-pc-linux-gnu-library/4.1/RcppArmadillo/libs/RcppArmadillo.so | grep mkl
    libmkl_rt.so => /lib/x86_64-linux-gnu/libmkl_rt.so (0x00007f8ca6c00000)
rkern commented 2 years ago

Ah, no it's from the official Ubuntu Intel MKL packages. Removing the intel-mkl-full package (and all of its dependencies) seems to fix both mrgsolve and MASS:mvrnorm().

Thank you very much for your help. I wouldn't have been able to pin it down without the suggestion to look at MASS:mvrnorm(). I hope this information helps you with future reports about unreproducibility in the EPS noise.

kylebaron commented 2 years ago

Oh great. Thanks for reporting and I'm glad we got it figured out. Always an adventure working on different platforms!

Kyle