kaskr / adcomp

AD computation with Template Model Builder (TMB)

Metis reorderings segfault with Matrix v1.3 #337

Status: Closed (closed by jkbest2 3 years ago)

jkbest2 commented 3 years ago

Description:

This is likely an upstream issue with the Matrix package and/or Metis, but I thought I'd at least file it here in case anyone else runs into it. After my system install of the Matrix package was upgraded to v1.3, runSymbolicAnalysis segfaults. I am using the focal-cran40 PPA for R and the recommended packages on Kubuntu 20.04. I have no flags set in ~/.R/Makevars.

Reproducible Steps:

  1. Install an updated version of the Matrix package from the focal-cran40 PPA: sudo apt install r-cran-matrix=1.3-2-1.2004.0 or sudo apt install r-cran-matrix=1.3-0-1.2004.0.
  2. Clone TMB and install with Metis: make install-metis-full
  3. In R, run an example that uses runSymbolicAnalysis: TMB::runExample("ar1xar1").

Current Output:

> library(TMB)
> runExample("ar1xar1")
Building example ar1xar1 
Note: Using Makevars in /home/jkbest/.R/Makevars 
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I"/home/jkbest/R/x86_64-pc-linux-gnu-library/4.0/TMB/include"   -DTMB_SAFEBOUNDS -DLIB_UNLOAD=R_unload_ar1xar1  -DTMB_LIB_INIT=R_init_ar1xar1      -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-8T8CYO/r-base-4.0.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c ar1xar1.cpp -o ar1xar1.o
g++ -std=gnu++11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o ar1xar1.so ar1xar1.o -L/usr/lib/R/lib -lR
Build time 35.15 seconds

Running example ar1xar1 

> require(TMB)

> set.seed(123)

> n <- 50 ## Size of problem = n*n

> ## ======================= Simulate separable 2D GMRF 
> ## - With exponential correlation in both directions
> ## - phi1 = 1-lag correlation in 1st .... [TRUNCATED] 

> simgmrf <- function(n1,n2,phi1,phi2){
+   u <- matrix(rnorm(n1*n2),n1,n2)
+   L1 <- t(chol(ar1corr(n1,phi1)))
+   L2 <- t(chol(ar1corr(n2,phi2)))
+  .... [TRUNCATED] 

> ## ======================= Simulate data
> phi1=exp(-1/(.1*n)) ## Correlation range=10% of grid size first dimension

> phi2=exp(-1/(.2*n)) ## Correlation range=20% of grid size second dimension

> eta <- simgmrf(n,n,phi1,phi2)

> N <- rpois(length(eta),exp(eta))

> d <- expand.grid(x=factor(1:n),y=factor(1:n))

> d$N <- N

> ## ======================= Parameterization of phi
> f <- function(x) 2/(1 + exp(-2 * x)) - 1

> invf <- function(y) -0.5 * log(2/(y + 1) - 1)

> ## ======================= Fit model
> dyn.load(dynlib("ar1xar1"))

> obj <- MakeADFun(data=list(N=N),
+                  parameters=list(
+                    eta=matrix(0,n,n),
+                    transf_phi1=invf(0 .... [TRUNCATED] 
Constructing atomic D_lgamma
Constructing atomic ppois
Constructing atomic D_lgamma
Constructing atomic ppois

> runSymbolicAnalysis(obj)
Constructing atomic D_lgamma
Constructing atomic ppois
Optimizing tape... Done
Entering externallib 
cholmod_analyze: status=-4 
Chosen ordering -1 

 *** caught segfault ***
address 0x10, cause 'memory not mapped'

Traceback:
 1: runSymbolicAnalysis(obj)
 2: eval(ei, envir)
 3: eval(ei, envir)
 4: withVisible(eval(ei, envir))
 5: source(paste(name, ".R", sep = ""), echo = TRUE)
 6: FUN(X[[i]], ...)
 7: lapply(exnames, runExample, thisR = thisR, clean = clean, exfolder = exfolder,     dontrun = dontrun, subarch = subarch, ...)
 8: runExample(all = TRUE)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 1
R is aborting now ...
[1]    53243 segmentation fault (core dumped)  R --no-save --no-restore-data

Expected Output:

A successful optimization run, including reorderings. I am able to run the ar1xar1 example after downgrading the Matrix package to v1.2 with sudo apt install r-cran-matrix=1.2-18-1focal0 and reinstalling TMB.
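
For anyone wanting to check quickly whether a given r-cran-matrix package version falls on the failing or the working side of this report, here is a minimal sketch. The function name matrix_is_affected is hypothetical (it is not part of TMB or Matrix), and the rule that every 1.3 or later build is affected is an assumption extrapolated from the three versions mentioned above, not something that has been confirmed.

```shell
#!/bin/sh
# Hypothetical helper, not part of TMB or Matrix.
# Returns success (0) if the given r-cran-matrix version string is 1.3 or newer.
# Assumption: every 1.3+ build is treated as affected; this report only
# observed 1.3-0 and 1.3-2 failing and 1.2-18 working.
matrix_is_affected() {
  ver="$1"
  major="${ver%%.*}"      # text before the first dot, e.g. "1"
  rest="${ver#*.}"        # text after the first dot, e.g. "3-2-1.2004.0"
  minor="${rest%%[.-]*}"  # leading component of the remainder, e.g. "3"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 3 ]; }
}

# Classify the three package versions mentioned in this report.
for v in 1.3-2-1.2004.0 1.3-0-1.2004.0 1.2-18-1focal0; do
  if matrix_is_affected "$v"; then
    echo "$v: affected"
  else
    echo "$v: not affected"
  fi
done
```

On a Debian or Ubuntu system, the installed version string could be obtained with dpkg-query -W -f='${Version}' r-cran-matrix and passed to the function as its argument.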

TMB Version:

v1.7.18 (from current Github master branch)

R Version:

v4.0.3

Operating System:

Kubuntu 20.04

kaskr commented 3 years ago

Thanks for the report, @jkbest2. I can replicate the problem.