kaskr / adcomp

AD computation with Template Model Builder (TMB)

Examples core dump using Intel compilers #304

Closed jkbest2 closed 4 years ago

jkbest2 commented 4 years ago

Description:

This may be a compiler compatibility issue, or just me naively setting up my environment. I do not see these problems using clang++ or g++ on my laptop (Ubuntu, using OpenBLAS).

I'm trying to fit models on our cluster. I compiled R using the instructions from Intel. When installing packages, I made sure that the same compilers were available and that the environment variables were set.

Reproducible Steps:

Compile R according to instructions linked above. Install TMB.

library(TMB)
runExample("simple")

Of the four included examples, "ar1xar1", "simple", and "randomregression" abort due to out-of-bounds errors. The "linreg_parallel" model seems to run successfully.

Current Output:

> TMB::runExample("ar1xar1")
Building example ar1xar1 
icpc -std=gnu++11 -I"/usr/lusers/jkbest/jkbest/binsrc/R-3.6.2/include" -DNDEBUG -I/gscratch/stf/jkbest/binsrc/R-3.6.2/library/TMB/include -I/gscratch/stf/jkbest/binsrc/R-3.6.2/library/RcppEigen/include  -DTMB_SAFEBOUNDS -DLIB_UNLOAD=R_unload_ar1xar1  -DTMB_LIB_INIT=R_init_ar1xar1   -I/usr/local/include  -fpic  -fPIC -qopenmp -O3 -ipo -xHost -multiple-processes=8  -c ar1xar1.cpp -o ar1xar1.o
icpc -std=gnu++11 -shared -L/usr/lusers/jkbest/jkbest/binsrc/R-3.6.2/lib -qopenmp -o ar1xar1.so ar1xar1.o -L/usr/lusers/jkbest/jkbest/binsrc/R-3.6.2/lib -lR
Build time 50.356 seconds

Running example ar1xar1 

Is it possible that some of the compilation flags are causing problems?

...Intermediate echoes removed...

> obj <- MakeADFun(data=list(N=N),
+                  parameters=list(
+                    eta=matrix(0,n,n),
+                    transf_phi1=invf(0 .... [TRUNCATED] 
0 regions found.
Using 1 threads
TMB has received an error from Eigen. The following condition was not met:
index >= 0 && index < size()
Please check your matrix-vector bounds etc., or run your program through a debugger.
[1]    12948 abort (core dumped)  R

In a larger model I'm working on, MakeADFun fails with a segfault, stating that the memory is not mapped.

Expected Output:

All examples should run.

TMB Version:

I have tried this with both the latest CRAN version and with master (including install-metis-full).

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /gscratch/sw/intel-2019/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TMB_1.7.15

loaded via a namespace (and not attached):
[1] compiler_3.6.2  Matrix_1.2-18   grid_3.6.2      lattice_0.20-38

kaskr commented 4 years ago

Essentially the same problem as reported here:

https://github.com/glmmTMB/glmmTMB/issues/369
https://github.com/glmmTMB/glmmTMB/issues/232

In short, the OpenMP flag (-qopenmp in the compile and link lines above) should only be set on request, not as a default compiler flag.
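
As a rough illustration of that advice, the sketch below removes the OpenMP switch from a flag string. The helper name `strip_openmp` is made up for this example, and the input is just the flags visible in the compile line of the report. Which R variable actually carries the flag on the cluster is not stated in this thread, so that part is left open.

```shell
#!/bin/sh
# Hypothetical helper (not part of R or TMB): print the given compiler
# flags with any OpenMP switch removed.
# -qopenmp is the Intel spelling seen in the report; -fopenmp is the
# GCC/Clang equivalent.
strip_openmp() {
    out=""
    for flag in $1; do                       # unquoted on purpose: split on whitespace
        case "$flag" in
            -qopenmp|-fopenmp) ;;            # drop the OpenMP switch
            *) out="${out:+$out }$flag" ;;   # keep everything else
        esac
    done
    printf '%s\n' "$out"
}

# Flags copied from the compile line in the report.
reported='-fpic -fPIC -qopenmp -O3 -ipo -xHost -multiple-processes=8'
strip_openmp "$reported"
```

On a real system the input could instead come from `R CMD config CXXFLAGS` (and `R CMD config LDFLAGS` for the link step), and the cleaned value could be placed in a user-level Makevars file so that OpenMP is no longer enabled for every template by default.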

jkbest2 commented 4 years ago

Thank you! I hadn't seen those glmmTMB issues.