kaskr / adcomp

AD computation with Template Model Builder (TMB)
Other
176 stars 80 forks source link

Printing from parallel region #310

Closed boennecd closed 4 years ago

boennecd commented 4 years ago

These lines prints output from a parallel region

https://github.com/kaskr/adcomp/blob/021e4f4eddbaae1d1fe72fd36c37014103d4faac/TMB/inst/include/tmb_core.hpp#L132-L147

The code is hard to reach but causes a crash with the following example (I think):

# batch-file.R
library(TMB)
setwd(tempdir())
tmp_lib <- "tmpEx"
tmp_file <- paste0(tmp_lib, ".cpp")
sink(tmp_file)
cat("#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator() ()
{
  DATA_VECTOR(Y);
  DATA_VECTOR(x);
  PARAMETER(a);
  PARAMETER(b);
  PARAMETER(logSigma);
  parallel_accumulator<Type> nll(this);
  for(int i=0; i<x.size(); i++)
    nll -= dnorm(Y[i], a + b * x[i], exp(logSigma), true);
  return nll;
}
")
sink()
openmp(4L)
compile(tmp_file)

y <- rnorm(100)
x <- 1:100
dyn.load(dynlib(tmp_lib))
config(tape.parallel = 1, trace.optimize = 1, optimize.parallel = 1, 
       DLL = tmp_lib)

fun <- MakeADFun(
  list(Y = y, x = x), parameters = list(a = 1, b = 1, logSigma = 0),
  random = "a",
  DLL = tmp_lib)
for(i in 1:100){
  fun$retape()
  fun$env$spHess(c(1, fun$par), c(a = 1))
}

Running path/to/batch-file.R gives

[1] 4
Note: Using Makevars in path/to/.R/Makevars 
g++-8 -I"/usr/share/R/include" #...
[1] 0
$trace.atomic
[1] 1

$trace.optimize
[1] 1

$trace.parallel
[1] 1

$tape.parallel
[1] 1

$optimize.parallel
[1] 1

$optimize.instantly
[1] 1

$debug.getListElement
[1] 0

# ...
Done
Hessian number of non-zeros:
nnz = 1
nnz = 1
nnz = 1
nnz = 1
4 regions found.
Using 4 threads
4 regions found.
Using 4 threads
Count num parallel regions
4 regions found.
Using 4 threads
Optimizing tape... Optimizing tape... Optimizing tape... Error: C stack usage  823358014716 is too close to the limit
Execution halted

 *** caught segfault ***
Done
Done
address 0x7f3eb98a447a, cause 'invalid permissions'
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)

A similar output is produced by

Rcpp::sourceCpp(code = "
    // [[Rcpp::plugins(openmp)]]
    #include <omp.h>
    #include <Rcpp.h>
    #include <string>

    void say_hello(){
      static unsigned time(0L);
#pragma omp critical
      Rcpp::Rcout << \"Hello! \" << std::to_string(time++) << \"\\n\";
    }

    // [[Rcpp::export]]
    double hello_world(){
      omp_set_num_threads(3);
      double x(0);

#pragma omp parallel for reduction(+:x)
      for(unsigned i = 0; i < 100L; ++i){
        say_hello();
        x += i;
      }

      return x;
    }")

hello_world()

which produces e.g.

# ...
Hello! 29
Hello! 30
Hello! 31
Hello! 32
Hello! Error: C stack usage  775819195420 is too close to the limit
Execution halted

Replacing critical with master seems to solve the issue although I am not sure whether this is recommended. However, only the master thread will execute the std::cout line so the final result is

# ...
Hello! 29
Hello! 30
Hello! 31
Hello! 32
Hello! 33
[1] 4950
kaskr commented 4 years ago

I think you've come across a difference between the github and the CRAN version of TMB. The github version prints using std::cout which is thread safe. Due to a CRAN policy, the CRAN version is patched to print using Rcpp::Rcout which is not thread safe.

If you can live without the parallelization diagnostic messages I guess, if I recall correctly, that trace.parallel=0 should solve the problem?

boennecd commented 4 years ago

I think you've come across a difference between the github and the CRAN version of TMB. The github version prints using std::cout which is thread safe. Due to a CRAN policy, the CRAN version is patched to print using Rcpp::Rcout which is not thread safe.

I see. I did not notice that there were a difference: https://github.com/kaskr/adcomp/blob/52002f8dd4e022661b1f4f32986a0c40a19fbd75/Makefile#L96-L115

If you can live without the parallelization diagnostic messages I guess, if I recall correctly, that trace.parallel=0 should solve the problem?

Yes that solves the issue. I have no opinion. I wanted to make you aware of a potential issue but you already address it.