Rcpp implementations may not always be faster than pure-R

For simple functions such as dlaplace, the Rcpp implementation can be slower than a pure-R equivalent, as the benchmarks below show:

library(extraDistr)  # provides the Rcpp-based dlaplace()
library(rbenchmark)

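# Pure-R Laplace density, computed directly
dlaplaceR <- function(x, mu, sigma) {
  z <- (x-mu)/sigma
  1/(2*sigma) * exp(-abs(z))
}

# Pure-R Laplace density, computed on the log scale and then exponentiated
dlaplaceRlog <- function(x, mu, sigma) {
  LOG_2F <- 0.6931471805599452862268  # log(2)
  z <- abs(x-mu)/sigma
  exp(-z - LOG_2F - log(sigma))
}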

x <- seq(-100, 100, by = 0.001)

benchmark(
  dlaplace(x, 0, 1),
  dlaplaceR(x, 0, 1),
  dlaplaceRlog(x, 0, 1)
)
##                    test replications elapsed relative user.self sys.self user.child sys.child
## 1     dlaplace(x, 0, 1)          100    2.78    2.376      2.75     0.04         NA        NA
## 2    dlaplaceR(x, 0, 1)          100    1.17    1.000      1.14     0.03         NA        NA
## 3 dlaplaceRlog(x, 0, 1)          100    1.25    1.068      1.20     0.04         NA        NA

x <- -10:10

benchmark(
  dlaplace(x, 0, 1),
  dlaplaceR(x, 0, 1),
  dlaplaceRlog(x, 0, 1), 
  replications = 5000
)
##                    test replications elapsed relative user.self sys.self user.child sys.child
## 1     dlaplace(x, 0, 1)         5000    0.05        5      0.05        0         NA        NA
## 2    dlaplaceR(x, 0, 1)         5000    0.01        1      0.02        0         NA        NA
## 3 dlaplaceRlog(x, 0, 1)         5000    0.02        2      0.02        0         NA        NA

Some speedup could be achieved by parallelizing the functions using OpenMP, i.e. adding

#ifdef _OPENMP
#include <omp.h>
#endif

and

#ifdef _OPENMP
#pragma omp parallel for reduction(|| : throw_warning)
#endif

before each loop to be parallelized (see this). This would affect only the outer for loops.
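To make the placement concrete, here is a minimal sketch of how the two snippets above might fit around a single loop. It is plain C++ rather than the package's actual Rcpp code: the function name `dlaplace_sketch`, the use of `std::vector`, and the NaN-plus-flag handling of an invalid `sigma` are assumptions made for illustration only. The reduction is applied to a local flag that is copied out after the loop.

```cpp
#include <cmath>
#include <limits>
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for a vectorized density function; this is NOT the
// code used in extraDistr. It evaluates exp(-|x - mu| / sigma) / (2 * sigma)
// element-wise and flags a warning when sigma is not positive.
std::vector<double> dlaplace_sketch(const std::vector<double>& x,
                                    double mu, double sigma,
                                    bool& throw_warning) {
  const long n = static_cast<long>(x.size());
  std::vector<double> p(x.size());
  bool warn = false;  // local flag, combined across threads with ||

#ifdef _OPENMP
#pragma omp parallel for reduction(|| : warn)
#endif
  for (long i = 0; i < n; i++) {
    if (sigma <= 0.0) {
      // Assumed error convention: return NaN and raise the warning flag.
      warn = true;
      p[i] = std::numeric_limits<double>::quiet_NaN();
    } else {
      p[i] = std::exp(-std::abs(x[i] - mu) / sigma) / (2.0 * sigma);
    }
  }

  throw_warning = warn;
  return p;
}
```

Because both OpenMP pieces are guarded by `_OPENMP`, the same source compiles as an ordinary serial loop when OpenMP is not enabled (for example, without `-fopenmp` on GCC or Clang).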

The problem is that while parallelization can give roughly a 2x speedup on large inputs, on small inputs it can be as much as 13x slower than the pure-R versions, so the overhead can be significant.
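One possible way to limit that overhead, which I have not benchmarked here, is OpenMP's `if` clause, which runs the loop serially unless a condition holds. The sketch below is again plain C++ with hypothetical names (`dlaplace_guarded`, `PARALLEL_THRESHOLD`), and the cutoff value is an arbitrary placeholder that would have to be tuned by benchmarking.

```cpp
#include <cmath>
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical cutoff: inputs at or below this length stay serial.
// The value is a placeholder, not a measured optimum.
const long PARALLEL_THRESHOLD = 100000;

// Hypothetical helper, not part of extraDistr. Assumes sigma > 0.
std::vector<double> dlaplace_guarded(const std::vector<double>& x,
                                     double mu, double sigma) {
  const long n = static_cast<long>(x.size());
  std::vector<double> p(x.size());

  // The if clause makes the parallel region conditional on the input size,
  // so short vectors are processed by a single thread.
#ifdef _OPENMP
#pragma omp parallel for if(n > PARALLEL_THRESHOLD)
#endif
  for (long i = 0; i < n; i++) {
    p[i] = std::exp(-std::abs(x[i] - mu) / sigma) / (2.0 * sigma);
  }

  return p;
}
```

Even with the condition false, entering the construct is not entirely free, so whether this recovers the pure-R timings on small inputs would need to be checked with the same benchmarks as above.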

twolodzko / extraDistr

Rcpp implementations may not always be faster than pure-R #11