lbelzile / TruncatedNormal

Simulation from truncated Gaussian and Student vectors, high dimensional CDF via exponential tilting
https://lbelzile.github.io/TruncatedNormal/
6 stars 4 forks source link

Cpp port #7

Open lbelzile opened 2 years ago

lbelzile commented 2 years ago

Reduce memory usage, increase speed through lower level language, handle higher dimensional models with sparse matrices.

The pedmod package by Benjamin Christoffersen has many of these features developed. Would need extensions for the Student model.

Extract from an email sent by Christoffersen on March 3rd, 2022

I have made C++ implementation of your tilting method that:

a. Has a low memory usage. b. Exploits single instruction, multiple data (SIMD). This can though be a bit more extensively exploited. c. Uses both scrambled Sobol sequences like your implementation and the randomized Korobov rules used by Genz. d. Has essentially no allocations during the importance sampling and it is cache friendly. e. Like Genz's implementation, the method can stop early if user specified thresholds are satisfied.

The implementation can be found in my pedmod package in the version on GitHub at https://github.com/boennecd/pedmod. Thus, the package needs to be installed with, e.g., remotes::install_github. The pedmod::mvndst function is comparable to pmvnorm.

There are examples in the README file on GitHub where I compare the implementation with your implementation in the TruncatedNormal package. The new version is faster. You can find the examples under the section The Multivariate Normal CDF Approximation. This link here may take you there: https://github.com/boennecd/pedmod#the-multivariate-normal-cdf-approximation

lbelzile commented 2 years ago

A list of comments is given below regarding what could be extracted and how the code can be a simplified if it was a standalone implementation of the likelihood approximation:

  • mvt.f file contains the code by Genz to permute the variables and simultaneously compute the Cholesky decomposition. It could be nice to re-write this in C++ (you already have two implementations for this).
  • new-mvt.h contains the high level code to use the randomized Korobov rules and the scrambled Sobol sequences.
  • sobol.[h]/[cpp] contains code to construct the scrambled Sobol sequences incrementally. It is faster than the original code.
  • find-tilting-param.[h]/[cpp] contains code to find the minimax tilting parameters. It currently depends on my psqn package but this dependence can be replaced by another non-linear optimizer. I have not yet handled the case where the solution is not in the interior.
  • cdfaprx.h contains code for the running the importance sampling. This can be greatly simplified if you only need the likelihood.
  • There is a number of other smaller files that are required but these should be easy to copy.
  • There is a dependence on Armadillo, R's C API and Boost. The Armadillo dependence could be removed by using a few BLAS and LAPACK functions.
  • The code can be simplified if the C++ code is never run in parallel from C++.