venelin / POUMM

An R-package implementing the univariate Phylogenetic Ornstein-Uhlenbeck Mixed Model
https://venelin.github.io/POUMM
4 stars 0 forks source link

test fails after adapMCMC update #1

Closed scheidan closed 6 years ago

scheidan commented 6 years ago

Hi Venelin,

I made some smaller changes in adaptMCMC (https://github.com/scheidan/adaptMCMC/commit/e27f3647ccdb64e38ae8085b04506696e3b0ae6a). However, somehow one of the changes breaks the tests of POUMM, see error message below.

The only change I can imagine causing troubles is this: https://github.com/scheidan/adaptMCMC/commit/586a68396dbdc5d73f7a0cf1480c3a5b30f9261c

Could it be that POUMM is still relying on that the density function may return a vector? Having the list interface in place, I would really like to throw an error in this case.

Check: tests
New result: ERROR
     Running ‘test-all.R’
   Running the tests in ‘tests/test-all.R’ failed.
   Complete output:
     > library(testthat)
     > test_check("POUMM")
     Loading required package: POUMM
     Loading required package: Rcpp
     abs call #1: x=-4.547474e-13
     abs call #2: x=1.482931e-08
     abs call #3: x=-1.421085e-14
     abs call #4: x=1.339681e-07
     abs call #5: x=2.842171e-14
     abs call #6: x=-4.547474e-13
     abs call #7: x=1.432318e-08
     abs call #8: x=-1.421085e-14
     abs call #9: x=1.452099e-08
     abs call #10: x=2.842171e-14
     abs call #11: x=-4.547474e-13
     abs call #12: x=1.432318e-08
     abs call #13: x=-1.421085e-14
     abs call #14: x=1.452099e-08
     abs call #15: x=-4.547474e-13
     abs call #16: x=1.432318e-08
     abs call #17: x=-1.421085e-14
     abs call #18: x=1.452099e-08
     [1] "Performing ML-fit..."
     Call to optim no. 1 : starting from  10.68023, -4.262606,
37.382683, 0.068014
      parLower =  0, -5.178238, 0, 0
      parUpper =  15.257471, 13.134392, 74.765367, 1.360275

     Call 0: loglik on par=(10.68023, -4.262606, 37.382683, 0.068014):
-353.0585

     Call 2: loglik on par=(10.67923, -4.262606, 37.382683, 0.068014):
-353.0582

...

    Call to optim no. 50 : starting from  14.494598, 7.640603, 3.738268,
0.680137
      parLower =  0, -5.178238, 0, 0
      parUpper =  15.257471, 13.134392, 74.765367, 1.360275
     Checking the max-loglik value with Rmpfr, current: val =  -101.6391
===> New:  -101.6391 ===> OK.max loglik from ML:
     [1] -101.6391
     attr(,"g0")
        theta
     4.440308
     attr(,"g0LogPrior")
     [1] NA
     parameters at max loglik from ML:
     [1] 0.04879197 2.38309143 0.00000000 0.66859739
     [1] "Performing MCMC-fit..."
     Chain No: 1 , nSamplesMCMC =  10000 , initial state:
        alpha    theta    sigma   sigmae
     0.000000 4.034491 1.000000 0.000000
     Chain No: 2 , nSamplesMCMC =  10000 , initial state:
          alpha      theta      sigma     sigmae
     0.04879197 2.38309143 0.00000000 0.66859739
     Chain No: 3 , nSamplesMCMC =  10000 , initial state:
        alpha    theta    sigma   sigmae
     0.000000 4.034491 1.000000 1.000000
     ── 1. Error: (unknown) (@test-POUMM.R#142)
────────────────────────────────────
     $ operator is invalid for atomic vectors
     1: POUMM(z = z[1:N], tree = tree, spec = specifyPOUMM(z[1:N], tree,
nSamplesMCMC = 10000),
            verbose = TRUE) at testthat/test-POUMM.R:142
     2: do.call(mcmcPOUMMGivenPriorTreeVTips, c(list(loglik = loglik,
fitML = fitML, verbose = verbose,
            debug = debug, pruneInfo = pruneInfo), spec))
     3: (function (loglik, fitML = NULL, parMapping, parInitMCMC,
parPriorMCMC, parScaleMCMC,
            nSamplesMCMC, nAdaptMCMC, thinMCMC, accRateMCMC, gammaMCMC,
nChainsMCMC, samplePriorMCMC,
            pruneInfo, ..., verbose = FALSE, parallelMCMC = FALSE)
        {
            samplePriorMCMC <- c(samplePriorMCMC, rep(FALSE, nChainsMCMC
....

     4: sapply(chains, function(.) .$valueMaxLoglik)
     5: lapply(X = X, FUN = FUN, ...)
     6: FUN(X[[i]], ...)

     ══ testthat results
═══════════════════════════════════════════════════════════
     OK: 33 SKIPPED: 0 FAILED: 1
     1. Error: (unknown) (@test-POUMM.R#142)

     Error: testthat unit tests failed
     Execution halted
venelin commented 6 years ago

Hi Andreas,

I've installed the newest adaptMCMC version from github and reproduced the error with the current version of POUMM, (version 2.0, which is now also on github). The error message was not very informative, because it was caused by the function MCMC returning a try-error instead of a list. The reason for the try-error returned by MCMC was the fact that in my call to MCMC I was using gamma = 0.5 (which used to be the default value in previous versions). In the current version, the default value for gamma has been changed to 2/3 and the allowed range for gamma is (0.5,1]. I fixed the problem by changing the gamma-value in POUMM to 2/3 as well. I am planning to release POUMM 2.0 on CRAN in the coming week.

P.S. Regarding, using the list-functionality : I finally preferred the simpler interface of adaptMCMC (where the posterior function returns a value and not a list), because of speed concerns - I really need fast processing of the MCMC proposals.

P.S.2. One other issue I wanted to mention earlier is about the use of the r-function is.symmetric in MCMC. I noticed that this function is quite slow on small matrices -> the trivial test isTRUE(all(t(X)==X)) is about 10 times faster on 5x5 matrices. In my specific case (Bayesian POUMM inference) this latency of the is.symmetric function is doubling the time for the MCMC call.

scheidan commented 6 years ago
  1. I changed gamma because someone pointed out in the original paper the condition is gamma > 0.5. Shouldn't have a big influence though.

  2. In the next version I really want to enforce that the density function must return a scalar or a list (as announced in the warning message). I run a simple benchmark (see below) and the list interface costs almost no time. If this tiny overhead is too much for your application I propose i) to use the ramcmc package to build a striped down sampler for your application (ideally in C), or ii) copy my old code in your package.

  3. Thats great! I didn't know that is.symmetric is that slow. I will change that, which should more than compensate for the list overhead.

Can you live with this changes? I plan to upload the new version end of the week to CRAN.

Benchmark list interface:

## -------------------------------------------------------
## multi variate normal

d <- 5                                  # number of dimensions
means <- 2^(1:d -1 )

## covariance matrix
S <- matrix(0, ncol=d, nrow=d)
diag(S) <- 1:d
S[lower.tri(S)] <- 1:sum(lower.tri(S))
Sigma <- S%*%t(S)

library(mvtnorm)

p.log <- function(x) {
  dmvnorm(x, means, Sigma, log=TRUE)
}

p.log.list <- function(x) {
  if(x[1]<0) {
    return (list(log.density=dmvnorm(x, means, Sigma, log=TRUE), x=x))
  } else {
    return (list(log.density=dmvnorm(x, means, Sigma, log=TRUE), x=x, extra="Wow, x was positive!"))
  } 
}

## ----------------------
## sampling one chain

n <- 15000
burn.in <- n/2

system.time({
    samp <- MCMC(p.log, n, init=rep(0,d), acc.rate=0.234, adapt=TRUE, showProgressBar=T)
})
##  user  system elapsed 
##  8.972   0.324   8.966 

system.time({
    samp <- MCMC(p.log.list, n, init=rep(0,d), acc.rate=0.234, adapt=TRUE, showProgressBar=T)
})
##  user  system elapsed 
##  8.816   0.000   8.821

Benchmark is.symmetric

n <- 20
X = matrix(1:(n*n), ncol=n)
Y = X %*% t(X)

system.time(replicate(15000, isSymmetric(X)))
## user  system elapsed 
##  0.908   0.000   0.909 
system.time(replicate(15000, isSymmetric(Y)))
##  user  system elapsed 
## 2.152   0.004   2.158 

system.time(replicate(15000, all(t(X)==X)))
##   user  system elapsed 
##  0.096   0.000   0.098 
system.time(replicate(15000, all(t(Y)==Y)))
##   user  system elapsed 
##  0.180   0.000   0.179 
venelin commented 6 years ago

Yes, all three of your above points should present no further issues with the POUMM package.

scheidan commented 6 years ago

Great!

scheidan commented 6 years ago

Just for information: After some testing I realized that replacing isSymmetric is not a good idea - it even slows the sampler down. The reason are small floating point errors; all(t(Y)==Y)) is overly sensitive and hence nearPD is called to often (which is a slow function).

A "correct" version with floating point tolerance, isTRUE(all.equal(t(Y), Y))), is as slow as isSymmetric.