katzfuss-group / GPvecchia

Fast Gaussian-process inference using Vecchia approximations

Vecchia likelihood looks inaccurate in the exact case #22

Closed: marcinjurek closed this issue 5 years ago

marcinjurek commented 5 years ago

This will be a longer ticket, but I wanted to document everything well.

We know that for some special parameter settings the MRA is exact. In particular, this requires no measurement error (in the old MRA-only algorithm). Indeed, when I run my Python code with 40 observed locations and a measurement-error variance of 1e-6, I get almost identical likelihood values:

marcin@T530:~/pyMRA/pyMRA/tests$ python test-MRA.py
/home/marcin/.local/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
10:59:01 : pyMRA - === simulating data ===
10:59:01 : pyMRA - simulating locations
10:59:01 : pyMRA - calculating covariance matrix
10:59:01 : pyMRA - simulating observations
10:59:01 : pyMRA - === starting MRA ===
10:59:01 : pyMRA.MRATree - r: 2,        J: 3,   M: 3
10:59:01 : pyMRA.MRATree - mode: parallel
10:59:01 : pyMRA.MRATree - start building the tree
10:59:02 : pyMRA.MRATree - avg leaf size: 2.000000
10:59:02 : pyMRA.MRATree - max leaf size: 2.000000
10:59:02 : pyMRA.MRATree - min leaf size: 2.000000
10:59:02 : pyMRA - MRA predictions finished. It took 0.95s
10:59:02 : pyMRA - === Starting ordinary kriging ===
10:59:02 : pyMRA - Kriging finished. It took 0.0046s
[[-19.59390178]] [[-19.59418191]]

However, this does not seem to be the case for the Vecchia code. Even when the m.e. variance is just as small, the results differ:

> source('~/GPvecchia/tests/test-vecchia-mra.r')
[1] "exact representation available!"
[1] -6.560676
[1] -6.201595

This difference can be larger for some realizations. For example:

> source('~/GPvecchia/tests/test-vecchia-mra.r')
[1] "exact representation available!"
[1] -6.703282
[1] -10.01786

when we use the following domain partitioning (plot attachment not rendered):

and the reversed NNarray is

$U.prep$revNNarray
     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA    1
[2,]   NA   NA    1    2
[3,]   NA   NA    1    3
[4,]   NA    1    2    4
[5,]    1    2    4    5
[6,]   NA    1    2    6
[7,]   NA    1    3    7

Finally, I checked my formulas for the exact likelihood calculation against the ones used in the Python code, and they give the same values.
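For reference, here is a minimal numpy sketch (not the actual pyMRA or GPvecchia code) of the exact Gaussian log-likelihood that both implementations are being checked against, assuming an exponential (Matérn smoothness 0.5) covariance with a nugget:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 7
locs = np.sort(rng.uniform(size=n))
sig2, rng_par, me_var = 1.0, 0.1, 1.0

# exponential covariance (Matern smoothness 0.5) plus nugget on the diagonal
D = np.abs(locs[:, None] - locs[None, :])
Sigma = sig2 * np.exp(-D / rng_par) + me_var * np.eye(n)

# simulate z ~ N(0, Sigma) via the Cholesky factor
L = np.linalg.cholesky(Sigma)
z = L @ rng.standard_normal(n)

# exact log-likelihood: -0.5 * (n*log(2*pi) + log|Sigma| + z' Sigma^{-1} z)
_, logdet = np.linalg.slogdet(Sigma)
quad = z @ np.linalg.solve(Sigma, z)
loglik = -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
print(loglik)
```

The same value can be obtained from the Cholesky factor alone (log|Sigma| = 2*sum(log(diag(L)))), which is a useful cross-check when comparing two codebases.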

I hope this doesn't come off as bragging about the Python code; I just wanted to rule out all other sources of error before filing a bug report. I'm not yet familiar enough with the Vecchia code to debug it quickly. I feel I should take it over and maintain it, but it will take me a while before I can fix issues like this one.

katzfuss commented 5 years ago

If the numbers in the revNNarray correspond to the locations in the plot that I think they do, then the NNarray should be correct. It's hard to debug without seeing the code in test-vecchia-mra.r.

One thing I can think of is that numerical error is introduced by the extremely small m.e. variance of 1e-6. Can you try setting the variance to a larger value, say 1?
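To illustrate the concern (a hypothetical numpy sketch, not the package code): the condition number of covariance-plus-nugget grows as the nugget shrinks, so a 1e-6 nugget amplifies floating-point error in any computation that inverts or factors the matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
locs = np.sort(rng.uniform(size=n))
D = np.abs(locs[:, None] - locs[None, :])
K = np.exp(-D / 0.1)  # exponential covariance, range 0.1, no nugget

# condition number of K + nugget*I for a tiny and a moderate nugget
conds = {nug: np.linalg.cond(K + nug * np.eye(n)) for nug in (1e-6, 1.0)}
for nug, c in conds.items():
    print(f"nugget {nug:g}: condition number {c:.3e}")
```

Since cond(K + eps*I) = (lmax + eps) / (lmin + eps) is strictly decreasing in eps, the tiny nugget always gives the worse-conditioned system.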

marcinjurek commented 5 years ago

Thank you very much for your quick response!

Yeah, I tried it for different m.e. variances, even as high as 0.5. I've just run it with 1, as you suggested, and the same thing happens. The difference between the likelihood values varies greatly across RNG seeds but is fairly stable for a given seed across different orders of magnitude of the m.e. variance. Here is the code for the test file:

rm(list=ls())
library(GpGp)
library(parallel)
library(Matrix)

setwd("/home/marcin/GPvecchia")
source("R/vecchia_specify.R")
source("R/createU.R")
source("R/vecchia_likelihood.R")
source("R/vecchia_prediction.R")
source("R/RcppExports.R")
source("R/ordering_functions.R")
source("MRA/mraNN.r")
source("R/whichCondOnLatent.R")
source("R/U_sparsity.R")
source("R/NN_kdtree.R")

Rcpp::sourceCpp('src/U_NZentries.cpp')

spatial.dim=1 # number of spatial dimensions
n=7  # number of observed locs
m=3

# simulate locations
#set.seed(11)
if(spatial.dim==1){
  locs=matrix(runif(n),ncol=1)
} else {
  locs = cbind(runif(n),runif(n))
}

sig2=1; range=.1; smooth=0.5
me.var = 1

covparms =c(sig2,range,smooth)
covfun <- function(locs) sig2*MaternFun(fields::rdist(locs),covparms)

nuggets=rep(me.var,n)

# simulate observations
if(n < 1e4) {
  Sigma = covfun(locs) + diag(nuggets)
  Sigma.c = chol(Sigma)
  z=as.numeric(t(Sigma.c)%*%rnorm(n))
} else z=rnorm(n)

V = vecchia_specify(z, locs, m, conditioning='mra', J=2)

##### likelihood evaluation #####
covparms=c(sig2,range,smooth)
vecchia_loglik = vecchia_likelihood(V,covparms,nuggets)

# exact likelihood
const = dim(locs)[1]*log(2*pi)
logdet = as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
quad.form2 = as.numeric(t(z) %*% solve(Sigma) %*% z)
neg2loglike = const + logdet + quad.form2
loglik = -neg2loglike/2

print(loglik)
print(vecchia_loglik)

It throws a warning, but I inspected it and it seems unrelated.

marcinjurek commented 5 years ago

Also, the MRA tree-generating code might break if you tamper with it: I used several hacks to get it to work, and changing some settings can break it. But I was working under the assumption that as long as the NNarray is correct, that should not matter for now.

katzfuss commented 5 years ago

Not sure what is going wrong then. I do get the exact likelihood with the previously implemented Vecchia orderings/conditioning:

# coord
vecchia.approx=vecchia_specify(locs,m=1)
vecchia_likelihood(z,vecchia.approx,covparms,nuggets)

# maxmin
vecchia.approx=vecchia_specify(locs,m=4,ordering='maxmin')
vecchia_likelihood(z,vecchia.approx,covparms,nuggets)

So it looks like something is not working correctly with the MRA conditioning.

marcinjurek commented 5 years ago

The problem was that the ordering of the locations was changed within vecchia_specify.R. This is fixed now; the update 8ee8838 is on the mra branch.
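For anyone hitting the same symptom, the bug pattern is easy to reproduce outside the package: reordering the locations (and hence the rows/columns of the covariance) while leaving z in its original order changes the likelihood, whereas reordering both together leaves it invariant. A hypothetical numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
locs = np.sort(rng.uniform(size=n))
D = np.abs(locs[:, None] - locs[None, :])
Sigma = np.exp(-D / 0.1) + np.eye(n)  # exponential covariance + unit nugget
z = np.linalg.cholesky(Sigma) @ rng.standard_normal(n)

def loglik(S, y):
    """Exact Gaussian log-likelihood of y ~ N(0, S)."""
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(S, y))

perm = np.array([3, 0, 6, 1, 5, 2, 4])  # some reordering of the locations
Sigma_p = Sigma[np.ix_(perm, perm)]

base = loglik(Sigma, z)
consistent = loglik(Sigma_p, z[perm])  # reorder z too: likelihood unchanged
mismatched = loglik(Sigma_p, z)        # reorder only the locations: wrong value
print(base, consistent, mismatched)
```

The Gaussian likelihood is invariant under a joint permutation of observations and covariance, so any internal reordering must be applied to the data vector as well.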