katzfuss-group / GPvecchia

Fast Gaussian-process inference using Vecchia approximations
19 stars 6 forks source link

Exact max-min ordering code #35

Closed dzilber closed 5 years ago

dzilber commented 5 years ago

As described by @katzfuss:

For the Vecchia prediction paper (and the VL paper), it would be great if you could write a wrapper to call the C code from R in vecchia_specify() and replace the current approximate maxmin algorithm. It would also be great if the exact algorithm could be extended to ensure the obs-pred ordering restriction for predictions, if desired. But if it is too complicated, we could just do the exact maxmin for the observed locations, and then do the current approximate algorithm for the prediction locations.

dzilber commented 5 years ago

When running the code locally on my mac through R, Rstudio aborts or hangs due to memory issues. I suspect it is a coding bug or memory leak rather than an issue with complexity because I tried running it with a set of 100 2D locations and the rsession memory usage went from ~100mb to >15GB. I also tried compiling and running the file "main.c" but I am getting an error referring to a "flat namespace" that I don't understand (yet).

dyn.load("src/sortSparse.so")
.C("create_ordering_2d", P=as.array(rep(0, nrow(locs))), 
   revP=as.array(rep(0, nrow(locs))), 
   distances=as.array(rep(0, nrow(locs))),
   N =as.integer(nrow(locs)),
   coords  = as.array(locs)
   )

@katzfuss : It may be necessary to contact the authors or implement the algorithm ourselves. What do you suggest?

katzfuss commented 5 years ago

Does it work if you run the C code outside of R?

dzilber commented 5 years ago

I can run the main.c file in terminal and it seems to work, although there were a bunch of warnings during compilation. The output is below, which is running for 2 million random locations in 2D:

time spent = 65.003279 seconds 
1000 * time spent per Nlog^2(N) = 0.000154 seconds

This implies that my R implementation is incorrect, so I'll study that some more.

katzfuss commented 5 years ago

@jingjiezh has found and corrected some bugs in the C code. Please work together to wrap this function in R. This should be highest priority at the moment.

dzilber commented 5 years ago

The first update is up. The obs-pred ordering will now be attempted.

dzilber commented 5 years ago

Obs-pred ordering heuristic has been implemented by copying the code of the previous version. Results seem to be nearly identical to the previous code for 1 and 2D tests with GPVecchia. For a sample size of 40^2, the time to run vecchia_specify decreased by about 40%, from roughly .3 seconds to about .18 seconds.

An exact obs-pred ordering will likely require modifying the c code and does not seem to be a priority, so this ticket will be closed and a new ticket can be opened if exact obs-pred is needed for the future.