katzfuss-group / GPvecchia

Fast Gaussian-process inference using Vecchia approximations
19 stars 6 forks source link

allow user to pass an arbitrary R function as covariance function in the MRA case #43

Closed marcinjurek closed 5 years ago

marcinjurek commented 5 years ago

So far cov.model in vecchia_specify() can be either a string "matern" or "exponential" or a matrix. It would be more flexible if the user could pass any R function as covariance function. But we also need to check how fast that would be.

this is related to #27 but handles only the MRA case

marcinjurek commented 5 years ago

first, need to come up with a file to test it all on. The usual test file does not work on ticket27 branch. So first I need to merge ticket27 into master and push to the remote.

marcinjurek commented 5 years ago

Alright, merge completed but the tree construction algorithm didn't merge correctly and there are some errors thrown. Need to investigate that

marcinjurek commented 5 years ago

ok! First attempt is there. Need to test it now!

marcinjurek commented 5 years ago

@katzfuss so here are the run times when we allow the user to pass the function

I used n=10k locations and m=40. In one scenario I supplied a function that read pairs of locations (from NNarray), calculated distances between all pairs and then returned the exp(-distance) as covariance

Alternatively, I wrote the same function in c++ and the user passed only the function name (like in Wenlong's implementation).

The first one took 0.44s while the second one only 0.04s. I think it would be good to allow the user to pass an arbitrary covariance function but using one of the predefined families should be preferred as it seems to be 10 times faster.

katzfuss commented 5 years ago

Consider the following code:

n=10000; m=40
dists=matrix(rnorm(n*m),nrow=n)
system.time(exp(-dists/2))

Line 2 only needs to be run once, so its time is not too important. Line 3 on my computer produces the following output:

user system elapsed 0.03 0.00 0.03

marcinjurek commented 5 years ago

Hm... so this is assuming that covariance function is isotropic. I followed the specification outlined in #27. Moreover, Wenlong's s++ code also calculates distances so in order to make a fair comparison I thought I should have included that too. In the spatial only case they have to be calculated anyway.

In any case, looks like you closed #27 so should I not work on the more general case? I am right now and I thought I could finish it today.

katzfuss commented 5 years ago

If you can finish the general case quickly, please do. But I suspect that the matrix-based approach will not work as well there, because we have to provide n x m^2 covariances. What we probably should do or at least keep on the to-do list is to ensure that the Bessel evaluations are avoided if smoothness=1.5, see e.g. here https://en.wikipedia.org/wiki/Mat%C3%A9rn_covariance_function#Simplification_for_%CE%BD_half_integer

In the MRA case, we should certainly allow covariances to be specified as pre-implemented C++ functions, matrices, or R functions (via matrices).

joeguinness commented 5 years ago

I've wondered about this very issue. The package will be more valuable if users can specify covariance functions, but R functions will be slower than c++. Would it be possible for them to write c++ code and pass the code in? I have no idea how this would work, but my impression is that the inline package does something like this.

On Thu, Apr 11, 2019 at 11:02 AM Marcin Jurek notifications@github.com wrote:

@katzfuss https://github.com/katzfuss so here are the run times when we allow the user to pass the function

I used n=10k locations and m=40. In one scenario I supplied a function that read pairs of locations (from NNarray), calculated distances between all pairs and then returned the exp(-distance) as covariance

Alternatively, I wrote the same function in c++ and the user passed only the function name (like in Wenlong's implementation).

The first one took 0.44s while the second one only 0.04s. I think it would be good to allow the user to pass an arbitrary covariance function but using one of the predefined families should be preferred as it seems to be 10 times faster.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/katzfuss-group/GPvecchia/issues/43#issuecomment-482151683, or mute the thread https://github.com/notifications/unsubscribe-auth/AGxfm7Ri6mBm2hmI-xUpdaDjQqMFNAd3ks5vf051gaJpZM4cWCrK .

marcinjurek commented 5 years ago

@joeguinness I created #45 for that particular feature @katzfuss for smoothness 0.5, 1.5, 2.5 matern kernel is now evaluated using the polynomial formula