inlabru-org / inlabru

inlabru
https://inlabru-org.github.io/inlabru/
76 stars 21 forks source link

Pass nsub argument for stable integration method #71

Closed martinju closed 4 years ago

martinju commented 4 years ago

This PR simply allows the user to specify the nsub argument of the "stable" integration method which is currently the default in inlabru. This method adds (nsub+1)^2 temporary proto-integration points to every mesh triangle and maps them to the triangle corners based on the basis functions to determine the integration weight for the mesh nodes which corresponds to the integration points.

The hard coded default is nsub=9 giving 100 such points. Through the following paper (https://doi.org/10.1002/sta4.285) I have experimented with different types of integration schemes (see also the https://github.com/martinju/LGCP-normConst-simulations). The one implemented in inlabru does well, but my tests indicate that for stability it may be preferable to use more than 100 such points. More such points will increase the integration time, so it may not always be desired.

Anyhow, with this PR I would like to pass this option to the inlabru user. Since the nsub argument is already there, hidden in the code, this PR is just about passing that argument through from the user functions. There are many ways of doing this, but I simply added this as part of the bru.options call, and passed the options forward to the functions establishing the integration weights. I have also modified the documentation accordingly. Let me know if you would like any of this to be handled differently.

I've added an illustrating script at inst/MJ_test_to_be_deleted.R which should be deleted before a potential merge.

Note that I was not able to compare lgcp()-runs with different nsub as I could not find a way to pass a seed to get reproducable results with lgcp(). Is that possible?

finnlindgren commented 4 years ago

Thanks! (Note: the regular Travis check has been broken for a while, so we can ignore that for this pull request.) In the coming weeks I'll be working on integrating the feature/mapper branch into devel, to prepare for a new major release; it's a substantial reworking of the internals to allow more general model support. Fortunately, at first glance it looks like your modifications are mostly in unrelated parts of the code, so it shouldn't interfere too much with my upcoming merge. I'll go through it in more detail, but first impression is "great!".

Can you expand on the lgcp() issue? I don't think it does anything random that would need a seed; it's predict.bru() that needs a seed, and it already has that argument.

Also, can you confirm that the intended use case for this is to produce integration schemes for domains that have boundaries that don't align with triangle edges? (When the domain aligns with triangle edges , the sub sampling shouldn't be needed when the weights are projected to the vertices anyway.)

martinju commented 4 years ago

Great to hear! If anything is conflicting I am sure it would be easy to work around as well -- as mentioned this is just about passing integration arguments through.

On lgcp() reproducability, check this out: I am comparing the summary of the fixed parameters, which I don't know how is computed (by INLA not inlabru?), but it does not seem to obey R's seed..?

# Defining test variables
region.coords = matrix(c(0, 0,
                         0, 5,
                         5, 5,
                         5, 0,
                         0, 0), ncol = 2,byrow=T)
region.points = sp::SpatialPoints(coords=region.coords)
region.polygon = sp::SpatialPolygons(list(sp::Polygons(list(sp::Polygon(coords=region.coords)),'0')))
mesh = INLA::inla.mesh.2d(loc.domain = region.points,max.edge = 1,offset = 1)

# Sample point pattern
set.seed(123)
spatstat::spatstat.options(npixel=100)
win <- spatstat::owin(range(region.coords[,1]),range(region.coords[,2]))
lg.s = spatstat::rpoispp(lambda=1,win=win)
obs = sp::SpatialPoints(cbind(x=lg.s$x,y=lg.s$y))

# Just use default matern prior
matern = INLA::inla.spde2.matern(mesh)

# Define model to fit
cmp <- coordinates ~ mySmooth(map = coordinates, model = matern) + Intercept

# Checking reproducability
set.seed(123)
mod_default = lgcp(cmp,data = obs,samplers = region.polygon)
set.seed(123)
mod_default_2 = lgcp(cmp,data = obs,samplers = region.polygon)
all.equal(mod_default$summary.fixed,mod_default_2$summary.fixed)
#[1] "Component ?mean?: Mean relative difference: 4.414581e-06"      
#[2] "Component ?sd?: Mean relative difference: 0.0001167155"        
#[3] "Component ?0.025quant?: Mean relative difference: 7.376279e-05"
#[4] "Component ?0.5quant?: Mean relative difference: 1.177352e-05"  
#[5] "Component ?0.975quant?: Mean relative difference: 0.000109876" 
#[6] "Component ?mode?: Mean relative difference: 1.729328e-05"      
#[7] "Component ?kld?: Mean relative difference: 0.0002442855"       

On intended usage: Yes, the intended usage of this is when the mesh don't align with the integration domain. I don't quite remember what inlabru does when they match -- still using the sub sampling? If I remember correctly I think you can get slightly different weights with different nsub numbers in that case too -- but far from as big as in the unaligned case. I think the best option for the algined cases is to assign area_triangle/3 to every corner point of the triangles which I believe is what you essentially try to approximate and should get in the limit as nsub increases.

finnlindgren commented 4 years ago

For the lgcp() issue, I suspect that's just an effect of multithreading, which can produce different INLA results. To check for consistency, need to compare the differences relative to the computed posterior standard deviations, to see if they are within expected numerical accuracy. To check if this is the case, set options=list(num.threads=1, blas.num.threads=1) (handling of inla options has been improved greatly recently, but possibly only in the new feature branch, so I'm not 100% sure what options are available on the devel branch, but I think these will be passed correctly; see ?bru.options).

For flat domains, any properly placed symmetric integration scheme should result in projected weights area_triangle/3 for triangles that are entirely inside a domain, since this operation is done entirely in the linear space, and we only (at the moment) deal with linear basis functions. But when CRS information is involved it might get more complicated, since the integration weights are then calculated in an equal-area reprojection and not on the original triangles.

martinju commented 4 years ago

For the lgcp() issue, I suspect that's just an effect of multithreading, which can produce different INLA results. To check for consistency, need to compare the differences relative to the computed posterior standard deviations, to see if they are within expected numerical accuracy. To check if this is the case, set options=list(num.threads=1, blas.num.threads=1) (handling of inla options has been improved greatly recently, but possibly only in the new feature branch, so I'm not 100% sure what options are available on the devel branch, but I think these will be passed correctly; see ?bru.options).

Thanks for the input on lgcp() issue. As you suspected it was a multithreading problem, so enforcing single core as you described gave identical results on repeated runs (the required options were available on the devel branch). For this PR this was relevant only for a sanity check in my test script to see that using different nsub numbers gave slightly different results. Pushed a new version of that test script for completeness, although that test script ought to be removed on merge anyway.

finnlindgren commented 4 years ago

I needed to resolve some merge conflicts due to recent devel branch updates for PROJ6 support, but nothing major, and it seems to work as intended. Thanks!