Prediction raster from `train()` does not spatially conform with `distribution()` background

jeffreyhanson commented 8 months ago

Hi,

I'm not sure if this is a bug or intended behavior, but I noticed that the output prediction raster generated using using train() does not have the same resolution, dimensionality, and extent as the background argument to distribution(). I would have expected these two rasters to have exactly the same spatial properties as each other? If this is intended behavior, is there a way to force the output prediction raster to have the same spatial properties as the background object? Although I could resample the output prediction raster to have the same spatial properties as the background data afterwards, this is not desireable and could potentially introduce artificial spatial auto-correlation into the model predictions (e.g., if there is a large difference between the resolution of the background raster and the model predictions)? Sorry, if I'm missing something obvious? I've added a reprex below.

# load packages
library(ibis.iSDM)

# load data
bg_data <- 
  system.file("extdata/europegrid_50km.tif", package = "ibis.iSDM") |>
  terra::rast()
spp_data <- 
  system.file("extdata/input_data.gpkg", package = "ibis.iSDM") |>
  sf::read_sf()
env_data <- 
  system.file("extdata/predictors/", package = "ibis.iSDM") |>
  list.files("*.tif", full.names = TRUE) |>
  terra::rast()

# define model specification
model <- 
  distribution(bg_data)  |>  
  add_predictors(env = env_data, transform = "scale", derivates = "none")  |> 
  add_biodiversity_poipo(spp_data, field_occurrence = "Observed") |>  
  engine_inlabru()

# fit model
fit <- train(model)

# extract prediction
pred <- fit$get_data()

# check 

# check that background data and environmental data are spatially conformant
terra::compareGeom(bg_data, env_data)
#> [1] TRUE

# check that background data and model predicitons are spatially conformant
terra::compareGeom(bg_data, pred)
#> Error: [compareGeom] extents do not match

# print extents
print(terra::ext(bg_data))
#> SpatExtent : -16.063769095, 34.949597322, 36.321862064, 71.534918446 (xmin, xmax, ymin, ymax)
print(terra::ext(pred))
#> SpatExtent : -15.8061258302677, 35.2072405867323, 36.0629425317794, 71.2759989137794 (xmin, xmax, ymin, ymax)

# print resolutions
print(terra::res(bg_data))
#> [1] 0.4952754 0.4959585
print(terra::res(pred))
#> [1] 0.5152865 0.5178391

Session info

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /opt/R/R-4.3.1/lib/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Pacific/Auckland
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] ibis.iSDM_0.1.1

loaded via a namespace (and not attached):
 [1] Matrix_1.6-5        crayon_1.5.2        dplyr_1.1.4        
 [4] compiler_4.3.1      renv_1.0.3          tidyselect_1.2.0   
 [7] Rcpp_1.0.12         MatrixModels_0.5-3  parallel_4.3.1     
[10] assertthat_0.2.1    splines_4.3.1       uuid_1.2-0         
[13] yaml_2.3.8          lattice_0.21-8      plyr_1.8.9         
[16] R6_2.5.1            generics_0.1.3      classInt_0.4-10    
[19] inlabru_2.10.1      sf_1.0-15           iterators_1.0.14   
[22] tibble_3.2.1        fmesher_0.1.5       units_0.8-5        
[25] proto_1.0.0         DBI_1.2.1           sn_2.1.1           
[28] pillar_1.9.0        rlang_1.1.3         utf8_1.2.4         
[31] sp_2.1-2            terra_1.7-65        cli_3.6.2          
[34] withr_3.0.0         magrittr_2.0.3      class_7.3-22       
[37] foreach_1.5.2       grid_4.3.1          INLA_23.09.09      
[40] lifecycle_1.0.4     vctrs_0.6.5         mnormt_2.1.1       
[43] KernSmooth_2.23-21  proxy_0.4-27        glue_1.7.0         
[46] numDeriv_2016.8-1.1 codetools_0.2-19    zoo_1.8-12         
[49] stats4_4.3.1        fansi_1.0.6         e1071_1.7-14       
[52] tools_4.3.1         pkgconfig_2.0.3

mhesselbarth commented 8 months ago

This is related to the mesh that INLA/INLABRU uses to predict into space. For all (most?) other engines the spatial dimensions are exactly the same.

Not sure it's possible to force INLA/INLABRU to use a mesh with exactly the same characteristics...

jeffreyhanson commented 8 months ago

Ah, I see - thanks for explaining that! To ensure that I can get model predictions at a particular resolution and spatial extent, would you recommend resampling the data to a finer resolution prior to model fitting? Or manually specifying the mesh?

jeffreyhanson commented 8 months ago

Just to follow up, I've found that using something like this (see code below) tends to produce rasters with approximately the correct the spatial resolution so that I can subsequently resample the output prediction raster to match the original background raster.

ibis.iSDM::engine_inlabru(x, proj_stepsize = max(terra::res(bg_data)) * 0.85)

Although this increases the overall computatio time quite a bit, does this seem like a reasonable solution?

Martin-Jung commented 7 months ago

INLA does not use the predictor covariates to create the prediction output frame, but makes predictions on the created (if parameters are supplied) or provided (if directly passed on) mesh. A gridded prediction is only created afterwards by essentially turning each mesh node back into a gridded surface (linearly interpolating per unit stepsize). This unfortunately can result to mismatches with the original covariate spatial resolution.

Changing the stepsize can be a way but I would guess this is quite problem/dataset dependent and hard to find a golden rule there... Generally you would want a fine mesh with many nodes particular where your data is distributed which unfortunately is almost always linked to higher computational effort. The mesh can be checked via plot( model$engine$data$mesh ) and I think there was also an in INLA function to build meshes (INLA::INLA::meshbuilder()). Note: The denseness and location of mesh nodes has been shown to affect predictions, so the choice of creating a mesh is ultimately a parameter choice and left to users. See Dambly et al.

My recommendation if INLA is heavily used would be to create a 'default' reasonably fine mesh a priori and then supply this mesh to all predictions via engine_inlabru(optional_mesh = mymesh).

There is also build-in convenience functions for aligning/resampling grids which in cases where the prediction is larger might help sometimes (alignRasters(layer, templater) )).

jeffreyhanson commented 7 months ago

Ah I see - thank you very much for explaining all that @Martin-Jung!

iiasa / ibis.iSDM

Prediction raster from `train()` does not spatially conform with `distribution()` background #96