iiasa / ibis.iSDM

Modelling framework for creating Integrated SDMS
https://iiasa.github.io/ibis.iSDM/
Creative Commons Attribution 4.0 International
20 stars 2 forks source link

Save/load model trained with bart-engine to disk and calculate partial responses #127

Closed mhesselbarth closed 5 months ago

mhesselbarth commented 5 months ago

There seems to be an issue when saving a model trained with the bart engine to disk, re-loading the model and running the partial() function on the object. For the re-loaded model, the response is zero. So far, seems to be a specific issue for the bart engine.

# Load the package
library(ibis.iSDM)
library(terra)
#> terra 1.7.71
library(sf)
#> Linking to GEOS 3.11.2, GDAL 3.8.2, PROJ 9.3.1; sf_use_s2() is FALSE

background <- terra::rast(system.file("extdata/europegrid_50km.tif",
                                      package = "ibis.iSDM", mustWork = TRUE))

virtual_species <- sf::st_read(system.file("extdata/input_data.gpkg",
                                           package = "ibis.iSDM", mustWork = TRUE),
                               "points")
#> Reading layer `points' from data source 
#>   `C:\Users\hesselbarth\Documents\R-packages\ibis.iSDM\extdata\input_data.gpkg' 
#>   using driver `GPKG'
#> Simple feature collection with 208 features and 5 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 4.109162 ymin: 48.7885 xmax: 24.47594 ymax: 64.69323
#> Geodetic CRS:  WGS 84

predictors <- terra::rast(list.files(system.file("extdata/predictors/",
                                                 package = "ibis.iSDM", mustWork = TRUE),
                                     "*.tif",full.names = TRUE))

predictors <- subset(predictors, c("bio01_mean_50km","bio03_mean_50km",
                                   "CLC3_112_mean_50km","CLC3_132_mean_50km"))

mod_test_a <- distribution(background) |>
  add_predictors(env = predictors, transform = "scale", derivates = "none") |>
  # add_biodiversity_poipo(virtual_species, formula = "observed ~ bio01_mean_50km + bio03_mean_50km",
  #                        field_occurrence = "Observed", name = "bio") |>
  add_biodiversity_poipo(virtual_species, formula = "observed ~ CLC3_112_mean_50km + CLC3_132_mean_50km",
                         field_occurrence = "Observed", name = "CLC") |>
  engine_bart() |>
  train(only_linear = F, method_integration = "predictor")

tmp_path <- tempfile()

write_model(mod = mod_test_a, fname = paste0(tmp_path, ".rds"))
#> [Export] 2024-05-14 16:30:16.949316 | Writing model to file...

mod_test_b <- ibis.iSDM::load_model(fname = paste0(tmp_path, ".rds"))
#> [Export] 2024-05-14 16:30:19.496442 | Loading previously serialized model (size: 21.4 Mb)

variable_length <- 50
x.var <- "CLC3_112_mean_50km"

partial(mod_test_a, x.var = x.var, plot = TRUE, variable_length = variable_length)

partial(mod_test_b, x.var = x.var, plot = TRUE, variable_length = variable_length)

Created on 2024-05-14 with reprex v2.1.0

Martin-Jung commented 5 months ago

Interesting and could be an issue with the dbarts package and serialization? Maybe worth exploring if the raw model behaves similarly, so taking fit$get_model() and saving it as rds or so on disk, then load it again and see if the partial calculation works (function for this in the utils_bart script).

If nothing else, the other idea could be to precompute the partials during saving and then save them in this or another object...? Although I know that it can take quite long for individual partials.

mhesselbarth commented 5 months ago

Fixed it. Was related to the "state of the sampler", which R's serialization mechanism must be able to access. The trick is to "touch" it as explained here. Added this to the write_model() function for x$get_name() == "BART-Model".