Extracting pseudo-absences

jeffreyhanson commented 7 months ago

I'm working on fitting some species distribution models and I would like to be able to access the pseudo-absence data that is automatically generated when using add_biodiversity_poipo(). For example, I would be interested in visualizing the spatial distribution of the pseudo-absences and also using them for model evaluation. Is it possible to extract these from a BiodiversityDistribution (output from distribution()) or DistributionModel (output from train()) object?

Alternatively, if this isn't possible, is it possible to manually specify the pseudo-absence points for add_biodiversity_poipo()? I see that the documentation talks about add_pseudoabsence() which can be used to add pseudo-absence points to a presence-only points dataset. However, I'm not sure if such a combined point dataset can be used with add_biodiversity_poipo()? Although one of the vignettes shows how a dataset with presences and pseudo-absences can be used with add_biodiversity_poipa(), my understanding is that such an approach would mean that the modelling process treats the pseudo-absences as "true absences" -- which is not what I intend?

Let me know if you'd like a reprex?

jeffreyhanson commented 7 months ago

For example, here's a reprex where I try to manually specify pseudo-absences for a PPM model. It just occurred to me - I'm assuming PPM models actually use pseudo-absences; maybe I've got this wrong and they don't use pseudo-absences, so my question is invalid (e.g., something like asking "what is the best way to remove the scales from a bear? what do you mean, bears don't have scales...")

# load packages
library(ibis.iSDM)

# load data
bg_data <- 
  system.file("extdata/europegrid_50km.tif", package = "ibis.iSDM") |>
  terra::rast()
spp_data <- 
  system.file("extdata/input_data.gpkg", package = "ibis.iSDM") |>
  sf::read_sf()
env_data <- 
  system.file("extdata/predictors/", package = "ibis.iSDM") |>
  list.files("*.tif", full.names = TRUE) |>
  terra::rast()

# add pseudo-absences
psa_sett <- pseudoabs_settings(background = bg_data, nrpoints = 200, method =  "random")
spp_data2 <- add_pseudoabsence(df = spp_data, field_occurrence = "Observed", settings = psa_sett)

# define model specification
model <- 
  distribution(bg_data)  |>  
  add_predictors(env = env_data, transform = "scale", derivates = "none")  |> 
  add_biodiversity_poipo(spp_data2, field_occurrence = "Observed") |>  
  engine_inlabru()

#> [Setup] 2024-02-02 11:55:49.360766 | Provide a background with a valid projection!
#> [Setup] 2024-02-02 11:55:49.376749 | Creating distribution object...
#> [Setup] 2024-02-02 11:55:49.41185 | Adding predictors...
#> [Setup] 2024-02-02 11:55:49.413278 | Transforming predictors...
#> [Setup] 2024-02-02 11:55:49.484272 | Adding poipo dataset...
#> [Setup] 2024-02-02 11:55:49.651038 | Absence points found. Potentially this data needs to be added as presence-absence instead?

Martin-Jung commented 6 months ago

Heya, a few things:

1) In your example you are taking a presence-only dataset and manually adding pseudo-absence points to it. This turns it into a presence-absence dataset, and during model building the package correctly complains that absence points were found. If you want to add absence points manually prior to fitting, then add the biodiversity dataset via add_biodiversity_poipa() instead of add_biodiversity_poipo().

2) There are some basic plotting functionalities in any BiodiversityDataset, which you can access via x$plot(). For example, if the points are added as presence-absence in your example above, it looks like this:

model <- 
  distribution(bg_data)  |>  
  add_predictors(env = env_data, transform = "scale", derivates = "none")  |> 
  add_biodiversity_poipa(spp_data2, field_occurrence = "Observed") |>  
  engine_inlabru()

model$biodiversity$plot()

[Image: output of model$biodiversity$plot()]

3) If you need to control the pseudo-absence generation in add_biodiversity_poipo(), you can pass a specific settings object (created with pseudoabs_settings()) to the parameter pseudoabsence_settings. This changes the default behaviour for sampling any pseudo-absence data throughout. INLA, for example, treats every single node on the mesh as background by default for any lgcp inferences...
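A minimal sketch of point 3, assuming the same example data as in the reprex above. It only uses functions and parameters named in this thread; the object names (psa_sett, model_po) are just illustrative, and no engine is attached because the settings object is passed when the dataset is added.

```r
# load packages
library(ibis.iSDM)

# load the example data shipped with the package (as in the reprex)
bg_data <-
  system.file("extdata/europegrid_50km.tif", package = "ibis.iSDM") |>
  terra::rast()
spp_data <-
  system.file("extdata/input_data.gpkg", package = "ibis.iSDM") |>
  sf::read_sf()

# describe how pseudo-absences should be sampled
psa_sett <- pseudoabs_settings(background = bg_data, nrpoints = 200, method = "random")

# keep the data presence-only and hand the settings to add_biodiversity_poipo()
# rather than adding pseudo-absence points to the dataset beforehand
model_po <-
  distribution(bg_data) |>
  add_biodiversity_poipo(
    spp_data,
    field_occurrence = "Observed",
    pseudoabsence_settings = psa_sett
  )
```

Because the dataset itself still contains presences only, this should avoid the "Absence points found" message from the reprex.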

4) If you want to access the biodiversity data in your model object, it can be found in model$biodiversity. The way to do this (sorry, the documentation is still missing) is to first query the id of the dataset and then return the data as an sf object. Example: model$biodiversity$get_data( model$biodiversity$get_ids()[[1]] ). In a similar way, you can query the point data from fitted DistributionModel objects by looking within the fit$model$biodiversity object, which contains all data used for inference.
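As a rough sketch of point 4, the following rebuilds the presence-absence model from the reprex without predictors or an engine, pulls the first dataset back out and plots the point locations. The names ids and pts are illustrative, and the column layout of the returned sf object is not documented in this thread, so it is only printed here.

```r
# load packages
library(ibis.iSDM)

# load the example data shipped with the package (as in the reprex)
bg_data <-
  system.file("extdata/europegrid_50km.tif", package = "ibis.iSDM") |>
  terra::rast()
spp_data <-
  system.file("extdata/input_data.gpkg", package = "ibis.iSDM") |>
  sf::read_sf()

# presences plus 200 random pseudo-absences, added as presence-absence data
psa_sett <- pseudoabs_settings(background = bg_data, nrpoints = 200, method = "random")
spp_data2 <- add_pseudoabsence(df = spp_data, field_occurrence = "Observed", settings = psa_sett)
model <-
  distribution(bg_data) |>
  add_biodiversity_poipa(spp_data2, field_occurrence = "Observed")

# query the dataset ids, then fetch the first dataset as an sf object
ids <- model$biodiversity$get_ids()
pts <- model$biodiversity$get_data(ids[[1]])

# inspect the stored points and map their locations
print(pts)
plot(sf::st_geometry(pts), pch = 16, cex = 0.5)
```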

Hope that helps.

jeffreyhanson commented 6 months ago

Thanks for explaining all that - that's really helpful!

Just to clarify, if I'm using presence-only data (via add_biodiversity_poipo()) with the inlabru engine (via engine_inlabru()), then the INLA mesh is used for the pseudo-absence points and the pseudoabsence_settings parameter of add_biodiversity_poipo() is ignored?

Martin-Jung commented 6 months ago

For INLA, so far yes (code starting here), although I think this can actually be passed on as well somehow via method stack. TBD when I have time to think about the other INLA issue. Will report back. For other engines, using the pseudoabsence_settings is already the default behaviour (for a Bayesian engine you could try it out with engine_breg() and a single data type).
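A sketch of the engine_breg() suggestion, assuming the example data from the reprex and that the engine's dependencies are installed. The runname value and the object names are placeholders, and the exact structure of fit$model$biodiversity is not described in this thread, so it is only inspected with str().

```r
# load packages
library(ibis.iSDM)

# load the example data shipped with the package (as in the reprex)
bg_data <-
  system.file("extdata/europegrid_50km.tif", package = "ibis.iSDM") |>
  terra::rast()
spp_data <-
  system.file("extdata/input_data.gpkg", package = "ibis.iSDM") |>
  sf::read_sf()
env_data <-
  system.file("extdata/predictors/", package = "ibis.iSDM") |>
  list.files("*.tif", full.names = TRUE) |>
  terra::rast()

# pseudo-absence settings that a non-INLA engine should respect
psa_sett <- pseudoabs_settings(background = bg_data, nrpoints = 200, method = "random")

# single presence-only dataset with a Bayesian engine other than INLA
model_breg <-
  distribution(bg_data) |>
  add_predictors(env = env_data, transform = "scale", derivates = "none") |>
  add_biodiversity_poipo(
    spp_data,
    field_occurrence = "Observed",
    pseudoabsence_settings = psa_sett
  ) |>
  engine_breg()

# fit the model; "pseudoabsence check" is just a placeholder run name
fit <- train(model_breg, runname = "pseudoabsence check")

# the data used for inference, including any sampled background points,
# should be stored here
str(fit$model$biodiversity, max.level = 2)
```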

jeffreyhanson commented 6 months ago

Brilliant - thanks! Yeah, I'm mainly interested in using INLA for the integrated modelling, so understanding how it uses the mesh and how that relates to pseudo-absences was my main question/uncertainty here. Sorry, I should have been more explicit about that in the original post.

Martin-Jung commented 6 months ago

Aye, understand. INLA has so far been the hardest to maintain thus the many changes and relatively messy code still :D

iiasa / ibis.iSDM

Extracting pseudo-absences #94