iiasa / ibis.iSDM

Modelling framework for creating Integrated SDMS
https://iiasa.github.io/ibis.iSDM/
Creative Commons Attribution 4.0 International
20 stars 2 forks source link

Factor variables are not correctly handled anymore #125

Closed Martin-Jung closed 3 months ago

Martin-Jung commented 4 months ago

The reason why Pkgdown rendering failed lately is due to a bug probably introduced with issue #106 or earlier, and in particular the handling of factor variables as predictors. I think it has something to do with the code added here where variables subset?

This reprex here fails regardless of engine (glm, xgboost, etc) now, resulting in several vignettes to fail to render...

library(terra)
# Background layer
background <- terra::rast(system.file("extdata/europegrid_50km.tif",package = "ibis.iSDM", mustWork = TRUE))
# Load virtual species points
virtual_species <- sf::st_read(system.file("extdata/input_data.gpkg",package = "ibis.iSDM", mustWork = TRUE), "points") 
# Predictors
predictors <- terra::rast(list.files(system.file("extdata/predictors/", package = "ibis.iSDM", mustWork = TRUE), "*.tif",full.names = TRUE))
predictors <- subset(predictors, c("bio01_mean_50km","bio03_mean_50km","bio19_mean_50km",
                                   "CLC3_112_mean_50km","CLC3_132_mean_50km",
                                   "CLC3_211_mean_50km","CLC3_312_mean_50km",
                                   "elevation_mean_50km",
                                   "koeppen_50km"))
# One of them (Köppen) is a factor, we will now convert this to a true factor variable
predictors$koeppen_50km <- terra::as.factor(predictors$koeppen_50km)

# Create a distribution modelling pipeline
x <- distribution(background) |> 
  add_biodiversity_poipo(virtual_species, field_occurrence = 'Observed', name = 'Virtual points') |>
  add_predictors(predictors, transform = 'scale', derivates = "none") |>
  engine_glm()

# Now train 
mod_null <- train(x, runname = 'Normal PPM projection', only_linear = TRUE, verbose = FALSE)

It does work if factors are fully exploded, e.g. split up per factor level (add_predictors(..,explode_factors = TRUE). I will set this flag now by default in the vignette and hope that they render...

mhesselbarth commented 4 months ago

Interesting that all test etc. still run. But I will try to figure out what broke exactly to address this. In general, I only removed predictors that are provided in the predictors object but actually never used in the formula.

Martin-Jung commented 4 months ago

Yeah, the tests don't check for factor variables, only the vignettes do use them. In the vignette I added the explode_factors flag which makes them render. Merged dev with main branch recently. Not quite sure why this is not working anymore (some engines always require split factors, but others won't)

mhesselbarth commented 4 months ago

This is really hard to debug since I cannot always re-produce the error. Sometime it runs locally (Code chunk Train models with spatial constrains), and sometimes it does not?!

Update 1: One warning gets thrown in clamp_predictions because the pmax / pmin on a factor Update 2: Issue might be in clamp_predictions because many columns get char if factors are present due to the loop starting in L486 Update 3: Few more issues related to exploding factors but not removing the original one

mhesselbarth commented 4 months ago

I am working on this on a new branch (https://github.com/iiasa/ibis.iSDM/tree/factors) because its all a bit more puzzling than expected

mhesselbarth commented 4 months ago

https://github.com/iiasa/ibis.iSDM/commit/e42b25b09ebfae3dda85ce360dd704a402ec2ac5 https://github.com/iiasa/ibis.iSDM/commit/94da65b07be5e7118caa336755f04b81d54774f7 079a8ba

mhesselbarth commented 4 months ago

There were quite a few issues with factors, mainly:

  1. Not more than one factor variables was possible for some engines.
  2. Some penalty factors created did not match the present factor levels.
  3. If exploded, original factor variables was not removed from predictor data.frame.
  4. Improved documentation in add_predictors.

Should be all fixed (#128)

mhesselbarth commented 3 months ago

Will close for now. Please re-open if needed