pbs-assess / sdmTMB

:earth_americas: An R package for spatial and spatiotemporal GLMMs with TMB
https://pbs-assess.github.io/sdmTMB/
182 stars 26 forks source link

Rename time column internally to protect against extra_time + as.factor conflict #320

Closed Lewis-Barnett-NOAA closed 4 months ago

Lewis-Barnett-NOAA commented 4 months ago

I've noticed that models with extra_time specified are causing estimation problems in Windows.

For example: mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 20) fit <- sdmTMB( density ~ 0 + as.factor(year), data = pcod, mesh = mesh, time = "year", family = tweedie(link = "log"), spatiotemporal = "ar1", extra_time = c(2006, 2008, 2010, 2012, 2014, 2016) )

produces the error: Error in solve.default(h, g) : Lapack routine dgesv: system is exactly singular: U[4,4] = 0

sessionInfo() R version 4.3.0 (2023-04-21 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8 [4] LC_NUMERIC=C LC_TIME=English_United States.utf8

time zone: America/Los_Angeles tzcode source: internal

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] sdmTMB_0.4.3

loaded via a namespace (and not attached): [1] sandwich_3.0-2 utf8_1.2.4 generics_0.1.3 class_7.3-22 fmesher_0.1.5 KernSmooth_2.23-21 [7] lattice_0.21-8 lme4_1.1-35.1 magrittr_2.0.3 grid_4.3.0 estimability_1.4.1 mvtnorm_1.2-4
[13] Matrix_1.6-5 e1071_1.7-14 DBI_1.2.2 survival_3.5-5 multcomp_1.4-23 mgcv_1.8-42
[19] fansi_1.0.6 TH.data_1.1-2 codetools_0.2-19 cli_3.6.1 rlang_1.1.3 units_0.8-5
[25] splines_4.3.0 reprex_2.1.0 withr_3.0.0 tools_4.3.0 nloptr_2.0.3 coda_0.19-4
[31] minqa_1.2.6 dplyr_1.1.4 boot_1.3-28.1 assertthat_0.2.1 vctrs_0.6.5 R6_2.5.1
[37] zoo_1.8-12 proxy_0.4-27 lifecycle_1.0.4 emmeans_1.8.5 classInt_0.4-10 fs_1.6.3
[43] MASS_7.3-60 pkgconfig_2.0.3 pillar_1.9.0 glue_1.7.0 Rcpp_1.0.12 sf_1.0-15
[49] tibble_3.2.1 tidyselect_1.2.1 rstudioapi_0.15.0 xtable_1.8-4 nlme_3.1-162 TMB_1.9.10
[55] compiler_4.3.0 sp_2.1-3

seananderson commented 4 months ago

Nothing specific to do with Windows. This should fix it:

library(sdmTMB)
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 20)
pcod$year_f <- as.factor(pcod$year)
fit <- sdmTMB(
  density ~ 0 + year_f,
  data = pcod,
  mesh = mesh,
  time = "year",
  family = tweedie(link = "log"),
  spatiotemporal = "ar1",
  extra_time = c(2006, 2008, 2010, 2012, 2014, 2016)
)
fit
#> Spatiotemporal model fit by ML ['sdmTMB']
#> Formula: density ~ 0 + year_f
#> Mesh: mesh (isotropic covariance)
#> Time column: year
#> Data: pcod
#> Family: tweedie(link = 'log')
#>  
#>            coef.est coef.se
#> year_f2003     2.74    0.37
#> year_f2004     3.01    0.36
#> year_f2005     3.04    0.36
#> year_f2007     1.95    0.37
#> year_f2009     2.30    0.37
#> year_f2011     2.87    0.35
#> year_f2013     2.99    0.35
#> year_f2015     3.01    0.36
#> year_f2017     2.35    0.37
#> 
#> Dispersion parameter: 13.67
#> Tweedie p: 1.56
#> Spatiotemporal AR1 correlation (rho): -0.63
#> Matérn range: 3.78
#> Spatial SD: 11.17
#> Spatiotemporal marginal AR1 SD: 7.94
#> ML criterion at convergence: 6519.955
#> 
#> See ?tidy.sdmTMB to extract these values as a data frame.

Created on 2024-03-12 with reprex v2.1.0

The issue is that the as.factor(year) uses the same column as the year argument. The extra_time argument creates the extra time slices and then those get added as factor levels for which there are no data to inform them. The solution is to not use the same column for something with as.factor() as the time column when extra_time is used. I'm not sure what the best way is to defend against this. One option would be to try to detect it and issue a warning or error. Another option would be to rename the time column internally so that the two will never conflict... maybe that option is best.

Lewis-Barnett-NOAA commented 4 months ago

Ah, that is the workaround, thanks! I concur that renaming the time column internally to prevent this sounds like a good idea.