pbs-assess / sdmTMB

:earth_americas: An R package for spatial and spatiotemporal GLMMs with TMB
https://pbs-assess.github.io/sdmTMB/
184 stars 26 forks source link

sdmTMB_simulate issue for very coarse mesh #370

Closed fhui28 closed 1 week ago

fhui28 commented 1 week ago

When the mesh is created with a sufficiently small number of knots/large cutoff so the mesh is coarse, sdmTMB_simulate will stop generating a spatial field as part of the data generation process; please see below taken from the example in the help file.

I suspect the issue is that the mesh is not great (although the model still actually fits?), but regardless think it would be good in the least to produce a warning somewhere about this.

Thanks heaps sdmTMB team!

set.seed(123)

Make fake predictor(s) (a1) and sampling locations:

predictor_dat <- data.frame(
    X = runif(300), Y = runif(300),
    a1 = rnorm(300), year = rep(1:6, each = 50))

When n_knots goes below certain number (in this case below 24 knots), no spatial field will be generated as part of the simulation process. The issue also arises, as one would expect, if cutoffs is greater than a certain amount

mesh <- sdmTMB::make_mesh(predictor_dat, xy_cols = c("X", "Y"), n_knots = 20) 
#mesh <- sdmTMB::make_mesh(predictor_dat, xy_cols = c("X", "Y"), cutoff = 0.2) 

Simulate responses

sim_dat <- sdmTMB::sdmTMB_simulate(
    formula = ~ 1 + a1,
    data = predictor_dat,
    time = "year",
    mesh = mesh,
    family = gaussian(),
    range = 0.5,
    sigma_E = 0.1,
    phi = 0.1,
    sigma_O = 0.2,
    seed = 42,
    B = c(0.2, -0.4) # B0 = intercept, B1 = a1 slope
)

head(sim_dat)
#>   year         X           Y         mu        eta   observed (Intercept)
#> 1    1 0.2875775 0.784575267  0.2201940  0.2201940  0.2962182           1
#> 2    1 0.7883051 0.009429905  0.4389662  0.4389662  0.4428653           1
#> 3    1 0.4089769 0.779065883  0.4672202  0.4672202  0.5407274           1
#> 4    1 0.8830174 0.729390652  0.7253233  0.7253233  0.7106760           1
#> 5    1 0.9404673 0.630131853  0.5214041  0.5214041  0.5156154           1
#> 6    1 0.0455565 0.480910830 -0.7034962 -0.7034962 -0.6552593           1
#>           a1
#> 1 -0.7152422
#> 2 -0.7526890
#> 3 -0.9385387
#> 4 -1.0525133
#> 5 -0.4371595
#> 6  0.3311792

Note the model actually fits, though not sure if spatial parameter estimates make sense here or not

dofit <- sdmTMB::sdmTMB(observed ~ a1, 
                        data = sim_dat, 
                        family = gaussian(),
                        spatial = TRUE,
                        mesh = mesh)    

summary(dofit)
#> Spatial model fit by ML ['sdmTMB']
#> Formula: observed ~ a1
#> Mesh: mesh (isotropic covariance)
#> Data: sim_dat
#> Family: gaussian(link = 'identity')
#>  
#>             coef.est coef.se
#> (Intercept)     0.11    0.11
#> a1             -0.40    0.01
#> 
#> Dispersion parameter: 0.13
#> Matérn range: 0.50
#> Spatial SD: 0.25
#> ML criterion at convergence: -132.834
#> 
#> See ?tidy.sdmTMB to extract these values as a data frame.

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Linux Mint 21.1
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_AU:en
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Sydney
#>  date     2024-09-13
#>  pandoc   3.1.11 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version  date (UTC) lib source
#>  assertthat     0.2.1    2019-03-21 [1] CRAN (R 4.1.2)
#>  boot           1.3-28   2021-05-03 [4] CRAN (R 4.1.1)
#>  class          7.3-20   2022-01-13 [4] CRAN (R 4.1.2)
#>  classInt       0.4-10   2023-09-05 [1] CRAN (R 4.1.2)
#>  cli            3.6.3    2024-06-21 [1] CRAN (R 4.1.2)
#>  coda           0.19-4.1 2024-01-31 [1] CRAN (R 4.1.2)
#>  DBI            1.2.2    2024-02-16 [1] CRAN (R 4.1.2)
#>  digest         0.6.35   2024-03-11 [1] CRAN (R 4.1.2)
#>  e1071          1.7-14   2023-12-06 [1] CRAN (R 4.1.2)
#>  emmeans        1.10.4   2024-08-21 [1] CRAN (R 4.1.2)
#>  estimability   1.5.1    2024-05-12 [1] CRAN (R 4.1.2)
#>  evaluate       0.23     2023-11-01 [1] CRAN (R 4.1.2)
#>  fansi          1.0.6    2023-12-08 [1] CRAN (R 4.1.2)
#>  fastmap        1.1.1    2023-02-24 [1] CRAN (R 4.1.2)
#>  fmesher        0.1.7    2024-07-01 [1] CRAN (R 4.1.2)
#>  fs             1.6.4    2024-04-25 [1] CRAN (R 4.1.2)
#>  generics       0.1.3    2022-07-05 [1] CRAN (R 4.1.2)
#>  glue           1.7.0    2024-01-09 [1] CRAN (R 4.1.2)
#>  htmltools      0.5.7    2023-11-03 [1] CRAN (R 4.1.2)
#>  KernSmooth     2.23-20  2021-05-03 [4] CRAN (R 4.0.4)
#>  knitr          1.46     2024-04-06 [1] CRAN (R 4.1.2)
#>  lattice        0.20-45  2021-09-22 [4] CRAN (R 4.1.1)
#>  lifecycle      1.0.4    2023-11-07 [1] CRAN (R 4.1.2)
#>  lme4           1.1-35.5 2024-07-03 [1] CRAN (R 4.1.2)
#>  magrittr       2.0.3    2022-03-30 [1] CRAN (R 4.1.2)
#>  MASS           7.3-55   2022-01-13 [4] CRAN (R 4.1.2)
#>  Matrix         1.6-5    2024-01-11 [1] CRAN (R 4.1.2)
#>  mgcv           1.9-1    2023-12-21 [1] CRAN (R 4.1.2)
#>  minqa          1.2.8    2024-08-17 [1] CRAN (R 4.1.2)
#>  mvtnorm        1.2-6    2024-08-17 [1] CRAN (R 4.1.2)
#>  nlme           3.1-155  2022-01-13 [4] CRAN (R 4.1.2)
#>  nloptr         2.1.1    2024-06-25 [1] CRAN (R 4.1.2)
#>  pillar         1.9.0    2023-03-22 [1] CRAN (R 4.1.2)
#>  pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 4.1.2)
#>  proxy          0.4-27   2022-06-09 [1] CRAN (R 4.1.2)
#>  Rcpp           1.0.13   2024-07-17 [1] CRAN (R 4.1.2)
#>  reprex         2.1.1    2024-07-06 [1] CRAN (R 4.1.2)
#>  rlang          1.1.4    2024-06-04 [1] CRAN (R 4.1.2)
#>  rmarkdown      2.26     2024-03-05 [1] CRAN (R 4.1.2)
#>  rstudioapi     0.16.0   2024-03-24 [1] CRAN (R 4.1.2)
#>  sdmTMB         0.6.0    2024-05-30 [1] CRAN (R 4.1.2)
#>  sessioninfo    1.2.2    2021-12-06 [1] CRAN (R 4.1.2)
#>  sf             1.0-16   2024-03-24 [1] CRAN (R 4.1.2)
#>  sp             2.1-4    2024-04-30 [1] CRAN (R 4.1.2)
#>  tibble         3.2.1    2023-03-20 [1] CRAN (R 4.1.2)
#>  TMB            1.9.14   2024-07-03 [1] CRAN (R 4.1.2)
#>  units          0.8-5    2023-11-28 [1] CRAN (R 4.1.2)
#>  utf8           1.2.4    2023-10-22 [1] CRAN (R 4.1.2)
#>  vctrs          0.6.5    2023-12-01 [1] CRAN (R 4.1.2)
#>  withr          3.0.1    2024-07-31 [1] CRAN (R 4.1.2)
#>  xfun           0.44     2024-05-15 [1] CRAN (R 4.1.2)
#>  xtable         1.8-4    2019-04-21 [1] CRAN (R 4.1.2)
#>  yaml           2.3.8    2023-12-11 [1] CRAN (R 4.1.2)
#> 
#>  [1] /home/fkch/R/x86_64-pc-linux-gnu-library/4.1
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2024-09-13 with reprex v2.1.1

ericward-noaa commented 1 week ago

Thanks for pointing this out -- I don't think there's actually a problem here and will try to explain. It's probably easier to think about this when using the cutoff argument instead of n_knots. First, the simulated data has a really high spatial range (0.5) so you would expect the spatial field to be pretty smooth (as the domain of X and Y are 0, 1). Second, when constructing these meshes, we generally want the cutoff distance to be smaller than the spatial range, so that we can capture finer scale spatial variability. In this example, the cutoff distance is greater than the spatial range, and the spatial random effects in the simulated object are all estimated at 0 (so the fields are not reported). I tried reducing range = 0.2 -- a cutoff distance of 0.2 will not generate omega_s in the simulated dataframe, but all other cutoff values < 0.2 will.

fhui28 commented 1 week ago

Thanks for explaining Eric,

Your explanation makes sense and I don't see a problem methodologically. But I nevertheless think a warning should be put in place for the simulate function, especially given you can fit this coarse mesh model but then cannot properly simulate off the fitted object.

A better option might be to offer guidelines and warnings from make_mesh?

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Eric Ward @.> Sent: Friday, September 13, 2024 10:47:58 PM To: pbs-assess/sdmTMB @.> Cc: Francis KC Hui @.>; Author @.> Subject: Re: [pbs-assess/sdmTMB] sdmTMB_simulate issue for very coarse mesh (Issue #370)

Thanks for pointing this out -- I don't think there's actually a problem here and will try to explain. It's probably easier to think about this when using the cutoff argument instead of n_knots. First, the simulated data has a really high spatial range (0.5) so you would expect the spatial field to be pretty smooth (as the domain of X and Y are 0, 1). Second, when constructing these meshes, we generally want the cutoff distance to be smaller than the spatial range, so that we can capture finer scale spatial variability. In this example, the cutoff distance is greater than the spatial range, and the spatial random effects in the simulated object are all estimated at 0 (so the fields are not reported). I tried reducing range = 0.2 -- a cutoff distance of 0.2 will not generate omega_s in the simulated dataframe, but all other cutoff values < 0.2 will.

— Reply to this email directly, view it on GitHubhttps://github.com/pbs-assess/sdmTMB/issues/370#issuecomment-2348879081, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABH6DBBZWSQI56MFQDGZERDZWLNHPAVCNFSM6AAAAABOELK76GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNBYHA3TSMBYGE. You are receiving this because you authored the thread.Message ID: @.***>

ericward-noaa commented 1 week ago

I think some warnings like this should fix it: https://github.com/pbs-assess/sdmTMB/pull/371/commits/1418f42a2b905f1fcd9febf235103133161a5b2b

seananderson commented 1 week ago

I think it's more insidious than that. I had messed up the logic such that those random field columns were getting dropped if any field values were exactly zero. I had meant for the logic to be that if all the field values were zero, drop them. I changed the logic such that they are included only if the input sigma_O, sigma_E, etc. are > 0. Thankfully this did not affect the 'observed', 'mu', or 'eta', columns—it just meant the field columns were mysteriously missing sometimes. Thanks for pointing this out @fhui28! It should be fixed now and I've added a unit test to cover this. https://github.com/pbs-assess/sdmTMB/commit/cb83a62207a2d62a4e1f7a8b2fa02d1535a2e703

  library(sdmTMB)
  set.seed(123)
  predictor_dat <- data.frame(
    X = runif(100), Y = runif(100),
    a1 = rnorm(100), year = rep(1:2, each = 50))
  mesh <- sdmTMB::make_mesh(predictor_dat, xy_cols = c("X", "Y"), n_knots = 30)
  sim_dat <- sdmTMB::sdmTMB_simulate(
    formula = ~ 1 + a1,
    data = predictor_dat,
    time = "year",
    mesh = mesh,
    family = gaussian(),
    range = 0.5,
    sigma_E = 0.1,
    phi = 0.1,
    sigma_O = 0.2,
    seed = 42,
    B = c(0.2, -0.4) # B0 = intercept, B1 = a1 slope
  )
  head(sim_dat, n = 3)
#>   year         X         Y     omega_s   epsilon_st          mu         eta
#> 1    1 0.2875775 0.5999890 -0.08521738  0.015363059  0.41430830  0.41430830
#> 2    1 0.7883051 0.3328235 -0.28367725 -0.001568776 -0.18799951 -0.18799951
#> 3    1 0.4089769 0.4886130 -0.44112493  0.072181557 -0.07026663 -0.07026663
#>     observed (Intercept)         a1
#> 1  0.4135321           1 -0.7104066
#> 2 -0.2680277           1  0.2568837
#> 3 -0.1236159           1 -0.2466919

Created on 2024-09-13 with reprex v2.1.1