Closed seanhardison1 closed 1 year ago
I think that's related to this older issue: https://github.com/pbs-assess/sdmTMB/issues/54
Strangely, I can't reproduce the issue. (I did just push a fix for the annoying printing of the sdmTMB_cv()
models where the data and mesh would be printed too.)
library(sdmTMB)
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 25)
fam <- tweedie(link = "log")
m <- sdmTMB(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh = mesh,
family = fam
)
m_cv <- sdmTMB_cv(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh = mesh,
family = tweedie(link = "log"), k_folds = 2
)
#> Running fits with `future.apply()`.
#> Set a parallel `future::plan()` to use parallel processing.
m_cv <- sdmTMB_cv(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh = mesh,
family = fam, k_folds = 2
)
#> Running fits with `future.apply()`.
#> Set a parallel `future::plan()` to use parallel processing.
m_cv$models[[1]]
#> Spatial model fit by ML ['sdmTMB']
#> Formula: density ~ 0 + depth_scaled + depth_scaled2
#> Family: tweedie(link = 'log')
#>
#> coef.est coef.se
#> depth_scaled -1.76 0.21
#> depth_scaled2 -1.27 0.10
#>
#> Dispersion parameter: 13.68
#> Tweedie p: 1.60
#> Matern range: 108.62
#> Spatial SD: 2.73
#> ML criterion at convergence: 3177.450
#>
#> See ?tidy.sdmTMB to extract these values as a data frame.
m_cv$models[[2]]
#> Spatial model fit by ML ['sdmTMB']
#> Formula: density ~ 0 + depth_scaled + depth_scaled2
#> Family: tweedie(link = 'log')
#>
#> coef.est coef.se
#> depth_scaled -1.48 0.16
#> depth_scaled2 -1.29 0.11
#>
#> Dispersion parameter: 13.99
#> Tweedie p: 1.61
#> Matern range: 187.00
#> Spatial SD: 2.24
#> ML criterion at convergence: 3329.646
#>
#> See ?tidy.sdmTMB to extract these values as a data frame.
Created on 2022-08-30 with reprex v2.0.2
sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Monterey 12.3.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.14 knitr_1.39 magrittr_2.0.3 R.cache_0.16.0
#> [5] rlang_1.0.4 fastmap_1.1.0 fansi_1.0.3 stringr_1.4.1
#> [9] styler_1.7.0 highr_0.9 tools_4.2.1 xfun_0.32
#> [13] R.oo_1.25.0 utf8_1.2.2 cli_3.3.0 withr_2.5.0
#> [17] htmltools_0.5.3 yaml_2.3.5 digest_0.6.29 tibble_3.1.8
#> [21] lifecycle_1.0.1 purrr_0.3.4 R.utils_2.12.0 vctrs_0.4.1
#> [25] fs_1.5.2 glue_1.6.2 evaluate_0.16 rmarkdown_2.15
#> [29] reprex_2.0.2 stringi_1.7.8 compiler_4.2.1 pillar_1.8.1
#> [33] R.methodsS3_1.8.2 pkgconfig_2.0.3
Created on 2022-08-30 with reprex v2.0.2
Thanks Sean. Okay I think I figured out what's triggering the error. I was able to run the code I posted previously in a fresh R session, but it breaks down if I re-run sdmTMB_cv
after calling plan(multisession)
:
fam <- tweedie(link = "log")
plan(multisession)
m_cv <- sdmTMB_cv(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh = mesh,
family = fam, k_folds = 2
)
Can you confirm? I don't know enough about how the futures
library works to know what's causing the error.
Not sure if this is helpful, but the issue could be related to fam
being a global variable that needs to be passed explicitly to the future call. This (unfortunately verbose) code will run, but seems to only evaluate sequentially despite plan(multisession)
being called
library(future)
library(sdmTMB)
plan(multisession)
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 25)
fam <- tweedie()
globals <- list(fam = fam,
pcod = pcod,
mesh = mesh,
sdmTMB_cv = sdmTMB_cv)
f <-
future({
sdmTMB_cv(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh = mesh,
family = fam, k_folds = 8
)
}, globals = globals)
m_cv <- value(f)
m_cv
New sessions are opened but the code only evaluates in a single other session:
Compared to the simpler
plan(multisession)
m_cv <- sdmTMB_cv(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh = mesh,
family = tweedie(link = "log"), k_folds = 5
)
which will execute in k sessions
I'm just seeing this thread, and confirmed the example still isn't working. Looks like the family argument is not being parsed correctly -- and crashes here https://github.com/pbs-assess/sdmTMB/blob/aad58e0a11f93abd987aca68a52d820b87895f56/R/fit.R#L554
Not perfect, but one workaround that I confirmed works is to use use_initial_fit = TRUE
. This just fits the model to the first fold, and uses those estimates as starting values for other folds. The family argument also gets passed correctly. I'll look into why the other approach isn't working
m_cv <- sdmTMB_cv(
density ~ 0 + depth_scaled + depth_scaled2,
data = pcod, mesh = mesh,
family = fam, k_folds = 2,
use_initial_fit = TRUE
)
For some reason this seems to be working for me now, but I've seen other issues with future not detecting globals. I've now added an argument future_globals
in sdmTMB_cv()
that passes a character vector of global objects to the future package if needed: https://github.com/pbs-assess/sdmTMB/blob/95d9f88847dd598ee3c3524ca3ba7f0dba2f0cab/R/cross-val.R#L101-L105
Hi,
I was wondering if anyone had any insights into how I might pass a family variable to
sdmTMB_cv
. It seems to work withsdmTMB()
but fails for the CV version. For example,Thanks!