chantelwetzel-noaa opened 8 months ago
Can you help me understand what the formula, etc., are here?
Starting with the example in the manual, this works fine with no extra time:
pcod_spde <- make_mesh(pcod, c("X", "Y"), n_knots = 60, type = "kmeans")
m <- sdmTMB(
data = pcod,
formula = density ~ 0 + as.factor(year),
time = "year",
mesh = pcod_spde,
family = tweedie(link = "log")
)
# make prediction grid:
nd <- replicate_df(qcs_grid, "year", unique(pcod$year))
# Note `return_tmb_object = TRUE` and the prediction grid:
predictions <- predict(m, newdata = nd, return_tmb_object = TRUE)
ind <- get_index(predictions)
If you add extra_time when year is a factor, it will error out. For example, this won't work because you can't estimate the year effect for 2020:
update(m, extra_time = c(2020))
You can fix this by replacing the fixed year effects with random effects, like a time-varying random process. This works but obviously isn't a good model because of all the other missing years (2020 is treated as the last year in the sequence after 2015, 2017, ...):
update(m, extra_time = c(2020),
formula = density ~ 0,
time_varying = ~1)
With spatiotemporal fields, you probably want them to be rw or ar1 if you have missing years -- 2020 here.
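For example, a minimal sketch combining those pieces (the specific argument values are illustrative, not a recommended model):
m_ar1 <- update(
  m,
  formula = density ~ 0,   # drop the fixed year effects
  time_varying = ~ 1,      # time-varying intercept fills in 2020
  spatiotemporal = "ar1",  # AR(1) fields can interpolate the missing year
  extra_time = 2020
)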
Like in your comment, you'll also need to add 2020 to the grid if you haven't already
It should be OK if the grid is missing years. In fact, you could predict on one year at a time to save memory if needed. Is it possible this is the issue? Otherwise, can you post what the call to sdmTMB ends up being?
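A sketch of that one-year-at-a-time approach using the pcod example above (the loop structure is an assumption, not a tested recipe):
inds <- lapply(sort(unique(pcod$year)), function(yy) {
  nd_y <- replicate_df(qcs_grid, "year", yy)  # grid for a single year
  p_y <- predict(m, newdata = nd_y, return_tmb_object = TRUE)
  get_index(p_y)  # index for that year only
})
ind_all <- do.call(rbind, inds)  # combine into one data frame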
Also, if you're using extra time, make sure you're on the latest version (CRAN or GitHub), because there was an important bug in how that was handled for a while. Oh, and I just sped up index calculation a fair bit on the GitHub version, so I'd work from that version until the next CRAN release.
Mental note to myself - both Eric and Sean are lurking here and are great at chiming in on issues.
This issue arose after a discussion between @kellijohnson-NOAA and myself yesterday. We are running models using a configuration file for each species. There are a couple of species (e.g., yelloweye rockfish) where I have been unable to get a positive definite Hessian with a delta model due to limited data, so I have turned off the spatiotemporal component for each of the two model components. We thought it would be useful to try including an AR1 process in the model for these species. I originally encountered this issue in a model I ran overnight, since including the AR1 process slows down the model run. In my attempts to work through the issue, I lost the output from the model fitting (bad move on my part). I am rerunning a model with AR1 for the presence/absence model component, without passing the extra_time argument to the model, to see if this resolves the issue. Once that finishes running I can update this issue.
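For reference, a minimal sketch of turning off the spatiotemporal fields for both components of a delta model, using the built-in pcod example rather than our yelloweye configuration:
fit_dg <- sdmTMB(
  data = pcod,
  formula = density ~ 0 + as.factor(year),
  mesh = pcod_spde,
  time = "year",
  family = delta_gamma(),
  spatiotemporal = list("off", "off")  # one setting per delta-model component
)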
Here is the larger script that Kelli and I have been working from when running models:
species_data <- nwfscSurvey::pull_catch(common_name = "yelloweye rockfish", survey = "NWFSC.Combo")
configuration <- tibble::as_tibble(read.csv(
file.path("data-raw", "configuration.csv")
))
configuration <- configuration[which(configuration$species == "yelloweye rockfish"), ]
data <- configuration %>%
# Row by row ... do stuff then ungroup
dplyr::rowwise() %>%
# Pull the data based on the function found in fxn column
dplyr::mutate(
data_raw = list(format_data(data = species_data)),
data_filtered = list(data_raw %>%
dplyr::filter(
depth <= min_depth, depth >= max_depth,
latitude >= min_latitude, latitude <= max_latitude,
year >= min_year, year <= max_year
))
) %>%
dplyr::ungroup()
best <- data %>%
dplyr::mutate(
# Evaluate the call in family
family = purrr::map(family, .f = ~ eval(parse(text = .x))),
# Run the model on each row in data
results = purrr::pmap(
.l = list(
data = data_filtered,
formula = formula,
family = family,
anisotropy = anisotropy,
n_knots = knots,
spatiotemporal = purrr::map2(spatiotemporal1, spatiotemporal2, list),
extra_time = c(2020)
),
.f = indexwc::run_sdmtmb
)
)
I'm still confused about what the AR1 process is being included in -- are you putting it in the spatiotemporal effects?
I grabbed the code, and the configuration for this example is using a formula of
catch_weight ~ 0 + fyear + pass_scaled
You also have extra_time = 2020 -- so I think this is the same as the model in my comment that doesn't work.
The spatial-only model with AR1 year effects works fine and passes checks:
mesh <- make_mesh(data$data_filtered[[1]], xy_cols = c("x", "y"), n_knots = 200)
fit_s <- sdmTMB(
  data = data$data_filtered[[1]],
  formula = catch_weight ~ 0 + pass_scaled,
  time_varying = ~ 1,
  mesh = mesh,
  family = delta_gamma(),
  anisotropy = TRUE,
  time = "year",
  spatiotemporal = "off",
  extra_time = c(2020)
)
Just for clarity, that model above has a random walk on the intercept (here, the year effects). It needs time_varying_type = "ar1" to switch from the default to AR1.
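For example (a sketch; assuming update() passes the argument through):
fit_s_ar1 <- update(fit_s, time_varying_type = "ar1")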
I assume here you're talking about spatiotemporal AR1 fields? Indeed, that results in a less sparse matrix of random effects (the random effects in a given year become dependent on the previous and next years) and is therefore slower to fit.
Ah yeah, thanks for pointing that out @seananderson
@chantelwetzel-noaa I was able to get the spatiotemporal version of the above model to converge with an extra Newton step:
fit_st <- update(fit_s, spatiotemporal = "ar1")
fit_st <- run_extra_optimization(fit_st, newton_loops = 1)
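If it helps, sdmTMB's built-in sanity() check can confirm the gradients and standard errors look OK after the extra optimization (a suggested follow-up, not something from the run above):
sanity(fit_st)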
So as long as you turn off fixed year effects, I think either the spatial-only or spatiotemporal models would work.
Thanks @ericward-noaa and @seananderson. I am wondering: if we turn off fixed effects for years (what I had told Chantel to do was use ar1 for the spatiotemporal component), can you still use that output for an index of abundance? I remember somewhere there was guidance that fixed effects are important for years if using the model results for an index that will be used for assessment purposes. Feel free to just point me to something to read if you don't want to explain it again here. And, sorry @chantelwetzel-noaa that I got you into this mess.
In general, yes -- with no missing data, fixed year effects will give the best unbiased estimates of a trend. If you don't care about estimating the index in 2020, then fixed effects are fine (and turn extra_time off). But if you do want to estimate the mean density in 2020, you need some process to fill in the gap. This could be a time_varying formula like in my examples, a smooth over year (e.g., s(year)), or just estimating a global intercept with no year effects. This is all new territory for WCBTS -- so no worries.
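As a sketch of the s(year) option using the pcod example from earlier (the basis dimension k = 5 and the AR(1) fields are arbitrary choices here):
fit_sm <- sdmTMB(
  data = pcod,
  formula = density ~ s(year, k = 5),  # penalized smooth over year
  time = "year",
  mesh = pcod_spde,
  family = tweedie(link = "log"),
  spatiotemporal = "ar1",
  extra_time = 2020  # lets the smooth predict the unsampled year
)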
Related/unrelated: @seananderson has been tinkering with using time-varying year and spatiotemporal effects for the DFO Synoptic surveys. They had a survey in 2020, but the reason there is the spatial patterning of the survey (it's a checkerboard, with the same 2 of 4 cells sampled every other year).
Current Behavior
When exploring the use of AR1, sdmTMB printed a suggestion to add the missing data year (extra_time = c(2020)), which was then added to the function call:
best <- data %>% dplyr::mutate(
  # Evaluate the call in family
  ...
)
Adding extra_time to the function does run a model using AR1, but the code errors out when calculating the index, due to the year 2020 not being included in the grid (I think):
Error in dplyr::mutate():
ℹ In argument: results = purrr::pmap(...).
Caused by error in purrr::pmap():
ℹ In index: 1.
Caused by error in purrr::map() at indexwc/R/calc_index_areas.R:71:3:
ℹ In index: 1.
Caused by error in get_generic():
! area should be of the same length as nrow(newdata) or of length 1.
Run rlang::last_trace() to see where the error occurred.
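If the diagnosis above is right, the likely fix (per the comments earlier in the thread) is to make sure 2020 is in the prediction grid so that area matches nrow(newdata). A sketch using the pcod objects from the top of the thread and the hypothetical m_ar1 model sketched earlier:
nd <- replicate_df(qcs_grid, "year", c(unique(pcod$year), 2020))
p <- predict(m_ar1, newdata = nd, return_tmb_object = TRUE)
ind <- get_index(p, area = rep(4, nrow(nd)))  # 4 km^2 cell area is an assumption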