pfmc-assessments / indexwc

Estimate indices of abundance for west coast fish species
2 stars 2 forks source link

[BUG]: Error in calculating index when using AR1 with extra_time #28

Open chantelwetzel-noaa opened 8 months ago

chantelwetzel-noaa commented 8 months ago

Is there an existing issue for this?

Current Behavior

When exploring using AR1 sdmTMB printed the suggestion of adding the missing data year (extra_time = c(2020)) but when this added to the function call:

best <- data %>% dplyr::mutate(

Evaluate the call in family

family = purrr::map(family, .f = ~ eval(parse(text = .x))),
# Run the model on each row in data
results = purrr::pmap(
  .l = list(
    data = data_filtered,
    formula = formula,
    family = family,
    anisotropy = anisotropy,
    n_knots = knots,
    spatiotemporal = purrr::map2(spatiotemporal1, spatiotemporal2, list),
    extra_time = c(2020)
  ),
  .f = indexwc::run_sdmtmb
)

)

Adding extra_time to the function does run a model using AR1. but the code errors out when calculating the index due to the year 2020 not being included in the grid (I think):

Error in dplyr::mutate(): ? In argument: results = purrr::pmap(...). Caused by error in purrr::pmap(): ? In index: 1. Caused by error in purrr::map() at indexwc/R/calc_index_areas.R:71:3: ? In index: 1. Caused by error in get_generic(): ! area should be of the same length as nrow(newdata) or of length 1. Run rlang::last_trace() to see where the error occurred.

rlang::last_trace() <error/dplyr:::mutate_error> Error in dplyr::mutate(): ? In argument: results = purrr::pmap(...). Caused by error in purrr::pmap(): ? In index: 1. Caused by error in purrr::map() at indexwc/R/calc_index_areas.R:71:3: ? In index: 1.

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Node:
- npm:

Anything else?

No response

ericward-noaa commented 8 months ago

Can you help me understand what the formula, etc are here?

Starting with the example in the manual, this works fine with no extra time,

pcod_spde <- make_mesh(pcod, c("X", "Y"), n_knots = 60, type = "kmeans")
m <- sdmTMB(
  data = pcod,
  formula = density ~ 0 + as.factor(year),
  time = "year", 
  mesh = pcod_spde, 
  family = tweedie(link = "log")
)
# make prediction grid:
nd <- replicate_df(qcs_grid, "year", unique(pcod$year))
# Note `return_tmb_object = TRUE` and the prediction grid:
predictions <- predict(m, newdata = nd, return_tmb_object = TRUE)
ind <- get_index(predictions)

If you add extra_time when year is a factor, it will error out. For example this won't work, because you can't estimate the year effect for 2020 update(m, extra_time = c(2020))

You can fix this by replacing fixed effects in year with random effects, like a time-varying random process. This works but obviously isn't a good model because of lots of other missing years (2020 is treated as the last year in the sequence after 2015, 2017, ...)

update(m, extra_time = c(2020),
       formula = density ~ 0,
       time_varying = ~1)

With spatiotemporal fields, you want them to probably be rw or ar1 if you have missing years -- 2020 here.

Like in your comment, you'll also need to add 2020 to the grid if you haven't already

seananderson commented 8 months ago

It should be OK if the grid is missing years. In fact, you could predict on one year at a time to save memory if needed. Is it possible this is the issue? Otherwise, can you post what the call to sdmTMB ends up being?

Also, if you're using extra time, make sure you're on the latest version (CRAN or GitHub), because there was an important bug in how that was handled for a while. Oh, and I just sped up index calculation a fair bit on the GitHub version, so I'd work from that version until the next CRAN release.

chantelwetzel-noaa commented 8 months ago

Mental note to myself - both Eric and Sean are lurking here and are great at chiming in on issues.

This issue arose after a discussion between @kellijohnson-NOAA and myself yesterday. We are running models using a configuration file for each species. There are a couple of species (yelloweye rockfish) where I have been unable to get positive definite hessian with a delta model due to limited data where I have turned off the spatiotemporal model component for each of the two models. We thought it would be useful to try including an AR1 process in the model for these species. I originally encountered this issue in a model I ran overnight since including the AR1 process slows down the model run. In my attempts to work through the issue. I have lost the output from the model fitting (bad move on my part). I am rerunning a model with AR1 for the presence/absence model component without passing the extra_time component to the model to see if this resolves the issue. Once that finishes running I can update this issue.

Here is the larger script that Kelli and I have been working from when running models:

species_data <- nwfscSurvey::pull_catch(common_name = "yelloweye rockfish", survey = "NWFSC.Combo")
configuration <- tibble::as_tibble(read.csv(
  file.path("data-raw", "configuration.csv")
))

configuration <- configuration[which(configuration$species == "yelloweye rockfish"),  ]

data <-configuration %>%
  # Row by row ... do stuff then ungroup
  dplyr::rowwise() %>%
  # Pull the data based on the function found in fxn column
  dplyr::mutate(
    data_raw = list(format_data(data = species_data)),
    data_filtered = list(data_raw %>%
                           dplyr::filter(
                             depth <= min_depth, depth >= max_depth,
                             latitude >= min_latitude, latitude <= max_latitude,
                             year >= min_year, year <= max_year
                           ))
  ) %>%
  dplyr::ungroup()

best <- data %>%
  dplyr::mutate(
    # Evaluate the call in family
    family = purrr::map(family, .f = ~ eval(parse(text = .x))),
    # Run the model on each row in data
    results = purrr::pmap(
      .l = list(
        data = data_filtered,
        formula = formula,
        family = family,
        anisotropy = anisotropy,
        n_knots = knots,
        spatiotemporal = purrr::map2(spatiotemporal1, spatiotemporal2, list),
        extra_time = c(2020)
      ),
      .f = indexwc::run_sdmtmb
    )
  )
ericward-noaa commented 8 months ago

I'm confused still about what the AR1 process is being included in -- are you putting it in spatiotemporal effects?

I grabbed the code, and the configuration for this example is using a formula that is

catch_weight ~ 0 + fyear + pass_scaled

You also have 'extra_time' = 2020 -- so I think this is the same as the model in my comment that doesn't work.

The spatial only model with AR1 year effects works fine and passes checks,

mesh <- make_mesh(data$data_filtered[[1]], xy_cols = c("x","y"), n_knots = 200)
fit_s <- 
  sdmTMB(data = data$data_filtered[[1]],
         formula = catch_weight ~ 0 + pass_scaled,
         time_varying = ~ 1,
         mesh = mesh,
         family = delta_gamma(),
         anisotropy = TRUE,
         time = "year",
         spatiotemporal = "off",
         extra_time = c(2020))
seananderson commented 8 months ago

Just for clarity, that model above has a random walk on the intercept (here year effects). It needs time_varying_type = 'ar1' to switch from the default to AR1.

I assume here you're taking about spatiotemporal AR1 fields? Indeed, that results in a less sparse matrix of random effects (the random effects in a given year become dependent on the previous and next year) and therefore is slower to fit.

ericward-noaa commented 8 months ago

Ah yeah, thanks for pointing that out @seananderson

@chantelwetzel-noaa I was able to get the spatiotemporal version of the above model to converge with an extra newton step,

fit_st <- update(fit_s, spatiotemporal="ar1")
fit_st <- run_extra_optimization(fit_st, newton_loops = 1)

So as long as you turn off fixed year effects, I think either the spatial only or spatiotemporal models would work

kellijohnson-NOAA commented 8 months ago

Thanks @ericward-noaa and @seananderson. I am wondering if we turn off fixed effects for years because what I had told Chantel to do was use ar1 for the spatiotemporal component, then can you still use that output for an index of abundance. I remember somewhere there was guidance that fixed effects are import for years if using the model results for an index that will be used for assessment purposes. Feel free to just point me to something to read if you don't want to explain it again here. And, sorry @chantelwetzel-noaa that I got you into this mess.

ericward-noaa commented 8 months ago

In general, yes -- with no missing data, fixed year effects will give the best unbiased estimates of a trend. If you don't care about estimating the index in 2020, then fixed effects is fine (and turn extra_time off). But if you do want to estimate the mean density in 2020, you need to have some process to fill in the gap. This could be using a time_varying formula like my examples, a smooth over year, e.g. s(year), or just estimating a global intercept with no year effects. This is all new territory for WCBTS -- so no worries

Related / unrelated, @seananderson has been tinkering with using time varying year and spatiotemporal effects for the DFO Synoptic surveys. They had a survey in 2020, but the reason there is the spatial patterning of the survey (it's a checkerboard with the same 2 of 4 cells sampled every other year)