Predictions with new data from delta models with random intercepts

tom-peatman commented 2 months ago

I've recently started using sdmTMB to model incidental catches in purse seine fisheries in the Pacific Ocean. I've been very impressed with the package, and greatly appreciate all the time and effort that the authors and contributors have put in to it.

I've noticed some behaviour when predicting from delta_gamma and delta_truncated_nbinom2 models with random intercepts that I found unexpected, based on my understanding of the documentation for the various functions.

I'm running sdmTMB version 0.5.0 on R version 4.3.0 on Windows.

Below is an example, fitting a delta_gamma model to the pcod_2011 dataset, with depth smooths and random intercepts for year in both model components. My read of ?sdmTMB is that random intercepts are OK for delta families, as long as they are in both the delta component and and positive component. And it appears that random intercepts are estimated for both model components, as recovered by the tidy calls.

When setting re_form_id = NULL in the predict call with new data, the predictions for the delta component are conditional on the fitted random intercepts for year, which is as I would expect. However, the predictions for the gamma component do not appear to be conditional on the fitted random intercepts for year.

library(sdmTMB)

mesh <- make_mesh(pcod_2011, c("X", "Y"), cutoff = 20)
pcod_2011$year <- factor(pcod_2011$year)

## fit model with random intercepts for year in both model components
fit <- sdmTMB(
  density ~ s(depth) + (1 | year),
  data = pcod_2011, mesh = mesh,
  family = delta_gamma(link1 = "logit", link2 = "log")
)

## fitted random intercepts
tidy(fit, effects = "ran_vals", model = 1)
tidy(fit, effects = "ran_vals", model = 2)

## data frame to get predictions for each year (at a specific location and depth)
new_dat <- expand.grid(X = mean(pcod_2011$X), Y = mean(pcod_2011$Y),
                       depth = mean(pcod_2011$depth), year = unique(pcod_2011$year))

predict(fit, newdata = new_dat, re_form = NULL, re_form_iid = NULL)

I see similar behaviour when using using the delta_truncated_nbinom2 family for discrete responses too.

Hopefully this provides enough information - please let me know if not!

seananderson commented 2 months ago

@tom-peatman, thank you for reporting this. I have confirmed there was an indexing issue in reporting the est_non_rf1 and est_non_rf2 columns. Essentially, the .cpp was missing an index for the linear predictor column in a matrix and so both the first and second linear predictors were getting added into the first linear predictor column.

Here: https://github.com/pbs-assess/sdmTMB/blob/47ef042017e83e45626a0a0be6744af872aa2a1d/src/sdmTMB.cpp#L1090 and also here: https://github.com/pbs-assess/sdmTMB/blob/47ef042017e83e45626a0a0be6744af872aa2a1d/src/sdmTMB.cpp#L1102

I had no idea that could silently work without a C++ error. That should be

 proj_fe(i,m) += proj_iid_re_i(i,m);

So, the overall predictions were right because the random effects were getting summed up (which is what unit tests were testing) but those est_non_rf columns were wrong in this case because the random effects from both linear predictors ended up in the first column.

I'll finish running tests locally and push an update and include a new unit test based on your example.

New output from your example:

> p <- predict(fit, newdata = new_dat)
> dplyr::select(p, X, Y, year, est_non_rf1, est_non_rf2)
         X        Y year est_non_rf1 est_non_rf2
1 464.1896 5744.225 2011  0.05784588    4.356461
2 464.1896 5744.225 2013  1.10372074    4.152266
3 464.1896 5744.225 2015  0.83646897    4.283722
4 464.1896 5744.225 2017  0.04876451    4.091861

Thanks again!

tom-peatman commented 2 months ago

@seananderson, thanks for the very quick response, and great to hear that it's an easy fix.

And thanks again for sdmTMB, I've really been enjoying using it!

seananderson commented 2 months ago

Fixed now in the development version.

remotes::install_github("pbs-assess/sdmTMB")

I think we're due for an update on CRAN soon too.

pbs-assess / sdmTMB

Predictions with new data from delta models with random intercepts #342