Advice on Ecov for catchability options

Hi @timjmiller and @brianstock,

I'm trying to explore models with and without a RW and Ecov effect for a catchability. It's a ported ADMB model so has some tweaks to get it to match closer, making sharing a reprex hard, but below is the key code to give a sense of what I'm trying to do. Here's the initial Ecov setup:

## Build ecov list for inputs
ecov <- list(
  label = "Fem30p",
  mean = as.matrix(env.dat[,2]),
  logsigma = 'est_1', # estimate obs sigma, 1 value shared across years
  year = env.dat$year,
  use_obs = matrix(1*na.ind[,1], ncol=1, nrow=dim(env.dat)[1]), 
  lag = 0, 
  process_model = 'rw',
  where = "q",
  indices=list(1), # only affects first index (Shelikof)
  how = 0, # 0 = no effect (but still fit Ecov to compare AIC), 1 = mean
  link_model = "linear")

First issue: my index only covers part of the time series. I first normalized it with (x-mean)/sd, then replaced NA's with the mean (0 by definition). The thought here is that it'll have no polynomial effect for years without data, giving a kind of average expected index. I then use Ecov$use_obs to filter out those missing years (0s) so they don't influence the fit.

First I try to fit a model without any time variation or Ecov effects (baseline model):

## Model 0: constant q1
catchability <- list(re=rep('none',6))
ecov$how <- 0
ecov$where <- 'none'

However, input$random includes 'Ecov_re' so I tried manually turning that off and running. It is not happy (singular LA). For some reason it's trying to estimate Ecov effects (Ecov_obs_logsigma (n=1) and Ecov_process_pars (n=2)) despite my how and where settings above. Why would that be?

Next I want to estimate just a RW on q, without any Ecov process.

## Model 1: RW on q1, no ecov
catchability <- list(re=c('rw',rep('none',5))) ## RW on q1
ecov$how <- 0
ecov$where <- 'q'

I had previously used the catchability input to specify time-variation which used the 'q_re' effects. Now they're 'Ecov_re'. But presumably do the same thing? What is the difference, if any, of specifying a RW through Ecov vs the other input? Why do you specify 'iid' in the vignette example? I'm a bit confused about how these two features interact.

The next model is a version without the RW and just the Ecov effect.

## Model 2: Only Ecov, no RW
catchability <- list(rep('none',6))
ecov$how <- 1
ecov$where <- 'q'

I would think this would have no random effects, and have linear Ecov effects on q[1], but it still has those Ecov_re. Again I tried turning them off but it gets mad.

The last model combines both and isn't shown. It has parameter estimates (subset):

     Ecov_beta Ecov_process_pars Ecov_process_pars Ecov_obs_logsigma 
       0.7405           -0.0084           -0.7280           -0.5015

so it appears to be fitting, and produces a time varying q[1] with reasonable standard errors. However there seems to be no effect of the environmental variable. I tried taking one of the missing years and inputting a 10, which should induce a very abnormal q in that year when the Ecov effect is fitted. But it's not. So somehow I've configured this wrong. I tried looking through the vignette but didnt' see anything stand out. I think I misunderstand the 'where' and 'how' and how those interact with the catchability specification list.

Any advice?

How you've handled the Ecov time-series with some years of missing data sounds right. As long as Ecov$use_obs is 0 for those years then those datapoints shouldn't contribute to the Ecov time series model NLL.

Your baseline model is still fitting the Ecov time-series (so you can compare this to models with the Ecov q effect using AIC) but without an effect on the population (q). input$random has Ecov_re, which are the fit Ecov values by year. ecov$logsigma = 'est_1', so it is estimating Ecov_obs_logsigma (observation SD used in the nll_Ecov_obs). Then the 2 Ecov_process_pars are the RW initial value and logSD. I think the key point is that ecov$how = 0 tells WHAM to fit the Ecov time series but not estimate the parameter linking the Ecov to q (Ecov_beta should be 0 and mapped to NA). To not fit the Ecov data at all, pass ecov = NULL to prepare_wham_input (or simply omit). Compare what happens with your current baseline and using ecov = NULL, hopefully that will confirm the above!

q_re (or M_re) are used for your second model, a RW on q (or M) without Ecov effect. Setting ecov$how = 0 should accomplish this. As above, Ecov_re are the estimated Ecov time-series random effects, but should have no effect on the model (apart from contributing to the NLL) as long as Ecov_beta are 0 and mapped to NA. If you don't want to fit the Ecov data at all, use ecov = NULL.

For model 3 with Ecov-q effect (but no RW on q directly), it is still estimating the Ecov time series model (Ecov_re are the random effects, Ecov_obs_logsigma is estimated because ecov$logsigma = 'est_1', and the 2 Ecov_process_pars are the RW initial value and logSD). But now because ecov$how = 1, it should be estimating Ecov_beta, which links the fit Ecov to q, ie q_y = beta*Ecov_y.

If you want to have an Ecov-q effect but with the Ecov fixed at your input/data values, try setting ecov$logsigma = 0.001 instead of estimating it.

I don't think I would try model 4 with a RW on q directly and an Ecov effect on q. Give models 1-3 a try again and report back. Good luck!

Thanks Brian for covering all this!

The only outstanding source of confusion I can find is regarding the "random walk" vs. "iid" referring to catchability random effects. The catchability$re types refer to the deviations. So if iid is specified, then the deviations are all independent: $f(q_t) = \mu + e_t$, $e_t$ are iid. "ar1" estimates a correlation parameter for the deviations that is bounded (-1,1) for a stationary process. f(q) is a logit transformation of q. This equates to the f(q) itself being AR1. The way to get a (non-stationary) "random walk" would be to fix the correlation parameter at 1 (or the estimated transformation of it at some large value e.g. 1000).

Also, I believe Brian's suggestion for approximately no observation error should be ecov$log_sigma = log(0.001)

When you get the first few models working, Compare the q estimates with the fourth. Without getting my hands on the model it is difficult to see if there is a bug when RE and Ecov are intended to be applied to catchability.

On Wed, May 4, 2022 at 9:43 PM Brian Stock @.***> wrote:

How you've handled the Ecov time-series with some years of missing data sounds right. As long as Ecov$use_obs is 0 for those years then those datapoints shouldn't contribute to the Ecov time series model NLL.

Your baseline model is still fitting the Ecov time-series (so you can compare this to models with the Ecov q effect using AIC) but without an effect on the population (q). input$random has Ecov_re, which are the fit Ecov values by year. ecov$logsigma = 'est_1', so it is estimating Ecov_obs_logsigma (observation SD used in the nll_Ecov_obs). Then the 2 Ecov_process_pars are the RW initial value and logSD. I think the key point is that ecov$how = 0 tells WHAM to fit the Ecov time series but not estimate the parameter linking the Ecov to q (Ecov_beta should be 0 and mapped to NA). To not fit the Ecov data at all, pass ecov = NULL to prepare_wham_input (or simply omit). Compare what happens with your current baseline and using ecov = NULL, hopefully that will confirm the above!

q_re (or M_re) are used for your second model, a RW on q (or M) without Ecov effect. Setting ecov$how = 0 should accomplish this. As above, Ecov_re are the estimated Ecov time-series random effects, but should have no effect on the model (apart from contributing to the NLL) as long as Ecov_beta are 0 and mapped to NA. If you don't want to fit the Ecov data at all, use ecov = NULL.

For model 3 with Ecov-q effect (but no RW on q directly), it is still estimating the Ecov time series model (Ecov_re are the random effects, Ecov_obs_logsigma is estimated because ecov$logsigma = 'est_1', and the 2 Ecov_process_pars are the RW initial value and logSD). But now because ecov$how = 1, it should be estimating Ecov_beta, which links the fit Ecov to q, ie q_y = beta*Ecov_y.

If you want to have an Ecov-q effect but with the Ecov fixed at your input/data values, try setting ecov$logsigma = 0.001 instead of estimating it.

I don't think I would try model 4 with a RW on q directly and an Ecov effect on q. Give models 1-3 a try again and report back. Good luck!

— Reply to this email directly, view it on GitHub https://github.com/timjmiller/wham/issues/57#issuecomment-1118087640, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEIGN7BQ2JAHHEFF5BI3LP3VIMRSZANCNFSM5VDPL7EQ . You are receiving this because you were mentioned.Message ID: @.***>

-- Timothy J. Miller, PhD (he, him, his) Research Fishery Biologist NOAA, Northeast Fisheries Science Center Woods Hole, MA 508-495-2365

Hi guys thanks for writing that out. I think my fundamental confusion here is what the 'Ecov_re' parameters represent. Using the equations given for M in the vignette as motivation, if I wanted to include both effects it would be

logit(q(y))=log_mu+beta1*E(y)+delta(y)

where in my case the delta are RE constrained as a RW.

So I'd think WHAM would set beta1=0 and map off to turn off the Ecov effect completely.
And set delta=0 and map off it off along with process parameters to turn off the time-variation

That's what I infer from the equations on M. But clearly you've got some extra level of fitting going on. If beta1=0, what are you fitting to? Why is there a new NLL component for Ecov? I suppose I should just dig through the source code, but I'm missing something big here and it's not apparent from the vignettes. Are you smoothing the E(y) values as a time series?. Any clarifications would be great. Thanks!

Are you smoothing the E(y) values as a time series?

Yes, exactly. In your equation, the E(y) are fit Ecov values, not data. In the code this is Ecov_x, and in the description paper it's X_y. Ecov_x is a derived quantity in the code. Ecov_re are the random effect deviations, which is why you see it in input$random. If modeling the Ecov as an AR1, line 374 of wham_v0.cpp has Ecov_x(y,i) = Ecov_mu + Ecov_re(y,i). This is the Ecov process model, adds to nll_Ecov.

The Ecov observation model adds to nll_Ecov_obs. The Ecov data (x_y in description paper) are assumed normally distributed around X_y (Ecov_x). So hopefully now my earlier comment makes more sense, that you can have an Ecov-q effect but with the Ecov fixed at your input/data values by setting ecov$log_sigma = log(0.001), which is the Ecov obs error.

Relevant section of the paper is 2.1.1.4.

@brianstock thanks that makes total sense. I came in with some assumptions and didn't see the obvious in front of me. It's so clear that is what WHAM must be doing. I looked through the vignettes and help files but didn't see these links. I'd recommend adding that brief overview and section reference to the help file for prepare_wham_inputs.

I got the models running quickly after making that connection. This probably isn't the best place to run through specific results but yes it's working smoothly for me. Thanks!

The latent covariate process and observations are a separate linear gaussian state-space model.

On Sat, May 7, 2022 at 2:07 AM Brian Stock @.***> wrote:

Are you smoothing the E(y) values as a time series?

Yes, exactly. In your equation, the E(y) are fit Ecov values, not data. In the code this is Ecov_x, and in the description paper https://doi.org/10.1016/j.fishres.2021.105967 it's X_y. Ecov_x is a derived quantity in the code. Ecov_re are the random effect deviations, which is why you see it in input$random. If modeling the Ecov as an AR1, line 374 https://github.com/timjmiller/wham/blob/master/src/wham_v0.cpp#L374 of wham_v0.cpp has Ecov_x(y,i) = Ecov_mu + Ecov_re(y,i). This is the Ecov process model, adds to nll_Ecov.

The Ecov observation model adds to nll_Ecov_obs. The Ecov data (x_y in description paper) are assumed normally distributed around X_y (Ecov_x).

Relevant section of the paper is 2.1.1.4.

— Reply to this email directly, view it on GitHub https://github.com/timjmiller/wham/issues/57#issuecomment-1120143129, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEIGN7BCO7OR26HUNU46XS3VIYCDDANCNFSM5VDPL7EQ . You are receiving this because you were mentioned.Message ID: @.***>

-- Timothy J. Miller, PhD (he, him, his) Research Fishery Biologist NOAA, Northeast Fisheries Science Center Woods Hole, MA 508-495-2365

timjmiller / wham

Advice on Ecov for catchability options #57