pfmc-assessments / indexwc

Estimate indices of abundance for west coast fish species

[Feature]: Best practice for triennial survey index standardization #9

Open iantaylor-NOAA opened 1 year ago

iantaylor-NOAA commented 1 year ago

Describe the problem your feature request is related to.

The triennial survey has been standardized and modeled in various ways across past assessments.

Agreeing on a common approach to be used by {indexwc} would streamline the assessment process, avoid unnecessary debate during reviews, and likely have little impact on any assessment model.

Describe the solution you'd like

@chantelwetzel-noaa (who survived lots of triennial debate during the POP assessment) suggested that the most streamlined and standardized approach would be to have independent early and late sdmTMB models, which would be associated with a single fleet in the assessment models. That fleet could include a block on catchability and/or a block on selectivity as appropriate. Species that primarily occur shallower than 366 m and north of 36.5° seem less likely to need a block on selectivity, especially if the data are truncated to only include the depths and latitudes common to all survey years.

That suggestion makes sense to me and I suspect it wouldn't significantly impact most models thanks to the much longer and typically more influential WCGBT Survey.
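To make that concrete, here is a minimal sketch of the two-model setup in sdmTMB. The data frame, column names, era cutoffs, truncation limits, and mesh cutoff are all placeholders for illustration, not settings that {indexwc} has agreed on.

library(sdmTMB)

# hypothetical haul-level data with columns: year, longitude, latitude,
# depth_m, catch_weight, effort
dat <- subset(catch_dat, depth_m <= 366 & latitude >= 36.5) # common footprint

fit_era <- function(d) {
  mesh <- make_mesh(d, xy_cols = c("longitude", "latitude"), cutoff = 20)
  sdmTMB(
    catch_weight ~ 0 + as.factor(year), # year effects give the annual index
    data = d,
    mesh = mesh,
    offset = log(d$effort),
    family = tweedie(link = "log"),
    time = "year",
    spatiotemporal = "iid"
  )
}

fit_early <- fit_era(subset(dat, year <= 1992)) # pre-1995 design
fit_late <- fit_era(subset(dat, year >= 1995)) # post-redesign years

Both fits would then feed a single triennial fleet in SS3, with blocks on catchability and/or selectivity at the 1995 boundary as needed.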

Tagging others in case they have input on best practices for the triennial: @aaronmberger-nwfsc, @andi-stephens-NOAA, @brianlangseth-NOAA, @EJDick-NOAA, @gertsevv, @John-R-Wallace-NOAA, @melissamonk-NOAA, @okenk, @shcaba.

Describe alternatives you have considered

Additional context

The Unofficial Assessment Handbook description of this stuff is under https://pfmc-assessments.github.io/pfmc_assessment_handbook/01-data-sources.html#summary-of-noaa-fishery-independent-trawl-surveys-used-for-west-coast-assessments

kellijohnson-NOAA commented 1 year ago

Why run two models if it is entered as one fleet? Can't we just implement a fixed effect for "era" or something along those lines that would basically allow the index to have some differences? Or are we assuming that the spatiotemporal random fields would be different between the two eras? @ericward-noaa do you have any thoughts? BTW, I am all for a different approach than what I took before, which was running about 8 different options.

iantaylor-NOAA commented 1 year ago

@kellijohnson-NOAA, good point. If one model can include an adequate split (as proposed by @ericward-noaa for spatial splits in https://github.com/kellijohnson-NOAA/indexwc/issues/6#issuecomment-1293717261), then that seems even better to me.
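For what it's worth, one way a within-model split could look in sdmTMB is through the spatially varying coefficient interface, which would let the spatial pattern differ between eras while keeping a single set of year effects. This is only a hedged guess at the spirit of the #6 proposal, continuing the hypothetical data from the sketch above:

# single model; era-specific spatial deviations via a spatially varying coefficient
dat$era <- factor(ifelse(dat$year <= 1992, "early", "late"))
fit_combined <- sdmTMB(
  catch_weight ~ 0 + as.factor(year),
  spatial_varying = ~ 0 + era, # a spatial field deviation for each era
  data = dat,
  mesh = make_mesh(dat, xy_cols = c("longitude", "latitude"), cutoff = 20),
  family = tweedie(link = "log"),
  time = "year",
  spatiotemporal = "iid"
)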

okenk commented 1 year ago

I am fine with a consistent approach, rather than @kellijohnson-NOAA having to deal with the random preferences of each STAT for a survey that carries relatively little weight in the assessment.

As for one vs. two models: what is the implication of having knots in those large areas that were unsampled by the early triennial? I guess you generally have a boundary region that is not actually sampled, so this would just be a super big boundary region? That is, you are estimating extra, unnecessary random effects, but not actually biasing results? So as long as the prediction grid only covers the sampled area, this seems potentially fine?
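On that last point, here is a sketch of restricting the index to the consistently sampled area, continuing the hypothetical objects from the sketches above (the grid columns and names are assumptions):

# predict only over the footprint sampled in all years, then derive the index
grid <- subset(prediction_grid, depth_m <= 366 & latitude >= 36.5)
grid_years <- replicate_df(grid, time_name = "year",
  time_values = sort(unique(dat$year)))
pred <- predict(fit_combined, newdata = grid_years, return_tmb_object = TRUE)
index <- get_index(pred, area = grid_years$area_km2, bias_correct = TRUE)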

iantaylor-NOAA commented 1 year ago

The presentations at the pre-assessment workshop for Petrale, Shortspine, & Rex this afternoon all said they (we) plan to use the geostatistical standardization to account for changes in the spatial extent of the survey. Is it safe to assume that any change in catchability should be accounted for within the SS3 model and thus we need not include an era parameter in the sdmTMB model?

okenk commented 1 year ago

Oh, I had assumed people were planning on only estimating one selectivity/catchability for the survey.

Is this for the seasonality? It seems like you should not need to account for the changes in depth range at all if you are using a geostatistical index.

For example, for canary, which aren't thought to undergo any seasonal migration, I'm not sure the seasonal change in the survey will matter, so we were not going to include a break in sdmTMB OR SS3.

iantaylor-NOAA commented 1 year ago

If there's any ontogenetic shift in depth with size, then you would probably want to model a block on selectivity. SS3 rescales selectivity to have a maximum of 1.0, so any change in selectivity leads to a change in the implied catchability; it's probably better to put a block there too. There's also the change in survey timing, which could impact both if there are any seasonal shifts in distribution.

The drawback of adding blocks is that you lose information about changes in abundance across the break, but you have more control over the degree of change than with the status-quo approach of treating early and late as two independent time series.

okenk commented 1 year ago

Here is the triennial for canary. I see no noticeable break in the length comps when the survey changed, and I see a huge downside to splitting a time series in the middle of a period of rapid decline. Of course, there could be no break because it is being offset by changes in population structure, but it seems like there is a much more parsimonious explanation...

[image: triennial survey length compositions for canary]

iantaylor-NOAA commented 1 year ago

@okenk, I just looked over the 2007 Canary assessment report and associated STAR report, which are cited in our handbook as the origin of the common practice of splitting the triennial survey. The full text of the STAR section on this is pasted below. The "bad residual pattern" in the fit to the non-split index doesn't seem to be shown in either document, but the sensitivity of the model to the change is not that big, as shown in the figure pasted below (from Fig. 93 of the report).

Whatever the residual pattern may have looked like, it probably felt more important only a few years after the triennial ended, when there were only 4 data points in the WCGBTS. I also wonder whether we would have ever split the survey if we had had a different set of reviewers at that STAR panel.

However, I'm not really an expert on triennial lumping or splitting. @chantelwetzel-noaa has much more experience with this and probably a different perspective.

Sensitivity to candidate base model: split the triennial time series into two blocks (1980-92 and 1995-2004).

Reason: A continuing concern about potential changes in availability due to the change in survey timing.

Response: The time series was split as requested and separate q parameters were estimated (with the same selectivity).

Discussion/conclusion: The bad residual pattern in the fit to the Triennial time series was eliminated, with the second segment of the series being fitted almost exactly. The Panel recommended, and the STAT agreed, that the split time series be adopted as the base model because of the concerns about survey timing (and the poor residual pattern if the series was not split). The Chair expressed concern about the precedent set by adopting the split in the Triennial time series (with regard to assessments of other stocks which rely on it as an abundance index).

[image: Fig. 93 from the 2007 canary assessment report, showing the sensitivity to splitting the triennial time series]

chantelwetzel-noaa commented 1 year ago

I wasn't going to comment on this thread, in order to let others discuss without my potential bias. In my experience, the triennial survey generally has very noisy indices for rockfish species. In 2017 for Pacific ocean perch, I too opted not to split the time series, given that any change in the sampling range should have had little to no impact on POP, which only extends south to approximately northern California and out to ~450 meters. However, for POP, the model was unable to fit both the WCGBT and the Triennial surveys. Given this, I opted to move forward with a model driven by the WCGBT survey, but retained the Triennial survey as a single time series with added-variance estimation to down-weight it in the model. I think there are reasons why that information did not align with the signals from the other surveys (a very noisy signal, and sampling across a period with very poor recruitment until potentially the very end years). The reviewers on my STAR panel correctly took the position that we should not retain data in our model that are being poorly fit. The outcome of that STAR panel was to remove the Triennial survey altogether from the model, which I think was the correct decision. There are sensitivities in the document that show a very different perspective on the population if you force the model to fit the Triennial.

While I think the correct decision was made in the STAR panel, I did receive some negative feedback from the SSC about this decision, and I think it was one of a couple of factors that led the SSC to send the assessment to mop-up even after the STAR panel endorsed the model. In my opinion, our common practice of splitting the Triennial time series and adding extra variance within the model, essentially down-weighting it out of the model, has allowed for the appearance of using these data when in reality we are not. I would prefer a more explicit exclusion of these data if we think they are uninformative or misleading for a species, but I do think this approach would increase grief during the SSC review period.

kellijohnson-NOAA commented 1 year ago

Here is an email thread from 2020 where I have removed individuals' names and just left the main content. The focus was on how to treat the 2004 year of the Triennial data.

The question of whether the 2004 survey should be included in the Triennial time series has come up from time to time, and it would be good if we came to a common understanding and resolution. The survey that year was conducted by us rather than by the AFSC (and by different boats/captains?), after we had been conducting the NWFSC slope survey/WCGBTS for several years, and my understanding, at least, is that this may have resulted in differences in where exactly trawling occurred (proximity to rocks, size of trawlable area) and resulted in bias in indices for some species relative to the rest of the time series.

This issue is not unlike that of the 2009 hake survey index, for which the survey team had to deal with large amounts of Humboldt Squid mixed in with the hake, leading to the development of a "rule of tentacle" to divide the two species. This undoubtedly led to some bias (though in which direction was not immediately clear), and, to account for that, extra uncertainty (CV) was added to that particular index in the assessment. At this point, it might be interesting to see what happens if the 2009 index were removed altogether, given the wealth of information from surrounding years (including an extra survey added in).

Options as to how to deal with the 2004 survey index are:

  1. Remove on first principles as changes were too great to be consistent with other years
  2. Add extra CV to account for potential bias
  3. Leave as is
  4. Add a parameter allowing for a time change in q for 2004, which is really what happened: the general feedback from this year of the survey is that the skippers who fished in 2004 were able to get closer to the rocks because they knew the local waters better than those used for past surveys. It would be worth double-checking which vessels/captains ran each year of survey data collection.

Each of these approaches could be applied across the board, or one could argue there are differences in impact for flatfish vs. rockfish, for example. I'd have to be reminded for which species this index appears biased compared to the trends in our survey and the assessment as a whole.

It is a good thing to have a unified approach; thank you for bringing this up. Regarding the options, to me (1) and (2) could essentially be the same. By down-weighting the index, I presume that is basically to limit its influence. Is option (2) supposed to achieve a mix between getting rid of it and keeping it? If it has any influence, then we are saying it does have some signal in the noise and we need to somehow quantify it. I just want to make explicit that option (2) should avoid a big CV that nullifies its contribution.

Right. I suppose if we employed a consistent increase in CV, then that would acknowledge some bias; it would eliminate influence in those cases where there was only a small difference from the expectation, but retain some influence in other cases. I agree that option (2) creates more difficulties and is not as clean as (1) or (3).

First, I agree that we should all be implementing the same default approach. Second, I would be very reluctant to toss the 2004 year of triennial data without exploring it, although we are probably far enough away from 2004 that this wouldn't be hugely influential for most stocks. Finally, I see one additional option for the 2004 year of Triennial data: the petrale model uses (2), acknowledges the differences, and accepts a limited lack of fit to 2004 while estimating additional variance for the series. This is because the direction of change is certainly real; it's the magnitude of the change that is the problem. For me, (2) has been the default for all survey indices, essentially estimating additional process error, although if the estimated added CV is small then I would fix this parameter to zero. I think Sablefish and Lingcod used approach (2) for all indices.

One of the issues with estimating the added variance is that it is added to all years (if I remember right). That could mean that if 2004 is way off, it will blow out the CV for all the other years to deal with that one year. It is the same failing we have when using the Francis weighting method with a year outlier.
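As an illustration of that point, the added-variance parameter in SS3 is a single fleet-level quantity added to every year's input log-SE, so one poorly fit year inflates the whole series (the numbers below are made up):

# toy R illustration of a fleet-level extra SD (hypothetical values)
se_input <- c(0.25, 0.30, 0.28, 0.08) # input log-SEs; the last year fits worst
extra_sd <- 0.30 # single estimated added-variance parameter
se_input + extra_sd # SS3 adds it to every year of the index
# [1] 0.55 0.60 0.58 0.38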

To the point above about understanding the influence of 2004, should we do model runs making the CV in that year very large and see how much it changes things?
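If we tried that, a hedged sketch using r4ss to inflate the 2004 CV in the SS data file (the directory path and fleet number are placeholders):

# read the data file, inflate the log-SE for the 2004 triennial point, rewrite
library(r4ss)
dat <- SS_readdat("model_dir/data.ss")
is_2004 <- dat$CPUE$year == 2004 & dat$CPUE$index == 5 # 5 = hypothetical fleet number
dat$CPUE$se_log[is_2004] <- 2 # very large log-SE, effectively removing influence
SS_writedat(dat, "model_dir/data.ss", overwrite = TRUE)
# then re-run the model and compare to the base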

I actually think that the cleanest way to deal with 2004 is to estimate a time change in q (option 4). If we are picking one default that would be my standard. This is also how we've dealt with the mid-1990s survey changes in the triennial time series.

Regarding (4) are you referring to catchability within VAST or SS? I am not sure if we can do this with only one year of data.

I don't recommend basing our decision on how much it changes the stock assessment results. Instead, I think we should base our choices on whether we believe we are modeling the data the best we can, or whether we think the information is representative of the models that we are assuming. Second, I don't think anyone is advocating for estimating additional CVs, but rather for specifying a larger CV than what is estimated in VAST for that year. Please correct me if I am wrong.

I was talking about how to deal with this in SS. If there is an agreeable approach to dealing with these issues while building the indices in VAST rather than in the assessment, that would be better.

If that is what folks are talking about, then that sounds good. That would be fixing the CV, not estimating it in SS, and that would be fine with me, though Melissa's option (4) might be more consistent with your fair point about model specification (as long as we believe there is not something fundamentally wrong with the data in the first place, i.e., option (1)).

I'm OK with the option (2) approach of adding a constant to the uncertainty, but I think it would be pretty ad hoc and subject to a lot of scrutiny (more scrutiny than the lack of fit that it would help solve). I like the SS implementation of option (4), where Q can be modeled as changing between 2001 and 2004. If the offset or replacement parameter is estimated without a prior, all information in the index point would be lost, so we may want to come up with an informative prior.

Spatial information associated with 2004 helps inform the VAST estimates of high- and low-density areas and spatial autocorrelation. If we don't think the sampling was similar enough to be informative of the spatial patterns of the species to contribute to that estimation, we should throw 2004 out entirely before putting the data into VAST (as is done with 1977). We should also think about bio data. For some species (including the skates), 2001 and 2004 contain most of the biological samples, and it would be strange to throw out the index without also throwing out this potentially valuable bio data.

We should also think about the 1995 change in design that has caused the series to be split in the past. An SS implementation of time-varying Q (option 4) could include blocks that allow a shift in both 1995 and 2004, with the same or different priors on each change. However, that still leaves the question of whether selectivity or availability changed at either point and whether similar blocks on selectivity should also be considered (at which point you're probably not far off of treating it like 2 separate indices with 2004 thrown out).

Meta-analysis? Would it be crazy to use estimates of the change in Q associated with blocks starting in 1995 and 2004 for a few data-rich species as the source of an informative prior used for all other species? Or to conduct a meta-analysis of all species that have had full assessments? I'm sure the changes are species-specific, but this would feel better than picking a number out of a hat.

It doesn't make sense to me to assume a constant catchability in VAST and then estimate changing catchability in SS3.

I don't understand the role that catchability plays within VAST, so I am probably not a good source for a logical argument. If there's a way to have a one-time change in catchability within VAST, that could be even better, but I don't see how that wouldn't be completely confounded with the year effect. I believe that Rick set up time-varying Q in SS with the intent that it could be used to model changes in standardized indices, so I just assumed that this would be an appropriate use of that feature.

The Alaskan-class chartered fishing vessels used in 2004 were the Vesteraalen and the Morning Star. The Vesteraalen had done the survey twice previously, but this was the Morning Star's first time in the West Coast Triennial Survey. Alaska Center, West Coast Triennial Survey cruisejoins by year:

Year | Cruisejoins (vessels)
1977 | 393 (Commando), 421 (Pacific Raider), 423 (David Star Jordan), 500 (Tordenskjold)
1980 | 394 (Mary Lou), 404 (Pat San Marie)
1983 | 433 (Warrior II), 434 (Nordfjord)
1986 | 406 (Pat San Marie), 429 (Alaska)
1989 | 407 (Pat San Marie), 461 (Golden Fleece)
1992 | 432 (Alaska), 465 (Green Hope)
1995 | 852417 (Alaska), 852418 (Vesteraalen)
1998 | 921326 (Dominator), 929471 (Vesteraalen)
2001 | 1090096 (Sea Storm), 1090095 (Frosti)
2004 | 1236675, 1236676 (Morning Star, Vesteraalen)

Those vessels were skippered by their normal Alaskan skippers; they were not from the West Coast. The 30-minute tows, compared to our combo survey's nominal 15-minute tows, were also kept as part of the Triennial survey design.

Here is information from 'Northwest Fisheries Science Center's West Coast Groundfish Bottom Trawl Survey: History, Design, and Description' on page 4: NWFSC Triennial Survey (2004). In 2004, the NWFSC continued the Triennial Survey extending from Point Conception to the U.S.-Canada border based on the experimental design of the AFSC Triennial Survey, a period characterized by less variable transect intervals and standardized depth strata. Track lines were spaced at intervals of 10 nm (nautical miles; 18.5 km), with sampling densities for the three depth strata (55-183 m, 184-366 m, and 367-500 m) similar to those established during the 1995-2001 AFSC surveys. AFSC protocols called for stations to be located randomly along the track lines at the rate of one station per 4 nm of linear distance in the shallow stratum, and one station every 5 nm of linear distance in the two deeper strata. NWFSC allocated the same number of stations per depth stratum per transect as AFSC did in 2001, but because of improved information on bathymetry, this resulted in 84 transects and 505 potential sampling stations. Each vessel was allocated a set of alternating transect lines and worked from the southernmost transect north. Two Alaskan-class chartered fishing vessels were used for the survey, equipped with the same sampling gear as earlier AFSC Triennial Surveys. The 2004 Triennial Survey extended from May 25 through July 23, beginning and ending somewhat earlier than the Triennial Surveys conducted from 1995-2001. Although the original intent was to continue the Triennial Survey, at a reduced interval, into the future for comparison with the newly established West Coast survey design, NWFSC has not had sufficient resources (neither funds nor staff) to repeat this survey again.

I closely followed the survey design as written down in the AFSC's documentation. However, the earlier surveys may have drifted somewhat from that design: I was told by folks at the Alaska center that some vessels started using previously recorded tracks on their plotters as a guide to where to fish. So prior to 2004, it may have become somewhat of a fixed-station survey.

2004 is also 5 years after the very large 1999 year class, so for some species the timing was perfect: new ground, good recruitment, and 30-minute tows.

The above explanation makes sense for why following the protocol in 2004 could be different from previous years, which may have drifted away from it. But that suggests the problem isn't so much a shift in catchability in 2004 as a drift in catchability in previous years, with a reset in 2004. The Morning Star being a new vessel could theoretically make a difference in a different way, which could be investigated by working up indices using only the Vesteraalen data from 2004.

As background material for our discussion in 2 weeks, I've attached an Excel sheet that Hastie put together during the 2019 skate STAR panel after a discussion about the lack of fit to the 2004 survey (focused on flatfish). It shows that the model expectations were increasing from 2001 to 2004 due to recent good recruitment and/or reduced catches, but the increase in the survey observation was larger in all cases. That suggests that 1999 recruitment isn't adequate to explain the anomaly, at least for flatfish.

Vessel-year can kind of get at the skipper effect.

Do we have any data on skippers?

I don't recall any skipper info in the AK data. The combo survey does have personnel info, and I have looked at skippers for our survey. In part, I found that a senior captain became more conservative over time, not going near rocks as often. In the short term, there is only a downside for the vessel in getting close to rocks: a chance of getting snagged on the rocks and extra work for net repair, both of which cost time. Time is important because, since they can't target fish, the bragging rights come from the number of tows completed. The younger captains on the same vessel are more aggressive, perhaps seeing the longer-term benefit to their careers of showing the species abundance near the rocks. The captains on the AK-class vessels are not on their home turf, hence the comfort of using a previously snag-free 30-minute track on the ship's plotter that was theirs or was left on the plotter by a previous captain. Overall in the Triennial survey, one AK vessel's captains may be more correlated than another vessel's, depending on this information legacy.

Did anyone ever do a comparison of the Triennial survey trends across species, particularly with regard to 1998-2004? I seem to recall the most severe differences in 2004 catch rates being large increases for flatfish species (e.g., the 2004 Dover value is 180-200% of the 2001 value; and even though there was an upward trend in the triennial values from 1995-2001, and a sharp increase in the AFSC slope survey from 2000 to 2002 (though not in the NWC slope survey), the 2011 model doesn't fit those increases well at all). I just looked at the assessments for sablefish and shortspine, and there is very little difference between the 2001 and 2004 index values. I looked in the folder for the 2008 catchability workshop, but I didn't see anything that looked like a triennial comparison across species that would show where the 2004 observation looked wonky relative to the preceding trend, and where it did not. Does anyone recall ever seeing such a thing?

iantaylor-NOAA commented 1 year ago

@kellijohnson-NOAA, thanks for posting this previous discussion on the 2004 triennial.

For the 2023 assessments, we're clearly not coming up with something shared among species, such as an informative prior on time-varying catchability or a good estimate of the amount of increased variance to associate with 2004.

However, I think we could improve documentation on the lack of fit and impact of removing that observation.

Perhaps all of the 2023 assessments could include a sensitivity analysis to estimate the impact of removing 2004 from the index entirely. That should bracket the amount of impact associated with any of the other options discussed in the thread above. I'm thinking it would be helpful to compile information such as the following, which I just put together for the 2019 petrale update since the 2023 model is still in development. The output shows a really bad fit to the 2004 point and a trivial change in current depletion (0.0005).

This can totally be done after the assessment cycle is complete and might actually be easier for 1 person to do for all the stocks than to compile input from a bunch of people.

# read models with and without the 2004 triennial survey index
library(magrittr) # provides the %>% pipe used below
mod1 <- r4ss::SS_output("models/2019.001.001_base/", printstats = FALSE, verbose = FALSE)
mod2 <- r4ss::SS_output("models/2019.001.015_no_2004_tri/", printstats = FALSE, verbose = FALSE)

# info on survey fit
mod1$cpue %>% dplyr::filter(Yr == 2004, Fleet_name == "TriLate") %>% dplyr::select(Obs:Like)
#       Obs     Exp   Calc_Q    Eff_Q       SE  SE_input      Dev    Like
# 1 10521.2 5650.65 0.653522 0.653522 0.393267 0.0800975 0.621623 1.24925

# info without observation showing change in Q
mod2$cpue %>% dplyr::filter(Yr == 2004, Fleet_name == "TriLate") %>% dplyr::select(Obs:Like)
#       Obs     Exp   Calc_Q    Eff_Q       SE Dev Like
# 1 10521.2 4635.22 0.536088 0.536088 0.194036  NA   NA

# difference in current depletion
mod1$current_depletion
# [1] 0.3874229
mod2$current_depletion
# [1] 0.3868508
mod2$current_depletion - mod1$current_depletion
# [1] -0.0005721009

# plot comparison of index fits
r4ss::SSplotComparisons(
  r4ss::SSsummarize(list(mod1, mod2)),
  subplots = 13,
  indexPlotEach = TRUE,
  print = TRUE,
  plotdir = mod2$inputs$dir
)

[image: comparison of index fits with and without the 2004 observation]

kellijohnson-NOAA commented 1 year ago

The trouble with just removing a single year from the assessment is that the estimate of the index might be different if the data from that year were not included in the sdm itself.
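A quick way to check would be refitting the SDM without 2004 and comparing the two index series; a sketch, reusing the hypothetical objects from the sketches above:

# refit without 2004 and compare the resulting indices
d_no04 <- subset(dat, year != 2004)
fit_no04 <- sdmTMB(
  catch_weight ~ 0 + as.factor(year),
  data = d_no04,
  mesh = make_mesh(d_no04, xy_cols = c("longitude", "latitude"), cutoff = 20),
  family = tweedie(link = "log"),
  time = "year",
  spatiotemporal = "iid"
)
# then compare get_index() results from fit_no04 and the full fit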

okenk commented 1 year ago

I see what you are saying, but my instinct is that it is unlikely to be a high-leverage factor.