pfmc-assessments / VASTWestCoast

VAST for the NWFSC West Coast data

Extreme catch events #49

Open kellijohnson-NOAA opened 2 years ago

kellijohnson-NOAA commented 2 years ago

Problem

How should extreme catch events be modeled in spatiotemporal models that are used to create an index of abundance for stock assessments?

Background

@chantelwetzel-noaa brought up during the 20 April 2022 PEP team meeting that when the data include extreme hauls, e.g., for petrale sole, the calculated index is sensitive to whether the event is included or excluded. Explorations thus far have used the Tweedie distribution. For spiny dogfish, Thorson recommended the lognormal distribution as the best available treatment of extreme catch events (ECEs) because its tail is a bit fatter (the Tweedie wasn't available at the time of the dogfish work, and @okenk suggests that its tails may already be adequately fat). Owen proposes excluding the single tow for petrale on the grounds that it's an outlier spawning aggregation that we don't normally see during the survey season (the fishery on spawning aggregations is in winter). The second-highest tow occurred 4 miles away on the same day of the year (in September). The prediction from 2019 is more in line with the lower index (with the outlier excluded).

Proposed solution

okenk commented 2 years ago

I think I was confusing the Tweedie distribution with something else I had seen from Sean Anderson; the Tweedie just handles zero observations. It looks like its tails should be similar to those of a gamma.

James-Thorson-NOAA commented 2 years ago

VAST also includes the generalized gamma distribution ... Jason Conner has agreed to work with me on a publication comparing gamma, lognormal, and generalized-gamma performance when he's back from family leave (and I hope people will be respectful of not using the feature or repurposing the code without consent). I'm happy to discuss if that's helpful, including looking for ways to partner with Jason while having it available for uses like this in 2022.

chantelwetzel-noaa commented 2 years ago

After our discussion yesterday on how best to model data with extreme events, I ran VAST for petrale sole using the lognormal, gamma, and Tweedie distributions. The runs that used either the lognormal or the gamma distribution resulted in estimates for 2021 that were considerably less influenced by the one extreme haul, with the lognormal model giving the lowest point estimate for the 2021 data. I did an additional run for Pacific spiny dogfish, which has also had extreme hauls in the past. There, the gamma and Tweedie models produced similar time-series estimates, with the extreme haul having a large impact on the estimate for that year. In contrast, the lognormal appeared to greatly reduce the influence of the extreme haul. I don't know what the "correct" answer is here, aside from that users should thoroughly explore their data and then run multiple VAST models for comparison.
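A toy sketch of why the two families can react so differently to one extreme haul. It is not VAST and has no spatial structure: it assumes independent positive catches, made-up numbers, and simple maximum-likelihood estimates of the mean. Under a gamma with a mean parameterization, the MLE of the mean is the sample mean, so an extreme value has full leverage. Under a lognormal, the mean is estimated from the mean and variance of the log catches, which dampens the effect for data like these.

```python
import math

def gamma_mean_mle(x):
    # For a gamma parameterized by its mean, the MLE of the mean is the
    # sample mean, whatever the shape parameter is.
    return sum(x) / len(x)

def lognormal_mean_mle(x):
    # MLE of the lognormal mean: exp(mu + sigma^2 / 2), with mu and sigma^2
    # estimated from the log-transformed values (variance divides by n).
    logs = [math.log(v) for v in x]
    mu = sum(logs) / len(logs)
    var = sum((l - mu) ** 2 for l in logs) / len(logs)
    return math.exp(mu + var / 2.0)

# Hypothetical positive catch rates (made up for illustration only).
typical = [10, 12, 8, 15, 9, 11, 14, 7, 13, 10]
with_extreme = typical + [500.0]  # one hypothetical extreme haul

# Ratio of the estimated mean with vs. without the extreme haul.
gamma_ratio = gamma_mean_mle(with_extreme) / gamma_mean_mle(typical)
lognormal_ratio = lognormal_mean_mle(with_extreme) / lognormal_mean_mle(typical)

print(f"gamma:     mean inflates by a factor of {gamma_ratio:.2f}")
print(f"lognormal: mean inflates by a factor of {lognormal_ratio:.2f}")
```

With these made-up values the lognormal estimate inflates less than the gamma estimate. That ordering is not guaranteed for every data set, and a spatiotemporal model with random effects can behave differently, which is consistent with the advice to compare several fits.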

James-Thorson-NOAA commented 2 years ago

Quoting Thorson et al. 2021:

For example, the lognormal has skewness of $(\mathrm{CV}^2 + 3)\,\mathrm{CV}$ (where $\mathrm{CV}$ is the measurement error coefficient of variation) while the gamma has skewness of $2\,\mathrm{CV}$. Given that the estimated $\mathrm{CV}$ is typically above [value missing], these distributions can have substantially different skewness.

You might use the logSigmaM value to calculate the CV and hence the skewness, where higher skewness will result in lower leverage for above-expectation outliers.
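A minimal sketch of that calculation, using the standard results that a lognormal has skewness (CV² + 3)·CV and a gamma has skewness 2·CV. The mapping from `logSigmaM` to CV is an assumption here and should be checked against the VAST source for the observation model in use: the sketch assumes that `exp(logSigmaM)` is the log-scale standard deviation for the lognormal and is the CV itself for the gamma. The function names are hypothetical.

```python
import math

def lognormal_cv(log_sigma_m):
    # Assumption: exp(logSigmaM) is the log-scale standard deviation,
    # so CV = sqrt(exp(sigma^2) - 1).
    sigma = math.exp(log_sigma_m)
    return math.sqrt(math.exp(sigma ** 2) - 1.0)

def gamma_cv(log_sigma_m):
    # Assumption: exp(logSigmaM) is the CV directly (shape = 1 / CV^2).
    return math.exp(log_sigma_m)

def lognormal_skewness(cv):
    # Standard result: skewness of a lognormal written in terms of its CV.
    return (cv ** 2 + 3.0) * cv

def gamma_skewness(cv):
    # Standard result: skewness of a gamma is 2 / sqrt(shape) = 2 * CV.
    return 2.0 * cv

# Compare the two families over a few illustrative CV values.
for cv in (0.5, 1.0, 2.0):
    print(f"CV={cv}: lognormal skewness={lognormal_skewness(cv):.3f}, "
          f"gamma skewness={gamma_skewness(cv):.3f}")
```

The two skewness values coincide only at CV = 0, and the gap widens quickly as CV grows, since the lognormal term is cubic in CV.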

On a related note, I usually don't recommend using AIC to select gamma vs. lognormal when the index scale matters (because the gamma often has a scale more similar to the design-based index, even in cases when AIC selects the lognormal). However, given that you don't care about index scale (you are freely estimating the catchability coefficient), perhaps it's worth using AIC to identify which is fitting better.
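For reference, the AIC comparison itself is simple arithmetic once each model's negative log-likelihood and number of estimated fixed-effect parameters are known. The sketch below assumes both models were fit to exactly the same data; the helper names and every numeric value are hypothetical placeholders, not results from any run in this thread.

```python
def aic(negative_log_likelihood, n_parameters):
    # AIC = 2 * k - 2 * log-likelihood = 2 * k + 2 * negative log-likelihood.
    return 2.0 * n_parameters + 2.0 * negative_log_likelihood

def delta_aic(fits):
    # fits maps a model name to (negative log-likelihood, parameter count).
    # Returns each model's AIC minus the smallest AIC; the best model gets 0.
    scores = {name: aic(nll, k) for name, (nll, k) in fits.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}

# Hypothetical values, for illustration only.
fits = {
    "gamma": (1250.0, 40),
    "lognormal": (1245.5, 40),
}
print(delta_aic(fits))
```

A smaller AIC indicates the better-fitting model after penalizing parameter count, but as noted above it says nothing about which model gives the more appropriate index scale.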