iantaylor-NOAA closed this issue 6 years ago.
Finally made progress on the r4ss changes required to use this option; see https://github.com/r4ss/r4ss/issues/129#issuecomment-358468468 for details. The changes don't yet work for length comps, but they should be adequate for the hake model and will support exploring this alternative likelihood, which saves us the model-tuning step.
Great. I got partway through reading the paper on Tuesday, and it seems worth including. Now that it's available, automated data weighting makes more sense than manual weighting.
I just uploaded model 2018.19.15_compWeight_DirMult, which uses the Dirichlet-Multinomial likelihood, to Google Drive.
Thoughts on the model
The weights given to the fishery and survey age comps are about 0.45 and 0.91, respectively, which are higher than the 0.15 and 0.44 values applied to those fleets under the status-quo McAllister-Ianelli tuning method. Increasing the weighting of the age comps didn't reduce the weight given to the survey biomass (there was little change in the "Q_extraSD_Acoustic_Survey(2)" parameter), but the weight of the age data relative to the survey index is a bit higher, and the weight given to the fishery ages is slightly higher relative to the survey ages. Overall, the resulting model is pretty similar to model 2018.19: the recruit devs are spread out just a little more, and the early years and initial equilibrium are shifted slightly downward (since those years are informed only by the fishery ages, which now get slightly more weight).
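To show how an estimated D-M parameter becomes a weight comparable to the McAllister-Ianelli factors, here is a minimal Python sketch. It assumes the linear parameterization from Thorson et al. (2017), in which the estimated parameter is ln(θ) (labelled ln(EffN_mult) in the control file), the effective sample size is N_eff = (1 + θn)/(1 + θ), and the weight quoted above is θ/(1 + θ). The function names are hypothetical and are not part of SS or r4ss. The input sample size n = 100 is arbitrary and is not the hake value.

```python
import math


def dm_weight(log_theta):
    """Approximate data weight theta / (1 + theta) for the linear D-M parameterization."""
    theta = math.exp(log_theta)
    return theta / (1.0 + theta)


def dm_effective_n(n_input, log_theta):
    """Effective sample size (1 + theta * n) / (1 + theta) for input sample size n."""
    theta = math.exp(log_theta)
    return (1.0 + theta * n_input) / (1.0 + theta)


def log_theta_from_weight(weight):
    """Invert dm_weight: ln(theta) is the logit of the weight."""
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")
    return math.log(weight / (1.0 - weight))


if __name__ == "__main__":
    # MLE weights quoted above for the fishery and survey age comps.
    for fleet, w in [("Fishery", 0.45), ("Acoustic_Survey", 0.91)]:
        log_theta = log_theta_from_weight(w)
        print(f"{fleet}: ln(theta) = {log_theta:.3f}, "
              f"N_eff for n = 100: {dm_effective_n(100, log_theta):.2f}")
```

Under this parameterization N_eff = (1 - w) + w·n. For large input sample sizes, the weight therefore behaves almost exactly like a multiplicative tuning factor, which is why it can be compared directly with the 0.15 and 0.44 McAllister-Ianelli values.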
Given that data weighting is a topic of ongoing research in fisheries science, I'm not sure there's a statistically sound way to determine whether model 2018.19.15 with the D-M likelihood is better or worse. However, the tuning happens automatically, rather than requiring manual editing of the variance adjustment values at the bottom of the control file, and that makes the new option appealing from a pragmatic standpoint. This matters most for sensitivities, or for profiles over the variability in time-varying selectivity, where iterative tuning would make a difference yet isn't practical.
Changes to r4ss
I just committed r4ss version 1.30.0, which adds plots and text output that help in understanding results from the D-M likelihood when running SS_output and SS_plots. However, it may make no difference for the purpose-built plots used in the hake-assessment repository. See the note at: https://github.com/r4ss/r4ss/issues/129#issuecomment-358468468
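Changes to input files
For future reference, the changes to the input files required to use this option are:
1. In the control file, add one Dirichlet-Multinomial parameter line for each fleet that will use the D-M likelihood: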
# Dirichlet-Multinomial parameters controlling age-comp weights
-5 20 .5 0 99 0 2 0 0 0 0 0 0 0 # ln(EffN_mult)_1
-5 20 .5 0 99 0 2 0 0 0 0 0 0 0 # ln(EffN_mult)_2
2. Comment out the variance adjustment factors at the bottom of the control file.
3. In the data file, change the "CompError" column in the age-comp specifications from 0 0 to 1 1 and "ParmSelect" from 0 0 to 1 2, so that the table (one row per fleet) looks like:
#_mintailcomp addtocomp combM+F CompressBns CompError ParmSelect minsamplesize
-1 0.001 0 0 1 1 0.001 #_fleet:1_Fishery
-1 0.001 0 0 1 2 0.001 #_fleet:2_Acoustic_Survey
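One way to sanity-check these edits is the Python sketch below. It splits each 14-column parameter line into named fields, links each fleet's ParmSelect value to its parameter, and reports what the bounds and initial value imply on the weight scale. The column names are my understanding of the SS 3.30 short parameter-line layout and of the comp-specification header above. Treat them, the helper names, and the assumption that CompError = 1 selects the D-M likelihood as illustrative, not as an SS or r4ss API.

```python
import math

# Assumed SS 3.30 short parameter-line column names (14 columns).
PARAM_COLUMNS = [
    "LO", "HI", "INIT", "PRIOR", "PR_SD", "PR_type", "PHASE",
    "env_var&link", "dev_link", "dev_minyr", "dev_maxyr", "dev_PH",
    "Block", "Block_Fxn",
]

# Column names taken from the age-comp specification header in the data file.
COMP_COLUMNS = [
    "mintailcomp", "addtocomp", "combM+F", "CompressBns",
    "CompError", "ParmSelect", "minsamplesize",
]


def parse_line(line, columns):
    """Split a whitespace-delimited SS line into named numeric fields and its trailing comment."""
    values, _, comment = line.partition("#")
    fields = [float(tok) for tok in values.split()]
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return dict(zip(columns, fields)), comment.strip()


def weight(log_theta):
    """Approximate D-M weight theta / (1 + theta)."""
    theta = math.exp(log_theta)
    return theta / (1.0 + theta)


control_lines = [
    "-5 20 .5 0 99 0 2 0 0 0 0 0 0 0 # ln(EffN_mult)_1",
    "-5 20 .5 0 99 0 2 0 0 0 0 0 0 0 # ln(EffN_mult)_2",
]
data_lines = [
    "-1 0.001 0 0 1 1 0.001 #_fleet:1_Fishery",
    "-1 0.001 0 0 1 2 0.001 #_fleet:2_Acoustic_Survey",
]

params = [parse_line(line, PARAM_COLUMNS) for line in control_lines]

for line in data_lines:
    spec, fleet = parse_line(line, COMP_COLUMNS)
    if spec["CompError"] != 1:
        continue  # assumed: any other value means this fleet is not using the D-M option
    # ParmSelect is 1-based and picks which D-M parameter line this fleet uses.
    par, label = params[int(spec["ParmSelect"]) - 1]
    print(f"{fleet} -> {label}: weight bounds "
          f"[{weight(par['LO']):.4f}, {weight(par['HI']):.4f}], "
          f"initial weight {weight(par['INIT']):.3f}, phase {int(par['PHASE'])}")
```

The bounds of -5 and 20 on ln(θ) span weights from under 0.01 to essentially 1. Giving each fleet its own ParmSelect index lets the fishery and survey weights be estimated separately.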
Thanks Ian, it sounds like we should consider switching the base case to this for the auto-tuning reasons at least.
Before making a call on that switch, let me try to get the alternative time-varying selectivity parameterization working with the current model 2018.19 and then we can evaluate the potential changes together.
There seems to be a general consensus to use this approach. Before closing the issue, we should check how it works in MCMC and look at the uncertainty in the weights.
A quick comparison of results from a 12-million-iteration MCMC chain shows nothing strange relative to the MLE values.
The median weights given to the fishery and survey age comps are 0.36 and 0.99, respectively. Their ratio is closer to the roughly 1:3 ratio of the McAllister-Ianelli weights applied to this model, though the overall scale is still higher.
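One detail matters when summarizing MCMC output. Because θ/(1 + θ) is a monotone increasing function of ln(θ), the posterior median of each weight equals the transformed median of the ln(EffN_mult) draws, but the posterior mean does not transform this way. The Python sketch below shows this with made-up draws (not the hake posterior). It also computes the fishery:survey ratios for the weights quoted in this thread.

```python
import math
import statistics


def dm_weight(log_theta):
    """Approximate D-M weight theta / (1 + theta)."""
    theta = math.exp(log_theta)
    return theta / (1.0 + theta)


# Hypothetical posterior draws of ln(EffN_mult) for one fleet (illustration only).
log_theta_draws = [-1.2, -0.8, -0.6, -0.5, -0.3, 0.1, 0.9]

weights = [dm_weight(x) for x in log_theta_draws]
median_of_weights = statistics.median(weights)
weight_of_median = dm_weight(statistics.median(log_theta_draws))
mean_of_weights = statistics.mean(weights)
weight_of_mean = dm_weight(statistics.mean(log_theta_draws))

print(f"median of weights {median_of_weights:.4f}, weight of median {weight_of_median:.4f}")
print(f"mean of weights {mean_of_weights:.4f}, weight of mean {weight_of_mean:.4f}")

# Fishery:survey weight ratios for the values reported in this thread.
ratios = {
    "McAllister-Ianelli": 0.15 / 0.44,
    "D-M MLE": 0.45 / 0.91,
    "D-M MCMC median": 0.36 / 0.99,
}
for method, ratio in ratios.items():
    print(f"{method}: fishery/survey = {ratio:.2f}")
```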
The Dirichlet-Multinomial (D-M) likelihood proposed by Thorson et al. (just added to https://github.com/cgrandin/hake-assessment/blob/master/doc/all.bib as "ThorsonEtAl2017Dirichlet" and uploaded to the Google Drive "papers" folder) is worth considering as an alternative to the status-quo multinomial likelihood.
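To make the comparison concrete, the Python sketch below evaluates the D-M log-likelihood for a single composition sample next to the ordinary multinomial. It assumes the linear parameterization in Thorson et al. (2017), where the total Dirichlet concentration is β = θn. The function names and example proportions are hypothetical. This is not the SS implementation, which includes additional bookkeeping not shown here.

```python
import math


def _check(obs_prop, exp_prop):
    """Basic validation: proportions sum to 1 and expected values are positive."""
    if abs(sum(obs_prop) - 1.0) > 1e-6 or abs(sum(exp_prop) - 1.0) > 1e-6:
        raise ValueError("proportions must sum to 1")
    if any(p <= 0.0 for p in exp_prop):
        raise ValueError("expected proportions must be positive")


def multinomial_loglik(obs_prop, exp_prop, n):
    """Multinomial log-likelihood of observed proportions with input sample size n."""
    _check(obs_prop, exp_prop)
    counts = [n * p for p in obs_prop]
    const = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    return const + sum(x * math.log(p) for x, p in zip(counts, exp_prop))


def dm_loglik(obs_prop, exp_prop, n, log_theta):
    """Dirichlet-multinomial log-likelihood, linear parameterization beta = theta * n."""
    _check(obs_prop, exp_prop)
    beta = math.exp(log_theta) * n  # total Dirichlet concentration
    counts = [n * p for p in obs_prop]
    total = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    total += math.lgamma(beta) - math.lgamma(n + beta)
    for x, p in zip(counts, exp_prop):
        alpha = beta * p  # per-bin concentration
        total += math.lgamma(x + alpha) - math.lgamma(alpha)
    return total


obs = [0.1, 0.3, 0.4, 0.2]
expected = [0.15, 0.25, 0.35, 0.25]
n = 50

print("multinomial:", multinomial_loglik(obs, expected, n))
for log_theta in (-2.0, 0.0, 2.0, 14.0):
    print(f"D-M, ln(theta) = {log_theta}:", dm_loglik(obs, expected, n, log_theta))
```

As ln(θ) grows, the D-M value approaches the multinomial one. Small θ makes the distribution overdispersed, so that fleet's composition data carry less information. This is how the estimated parameter down-weights the data without manual tuning.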
The key advantage is that data weighting is done automatically via an estimated parameter, rather than via manual iteration as in the McAllister-Ianelli approach. With automated data weighting, each model is tuned automatically as you explore sensitivities and retrospectives, rather than potentially being out of balance. This is probably most important when exploring alternative setups for time-varying selectivity.
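For contrast, the Python sketch below shows the manual step being replaced. For each year, the McAllister-Ianelli effective sample size is Σ E(1 - E) / Σ (O - E)². One common variant takes a fleet's tuning factor as the harmonic mean of those effective sample sizes divided by the mean (unadjusted) input sample size. The function names and toy data are hypothetical. In a real model, the expected values change once the new factor is written to the control file and the model is rerun, so the calculation has to be repeated until it stabilizes. That loop is what the estimated D-M parameter avoids.

```python
import statistics


def mi_effective_n(observed, expected):
    """McAllister-Ianelli effective sample size for one year's composition."""
    numerator = sum(e * (1.0 - e) for e in expected)
    denominator = sum((o - e) ** 2 for o, e in zip(observed, expected))
    return numerator / denominator


def mi_tuning_factor(years):
    """Suggested variance-adjustment factor for one fleet.

    `years` is a list of (observed, expected, input_n) tuples, where input_n is
    the unadjusted input sample size for that year.
    """
    eff_n = [mi_effective_n(obs, exp_) for obs, exp_, _ in years]
    input_n = [n for _, _, n in years]
    return statistics.harmonic_mean(eff_n) / statistics.mean(input_n)


# Toy data: two identical years (hypothetical values, not hake data).
toy_obs = [0.2, 0.3, 0.5]
toy_exp = [0.25, 0.25, 0.5]
toy_years = [(toy_obs, toy_exp, 250), (toy_obs, toy_exp, 250)]

print("effective N per year:", mi_effective_n(toy_obs, toy_exp))
print("suggested factor:", mi_tuning_factor(toy_years))
```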
The D-M option is available in SS version 3.30, but the r4ss output needs improvement before we can apply it. I'm working on this this morning (see https://github.com/r4ss/r4ss/issues/129), though I need to take a break from now until about 10am.