Explore switch to Dirichlet-Multinomial likelihood for comps

iantaylor-NOAA commented 6 years ago

The Dirichlet-Multinomial (D-M) likelihood proposed by Thorson et al. (just added to https://github.com/cgrandin/hake-assessment/blob/master/doc/all.bib as "ThorsonEtAl2017Dirichlet" and uploaded to the Google Drive "papers" folder) is worth considering over the status-quo multinomial likelihood.

The key advantage is that data-weighting is done automatically via an estimated parameter, rather than via manual iteration as in the McAllister-Ianelli approach. Having automated data-weighting means that as you explore sensitivities and retrospectives, each model is automatically tuned rather than potentially out of balance. This is probably most important when exploring alternative setups for time-varying selectivity.
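For reference, here is a minimal sketch of what the estimated parameter does, assuming the linear parameterization described in Thorson et al. (2017). The function names `dm_loglik` and `effective_n` are hypothetical illustrations, not SS or r4ss code, and the exact form used inside SS should be checked against its documentation.

```python
from math import lgamma


def dm_loglik(obs_prop, exp_prop, n, theta):
    """Log-likelihood of observed proportions under a Dirichlet-multinomial.

    Assumes the linear form, where the Dirichlet concentration is theta * n.
    obs_prop: observed proportions (sum to 1)
    exp_prop: model-expected proportions (all > 0, sum to 1)
    n:        input sample size
    theta:    estimated D-M parameter (> 0)
    """
    alpha0 = theta * n
    # Terms that do not depend on the individual bins
    ll = lgamma(n + 1) + lgamma(alpha0) - lgamma(n + alpha0)
    for o, e in zip(obs_prop, exp_prop):
        x = n * o          # observed count in this bin
        a = alpha0 * e     # Dirichlet concentration for this bin
        ll += lgamma(x + a) - lgamma(a) - lgamma(x + 1)
    return ll


def effective_n(n, theta):
    """Effective sample size implied by theta under the linear form."""
    return (1.0 + theta * n) / (1.0 + theta)
```

As theta gets large the effective sample size approaches the input sample size (full weight), and as theta goes to zero it approaches 1, which is why estimating theta takes the place of manual tuning.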

The D-M option is available in SS version 3.30, but the r4ss output needs improvement to apply it. I'm working on addressing this this morning (see https://github.com/r4ss/r4ss/issues/129), though I need to take a break from now until about 10am.

iantaylor-NOAA commented 6 years ago

Finally made progress on the r4ss changes required to use this option. See https://github.com/r4ss/r4ss/issues/129#issuecomment-358468468 for details. The changes don't yet work for length comps, but they should cover our needs for the hake model well enough to explore this alternative likelihood, which saves the model tuning step.

andrew-edwards commented 6 years ago

Great. I got partway through reading the paper on Tuesday, and it seems worthwhile including. Automated data weighting makes more sense than manual tuning, given that it's now available.

iantaylor-NOAA commented 6 years ago

I just uploaded to Google Drive model 2018.19.15_compWeight_DirMult which uses the Dirichlet-Multinomial likelihood.

Thoughts on the model

The weights given to the fishery and survey age comps are about 0.45 and 0.91, respectively, which are higher than the 0.15 and 0.44 values applied to those fleets under the status-quo McAllister-Ianelli tuning method. Increasing the weighting of the age comps didn't reduce the weight given to the survey biomass (there was little change in the "Q_extraSD_Acoustic_Survey(2)" parameter), but the relative weight of the age data compared to the survey index is a bit higher, and the weight given to the fishery ages is slightly higher relative to the survey ages. Overall the resulting model is pretty similar to model 2018.19, with the recruit devs spread out just a little bit and the early years and initial equilibrium shifted a bit downward (since those years are only informed by the fishery ages, which get slightly more weight).
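To relate these weights to the estimated parameters, here is a small sketch. It assumes that the estimated parameter is ln(theta) and that the reported weight is theta / (1 + theta), as in the linear form of Thorson et al. (2017); both function names are hypothetical helpers, not part of SS or r4ss.

```python
from math import exp, log


def dm_weight(log_theta):
    """Weight theta / (1 + theta) implied by a ln(theta) parameter value."""
    theta = exp(log_theta)
    return theta / (1.0 + theta)


def log_theta_from_weight(weight):
    """Inverse of dm_weight: the ln(theta) value that gives this weight."""
    return log(weight / (1.0 - weight))


# Weights quoted above for the two fleets
for fleet, weight in [("Fishery", 0.45), ("Acoustic_Survey", 0.91)]:
    print(fleet, round(log_theta_from_weight(weight), 2))
```

Under these assumptions a weight of 0.5 corresponds to ln(theta) = 0, and weights near 1 correspond to large positive values, so a parameter close to its upper bound would indicate that the data are getting close to full weight.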

Given that data-weighting is a topic of ongoing research in fisheries science, I'm not sure that there's a statistically sound way to determine whether model 2018.19.15 with the D-M likelihood is better or worse. However, having the tuning occur automatically, rather than requiring manual editing of the values at the bottom of the control file, makes the new option appealing from a pragmatic standpoint, especially for things like sensitivities or profiles over the variability in time-varying selectivity, where iterative tuning would make a difference and yet isn't practical.

Changes to r4ss

I just committed r4ss version 1.30.0, which has plots and text output that are useful for understanding the output from the D-M likelihood when running SS_output and SS_plots. However, for the purpose-built plots used in the hake-assessment, it may make no difference. See note at: https://github.com/r4ss/r4ss/issues/129#issuecomment-358468468

Changes to input files

For future reference, the changes to the input files required to use this option are:

1. Change the control file to add 2 parameter lines after the selectivity pars but before "timevary selex parameters", where the parameter lines look like


# Dirichlet-Multinomial parameters controlling age-comp weights
        -5            20            .5             0            99             0          2          0          0          0          0          0          0          0  #  ln(EffN_mult)_1
        -5            20            .5             0            99             0          2          0          0          0          0          0          0          0  #  ln(EffN_mult)_2

2. Comment out the variance adjustment factors at the bottom of the control file.
3. In the data file, change the "CompError" column in the age comp specifications from 0 0 to 1 1 and "ParmSelect" from 0 0 to 1 2, so that the table looks like

#_mintailcomp addtocomp combM+F CompressBns CompError ParmSelect minsamplesize
-1 0.001 0 0 1 1 0.001 #_fleet:1_Fishery
-1 0.001 0 0 1 2 0.001 #_fleet:2_Acoustic_Survey

cgrandin commented 6 years ago

Thanks Ian, it sounds like we should consider switching the base case to this for the auto-tuning reasons at least.

iantaylor-NOAA commented 6 years ago

Before making a call on that switch, let me try to get the alternative time-varying selectivity parameterization working with the current model 2018.19 and then we can evaluate the potential changes together.

iantaylor-NOAA commented 6 years ago

Sounds like general consensus to use this approach. Before closing the issue we should check how it works in MCMC and look at the uncertainty in the weightings.

iantaylor-NOAA commented 6 years ago

A quick comparison of results from the 12-million MCMC chain indicates nothing strange relative to the MLE values.

The median weights given to the fishery and survey age comps are 0.36 and 0.99, respectively, a ratio closer to the roughly 1:3 ratio of the McAllister-Ianelli weights applied to this model (though still higher in overall scale).
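The ratio comparison can be checked with a few lines of arithmetic. The weights are the values quoted in this thread; the dictionary labels are just for illustration.

```python
# (fishery weight, survey weight) for each weighting approach, as quoted in this thread
weights = {
    "McAllister-Ianelli": (0.15, 0.44),
    "D-M MLE": (0.45, 0.91),
    "D-M MCMC median": (0.36, 0.99),
}

# Fishery weight relative to survey weight
ratios = {name: fishery / survey for name, (fishery, survey) in weights.items()}

for name, ratio in ratios.items():
    print(f"{name}: fishery/survey = {ratio:.2f}")
```

The MCMC median ratio sits much nearer the McAllister-Ianelli ratio than the MLE ratio does, while both D-M sets of weights remain larger in absolute terms.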

pacific-hake / hake-assessment

Explore switch to Dirichlet-Multinomial likelihood for comps #315