Update configuration.csv for petrale to delta_lognormal

iantaylor-NOAA commented 1 year ago

What was changed/addressed

The delta_lognormal distribution produces the same trend as the delta_lognormal_mix but a different scale. @kellijohnson-NOAA, this PR updates the configuration.csv file to reflect the change in the chosen distribution.

Additional information

The comparison below shows a very similar trend but a different scale and different uncertainty. The blue line represents the delta_lognormal_mix estimates (est, lwr, and upr values) multiplied by 4.84, which is the ratio of the mean index under the two distributions.

[Figure: index_comparisons_WCGBTS_11-May-2023]
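The rescaling used for the blue line can be sketched as follows. This assumes "mean index" means the arithmetic mean of the annual estimates across years; the function name `rescale_to_reference` and all numbers are hypothetical and only illustrate the arithmetic, not the survey results.

```python
from statistics import mean


def rescale_to_reference(est, lwr, upr, reference_est):
    """Scale one index series (estimate and interval bounds) so that its
    mean matches the mean of a reference series.

    Returns the scaling ratio and the rescaled est, lwr, upr lists.
    """
    ratio = mean(reference_est) / mean(est)
    scaled = [[value * ratio for value in series] for series in (est, lwr, upr)]
    return ratio, scaled[0], scaled[1], scaled[2]


# Illustrative numbers only (not survey results).
mix_est = [1.0, 2.0, 3.0]
mix_lwr = [0.5, 1.0, 1.5]
mix_upr = [2.0, 4.0, 6.0]
lognormal_est = [4.0, 8.0, 12.0]

ratio, est_s, lwr_s, upr_s = rescale_to_reference(mix_est, mix_lwr, mix_upr, lognormal_est)
print(ratio)  # 4.0
```

Because every value is multiplied by the same constant, the relative trend and the width of the intervals relative to the estimate are unchanged; only the overall scale moves, which is why the remaining differences in the comparison are in the uncertainty.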

seananderson commented 1 year ago

This makes me wonder whether the logic on the _mix() families is right for the predictions. When fitting, the linear predictor is created, and then, in the observation likelihood section, a second, larger mean is created from the ratio between the large and small means (exp(log_ratio_mix) + 1.0) and the probability of that larger mean (invlogit(logit_p_mix)), and both enter the likelihood. However, the predictions are based only on the original linear predictor. This means the second, higher mean gets thrown away as observation error. Should the linear predictor include both means, or are the extreme events intentionally left out? That is, should this combined mean be recorded as the new eta_i and mu_i? That would explain the lower scale above. @ericward-noaa
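The combined mean described above can be written out as a small sketch. It assumes each component's mean is on the response scale and that the larger mean is the smaller one times (exp(log_ratio_mix) + 1.0), drawn with probability invlogit(logit_p_mix); it is an illustration of the arithmetic, not sdmTMB's internal code.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit: maps a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mixture_mean(mu_small: float, log_ratio_mix: float, logit_p_mix: float) -> float:
    """Expected value of the positive component when it is a two-part mixture.

    mu_small is the mean implied by the linear predictor; the larger mean is
    mu_small * (exp(log_ratio_mix) + 1), used with probability invlogit(logit_p_mix).
    """
    mu_large = mu_small * (math.exp(log_ratio_mix) + 1.0)
    p_large = inv_logit(logit_p_mix)
    return (1.0 - p_large) * mu_small + p_large * mu_large


# Illustrative values: the larger mean is 10x the smaller one and occurs 10% of the time.
mu = mixture_mean(1.0, math.log(9.0), math.log(0.1 / 0.9))
print(round(mu, 6))  # 1.9
```

A prediction built only from the linear predictor would return mu_small (1.0 here), so it understates the expected value by the factor 1 + invlogit(logit_p_mix) * exp(log_ratio_mix).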

ericward-noaa commented 1 year ago

Two thoughts:

1. Yeah, I hadn't realized the mixtures might be used in assessments this cycle, so I hadn't finished the prediction part that @seananderson was talking about. Happy to do that next week. The second mean is definitely not included in the predictions, but it should be, and that is likely responsible for the scaling.
2. The way the mixtures are implemented in sdmTMB now is slightly different from previous assessments that used mixture distributions (e.g., non-spatial ECE models). As before, we estimate the ratio of the means and the probability of the extreme high values, but I currently coded it to assume the CV of the two components is the same. In past models, we allowed the mixture distribution to have separate variances for the small and large components. This could be changed, but given there's usually very little data in the extremes, I think it's probably fine to keep them shared. Any thoughts are welcome.
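A shared-variance mixture likelihood for a single positive observation might look like the sketch below. It assumes a lognormal parameterized by its mean (meanlog = log(mean) - sigma^2 / 2); with a lognormal, sharing sigma is the same as sharing the CV, since the CV depends only on sigma. The function names are hypothetical and this is not the sdmTMB implementation.

```python
import math


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lognormal_pdf(y: float, mean: float, sigma: float) -> float:
    """Lognormal density parameterized by its mean on the natural scale.

    Assumes meanlog = log(mean) - sigma^2 / 2, so that E[y] == mean.
    """
    meanlog = math.log(mean) - 0.5 * sigma ** 2
    z = (math.log(y) - meanlog) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cv(sigma: float) -> float:
    """CV of a lognormal; it depends only on sigma, not on the mean."""
    return math.sqrt(math.exp(sigma ** 2) - 1.0)


def mix_loglik(y, mu_small, log_ratio_mix, logit_p_mix, sigma):
    """Log-likelihood of a positive observation under a two-component
    lognormal mixture where both components share the same sigma (and CV)."""
    mu_large = mu_small * (math.exp(log_ratio_mix) + 1.0)
    p_large = inv_logit(logit_p_mix)
    density = ((1.0 - p_large) * lognormal_pdf(y, mu_small, sigma)
               + p_large * lognormal_pdf(y, mu_large, sigma))
    return math.log(density)


print(mix_loglik(2.0, 1.0, math.log(9.0), math.log(0.1 / 0.9), 0.5))
```

Allowing separate variances, as in the earlier non-spatial models, would amount to giving the large component its own sigma in the second `lognormal_pdf` call; with few observations in the extremes, that extra parameter would have little data to inform it.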

iantaylor-NOAA commented 1 year ago

Thanks @seananderson and @ericward-noaa for the responses and explanations. All that makes sense. In general the scale might not be a big deal, as catchability has often been treated as a nuisance parameter, but the interpretation of catchability for petrale sole has been the subject of much debate in the past (because it has been estimated above 1.0), so the review will be smoother if we don't have to explain why the parameter is very different.

iantaylor-NOAA commented 1 year ago

@okenk and @brianlangseth-NOAA, discussion in this pull request may be relevant to interpretation of catchability for the Canary triennial index given the current configuration.

ericward-noaa commented 1 year ago

Hey @iantaylor-NOAA @kellijohnson-NOAA, I created a mix-predict branch over on the sdmTMB repo and made the changes to the predictions from the mixture models so they line up better. I have a simple example there in the comments, but it might be worth running your data above through that. Or send me the model and data frame, and I'll do it.
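Why including the larger mean changes the scale but not the trend can be sketched with a toy delta-model index. This assumes equal-area grid cells, a logit-scale encounter predictor (eta1) and a log-scale positive predictor (eta2) per cell, and mixture parameters that are single values shared by all cells; the names `cell_density` and `index` are hypothetical, and this is not the code on the mix-predict branch.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def cell_density(eta1, eta2, log_ratio_mix=None, logit_p_mix=None):
    """Delta-model expected density for one grid cell: P(encounter) * E[positive]."""
    p_encounter = inv_logit(eta1)
    mu_small = math.exp(eta2)
    if log_ratio_mix is None or logit_p_mix is None:
        return p_encounter * mu_small  # linear predictor only (larger mean ignored)
    p_large = inv_logit(logit_p_mix)
    mu_large = mu_small * (math.exp(log_ratio_mix) + 1.0)
    return p_encounter * ((1.0 - p_large) * mu_small + p_large * mu_large)


def index(cells, **mix):
    """Sum expected densities over (eta1, eta2) pairs for one year (equal-area cells assumed)."""
    return sum(cell_density(e1, e2, **mix) for e1, e2 in cells)


# Illustrative linear predictors for three cells in one year.
cells = [(0.0, 0.0), (1.0, 0.5), (-1.0, 1.0)]
mix = {"log_ratio_mix": math.log(9.0), "logit_p_mix": math.log(0.1 / 0.9)}
ratio = index(cells, **mix) / index(cells)
print(round(ratio, 6))  # 1.9
```

Because the mixture factor 1 + invlogit(logit_p_mix) * exp(log_ratio_mix) multiplies every cell equally, each year's index is scaled by the same constant, which matches the "same trend, different scale" pattern in the comparison.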

@seananderson made a good point that the residuals from these models still might not look good, because they're simulated as a mixture (so bimodal). There's not really anything to do about that now, but it's a heads-up if this is used further.
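The bimodality of simulated values can be seen with a quick simulation from a two-component lognormal mixture with a shared sigma. The parameter values and the function name `simulate_mixture` are hypothetical; the sketch only shows what draws from such a mixture look like, not how sdmTMB computes residuals.

```python
import math
import random


def simulate_mixture(n, mu_small, log_ratio_mix, logit_p_mix, sigma, seed=1):
    """Draw positive catches from a two-component lognormal mixture with a shared sigma.

    Returns the draws and a flag per draw marking whether it came from the larger component.
    """
    rng = random.Random(seed)
    p_large = 1.0 / (1.0 + math.exp(-logit_p_mix))
    mu_large = mu_small * (math.exp(log_ratio_mix) + 1.0)
    draws, is_large = [], []
    for _ in range(n):
        large = rng.random() < p_large
        mean = mu_large if large else mu_small
        # meanlog chosen so that each component's expected value equals its mean
        meanlog = math.log(mean) - 0.5 * sigma ** 2
        draws.append(rng.lognormvariate(meanlog, sigma))
        is_large.append(large)
    return draws, is_large


draws, is_large = simulate_mixture(10000, 1.0, math.log(49.0), math.log(0.1 / 0.9), 0.3)
# On the log scale, draws cluster near 0 and near log(50), with few in between.
low = sum(1 for d in draws if math.log(d) < 1.0)
middle = sum(1 for d in draws if 1.0 <= math.log(d) <= 3.0)
high = sum(1 for d in draws if math.log(d) > 3.0)
print(low, middle, high)
```

With a large ratio and a small shared sigma, the two clusters barely overlap, so any residual that compares an observation to simulated draws is working against a two-peaked reference distribution.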

iantaylor-NOAA commented 1 year ago

Thanks @ericward-noaa for working on this. I think the standard lognormal is doing everything we need for the ongoing petrale sole assessment, so we don't NEED results from the revised mixture model. Even if the scale ends up the same, the larger uncertainty intervals associated with the mixture model may lead to worse fits to this index, which probably would not be an improvement. If there were a way to include the correlation among index observations in the assessment model (so the model knows that the scale is more uncertain, with wider intervals, but the trend might be similarly precise), that would be less of an issue.

Having said all that, I'm interested in seeing the results of your work, so if you're willing to run some models, I think this is what you would want to do in a fork of {indexwc}:

1. Add a row to https://github.com/kellijohnson-NOAA/indexwc/blob/main/data-raw/configuration.csv to include petrale sole with sdmTMB::delta_lognormal_mix() instead of sdmTMB::delta_lognormal().
2. Consider also adding a row for canary rockfish to use sdmTMB::delta_lognormal() instead of the status-quo mix.
3. Either delete rows unrelated to petrale and canary in configuration.csv or subset them via a change to this line: https://github.com/kellijohnson-NOAA/indexwc/blob/main/data-raw/configuration.R#LL27C9-L27C20
4. Run the code in https://github.com/kellijohnson-NOAA/indexwc/blob/main/data-raw/configuration.R to run the models.
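Steps 1 to 3 amount to duplicating a row with a different family and then filtering to the species of interest. The sketch below does this on a hypothetical two-column miniature of configuration.csv; the column names `species` and `family`, the species labels, and the helper functions are all assumptions, and the real file has a different and larger set of columns (the actual workflow is the R script linked above).

```python
import csv
import io


def add_family_variant(rows, species, old_family, new_family):
    """Copy each row for `species` that uses `old_family`, swapping in `new_family`."""
    new_rows = [dict(r, family=new_family) for r in rows
                if r["species"] == species and r["family"] == old_family]
    return rows + new_rows


def keep_species(rows, keep):
    """Drop rows for species not in `keep` (step 3)."""
    return [r for r in rows if r["species"] in keep]


# Hypothetical miniature of configuration.csv (not the real schema or contents).
text = """species,family
petrale sole,sdmTMB::delta_lognormal()
canary rockfish,sdmTMB::delta_lognormal_mix()
some other species,sdmTMB::delta_gamma()
"""
rows = list(csv.DictReader(io.StringIO(text)))
rows = add_family_variant(rows, "petrale sole", "sdmTMB::delta_lognormal()", "sdmTMB::delta_lognormal_mix()")
rows = add_family_variant(rows, "canary rockfish", "sdmTMB::delta_lognormal_mix()", "sdmTMB::delta_lognormal()")
rows = keep_species(rows, {"petrale sole", "canary rockfish"})

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["species", "family"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```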

pfmc-assessments / indexwc

Update configuration.csv for petrale to delta_lognormal #16
