Should we be using 'forecast' year biomass values or model output values for the next year

cgrandin commented 10 months ago

I remember having this discussion with Jaclyn Cleary about Herring, and she wanted to use the 'forecast' values for the 'next year' (final year in the model) because those values took into account the estimated recruitment and age structure (I believe) estimates for that year, whereas the main model run was just an estimate based on previous years.

This question comes from #1111 in which we could use either value for the 2024 biomass (depletion).

I'm really just curious @aaronmberger-nwfsc and @kellijohnson-NOAA why these are different for this model. It would be good to understand this as it keeps cropping up with every assessment.

cgrandin commented 10 months ago

This is perhaps a non-issue after I found out that I was dividing the biomass by the initial biomass twice instead of once. Once fixed, the median is almost exactly the same for both of these, but the uncertainty is larger on the model estimate from the main model run when compared to the forecasts.

Depletion: main run 2024 5% = 0.3442664 forecast 2024 5% = 0.3956313

main run 2024 median = 0.7636227 forecast 2024 median = 0.7636235

main run 2024 95% = 1.7681535 forecast 2024 95% = 1.5289015

kellijohnson-NOAA commented 10 months ago

I don't know enough about how we do the forecasts to give a good answer. The differences that you are seeing here, are they just because we are running the model again, i.e., different random starts for the chains?

aaronmberger-nwfsc commented 10 months ago

I just looked at the depletion estimate (...$mcmccalcs$dmed) for the base model and many forecast models and the 2024 median is = 0.7636227 for all of them. Where is the forecast 2024 median of 0.7636235 coming from?

cgrandin commented 10 months ago

The depletion values (median = 0.7636235) can be found in R in these objects:

base_model$forecasts$`2024`$`03-225000`$depl
base_model$forecasts$`2025`$`12-545000`$depl
etc..

Those quantiles are calculated from columns in the posteriors.sso files found in each forecasts directory, which are created by running -mceval on the base model MCMC output (inputs and psv file), which has been copied into each directory like this one:

/srv/hake/models/2024/01-version/01-base-models/01-base/forecasts/forecast-year-2025/12-545000

For 2024, the values are the same for ALL forecasts. The forecast.ss files are modified prior to running -mceval. Like this (for the second year of forecasts at the TAC level:

2 #_InputBasis
 #_Year Seas Fleet Catch_or_F
   2024    1     1     545000
   2025    1     1     545000
-9999 0 0 0
#
999 # verify end of input

No matter what years and catch are added to the forecast.ss file, the depletion for 2024 is as shown in the image of the command line above, which is different than the mcmccalcs$dmed, dlower, dupper values which are calculated in the same way (posteriors.sso quantiles). The other years will change though depending on the forecasting.

cgrandin commented 10 months ago

@kellijohnson-NOAA I don't believe so because we are only running the -mceval to do the forecasting.

aaronmberger-nwfsc commented 10 months ago

Could this just be a calculation thing such as the difference between the following two methods: Method 1: taking the posterior median SSB (of all 800 samples) and dividing by the posterior median B0 (of all 800 samples); versus Method 2: calculating depletion for each MCCM sample first [MCMC sample(x) SSB / MCMC sample(x) B0] and then taking the median of the 8000 calculated depletions

Does that make sense? I've done this for other things and they come out real close but not exact.

cgrandin commented 10 months ago

No it isn't, but I'd like to find out what's going on at some point. I will set the milestone to next year as there's no time for 2024 to look at this.

pacific-hake / hake-assessment

Should we be using 'forecast' year biomass values or model output values for the next year #1113