pacific-hake / hake-assessment

:zap: :fish: Build the assessment document using LaTeX and knitr

Diagnostics aren't as good as last year for current base model (incl. age-1). #878

Closed andrew-edwards closed 2 years ago

andrew-edwards commented 2 years ago

Current base model Appendix A gives this: [image]

Aaron said the diagnostics looked good when he manually looked at them. The code to make that figure is

plot_mcmc_param_stats(base.model)

@aaron - did you add other options?

I just had a thought that it may be something to do with the age-1 index now being included, so will take a look. Could be why adnuts-diagnostics document isn't building now.

andrew-edwards commented 2 years ago

That Geweke behaviour occurs for the first bridging model ("Update all fishery catch and comps"), but not when I check last year's base model using plot_mcmc_param_stats(), which rules out an issue with my setup (I updated r4ss okay). Is that related to the comments on e9aa994? (Though that wouldn't explain the behaviour persisting through to the new base model.)

Or now that we have the extra forecast year, maybe some setting somewhere (!) also needs to be changed to account for that.

aaronmberger-nwfsc commented 2 years ago

I used r4ss commands:

my_mcmc <- SSgetMCMC(dir = "mymodel", verbose = TRUE, writecsv = TRUE,
                     csv1 = "keyposteriors.csv", csv2 = "nuisanceposteriors.csv",
                     keystrings = c("SSB_2019", "Bratio_2019", "SR_LN(R0)",
                                    "SR_BH_steep", "RecrDev_2014", "EffN_mult"),
                     nuisancestrings = c("Objective_function", "SSB_", "Bratio_",
                                         "InitAge", "RecrDev"))
pdf("...\SummaryDiagnostics.pdf")
mcmc.nuisance(dir = "...\models\", run = "mymodel\", printstats = TRUE)
dev.off()

plot_mcmc_param_stats() says it is a recoded version of the mcmc.nuisance() function used above.

and I got this using the base model output from a run I did on my remote desktop (7 cores): [image]

And the effN seems to match (visually) with that from the Shiny app within adnuts code:

[image]

One thing to look at: if you run that r4ss code above and look at the table printed to the screen, you can see the actual values that go into those plots for each parameter.
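If it helps, the per-parameter values behind those plots can also be computed directly with the coda package. This is just a sketch using the my_mcmc object from the SSgetMCMC() call above, and it assumes that object comes back as a single data frame of draws:

draws  <- coda::mcmc(as.data.frame(my_mcmc))  # assumption: a single data frame of draws (older r4ss wraps it in a list)
geweke <- coda::geweke.diag(draws)$z          # Geweke z-score for each parameter
effn   <- coda::effectiveSize(draws)          # effective sample size for each parameter
round(cbind(geweke, effn), 1)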

andrew-edwards commented 2 years ago

Thanks. I thought it might have been a different run of the base model, which makes it curious. Will do what you said after lunch.

andrew-edwards commented 2 years ago

@aaronmberger-nwfsc Could you maybe just try plot_mcmc_param_stats() with the model you ran (I don't have all the .sso etc. files downloaded, so can't run your code)? Then we'd know if the difference is between the model runs (your computer v AWS) or the code.

cgrandin commented 2 years ago

@andrew-edwards plot_mcmc_param_stats() takes a model object as an argument; you don't need the output files, just the RDS loaded into a variable, as is done in model-setup.R
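For example (a minimal sketch; the RDS path here is just illustrative):

base.model <- readRDS("models/2022.01.10_base/2022.01.10_base.rds")  # illustrative path; model-setup.R loads these for you
plot_mcmc_param_stats(base.model)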

andrew-edwards commented 2 years ago

Yes, that works fine. I'm talking about Aaron's code above, which directly calls the .sso files. I expect that the difference is with the model runs (not the different analyses in the code), but this will rule out the different analyses fairly easily.

cgrandin commented 2 years ago

I think I'm just saying that code is the same as what I've done; in the hake assessment repo it's just done in several steps, i.e. create the RDS files from the output, then run the stats and create the plot. But if you want to run the code from scratch, you can download the .sso and other files using the s3_download() function.
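Roughly like this (the argument to s3_download() here is a guess; build_rds() and load_models() are the usual repo steps):

s3_download("2022.01.10_base")          # fetch the .sso and other output files from S3 (argument form is a guess)
build_rds("2022.01.10_base")            # condense the outputs into an RDS file
model <- load_models("2022.01.10_base")
plot_mcmc_param_stats(model)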

andrew-edwards commented 2 years ago

Agreed, I just don't think that's the issue, so it's not worth trying (yet). Here are the figures for the first bridging run (below): the spurious behaviour is happening straight away. Presumably we should try to figure this out first (it may also end up helping the retro problem).

[image]

andrew-edwards commented 2 years ago

[Yes, I tweaked my code to make a bridge-diagnostics.pdf for the diagnostics of bridging models. Just don't tell Kelli ;) ]

andrew-edwards commented 2 years ago

Summary:

  1. Seems like we're getting different diagnostics for the model run on AWS and the model run by Aaron.
  2. Aaron did the bridging models on his computer and (I'm pretty sure) checked the diagnostics, and they looked fine all the way through to the new base model.
  3. But diagnostics on bridge models and new base (and all sens models) don't look so good, based on the AWS runs.
  4. Number 1 could also be due to plot_mcmc_param_stats() including more than it should (i.e. compared to Aaron's code above).

I've written code to make full diagnostics PDF files for the bridging models and then for the retro models (the original code does all the sens models). It's all automated, so we can quickly check them, which should hopefully avoid the current issue.

andrew-edwards commented 2 years ago

I started adapting Aaron's code in the above commit, but I don't think I downloaded the right .sso files (which is what the .rds approach is trying to avoid). @aaronmberger-nwfsc, if you can try plot_mcmc_param_stats() tomorrow on your locally run base model, that will tell us whether it is number 1 or 4 in the above list.

aaronmberger-nwfsc commented 2 years ago

Yes, I'll run plot_mcmc_param_stats() today. I downloaded the hake.RData file, ran launch_shinyadmb(nuts.updated) from the base model on the drive, and manually made a histogram of the effN (below), which seems to indicate there is an issue in our plotting code. The above task will help diagnose further. [image]
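Something along these lines reproduces that histogram from the fit object (just a sketch, assuming the standard adnuts layout where the samples are an iterations x chains x parameters array):

load("hake.RData")                                    # provides nuts.updated (the adnuts fit)
mon <- rstan::monitor(nuts.updated$samples,
                      warmup = nuts.updated$warmup,   # assumption: warmup iterations are stored on the fit
                      print = FALSE)
hist(mon[, "n_eff"], breaks = 30,
     xlab = "Effective sample size (effN)", main = "")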

andrew-edwards commented 2 years ago

Here's the plot for Chris's 2021.00.04_base_2022_code, which is last year's base model run using this year's setup. It looks different from the plot of last year's base model from last year's runs, implying it's number 1 in my list above (to do with the AWS run, or the automation) rather than 4 (the plotting function).

[image]

Similar looking results with the 12 CPU run that Chris just did.

andrew-edwards commented 2 years ago
One option: we could run the base model using the automated functions on a local machine (like last year), to tease out whether it's AWS vs. the local machine, or the functions.

cgrandin commented 2 years ago

If you want to compare machine output, you need to compile optimized SS at commit a0f982d9258e626c63d69ba3366e99b9744d0cc4

https://github.com/nmfs-stock-synthesis/stock-synthesis/tree/a0f982d9258e626c63d69ba3366e99b9744d0cc4

aaronmberger-nwfsc commented 2 years ago

And remember, we still have a difference between the effN in the upper right of these plots and the Shiny app for the exact same model run on AWS.

cgrandin commented 2 years ago

@aaronmberger-nwfsc that's what makes me think it's just plotting code

andrew-edwards commented 2 years ago

But if it's just plotting code, why would it work okay now for last year's base model (results as run last year):

plot_mcmc_param_stats(last.yr.base.model)

[image]

which matches last year's Figure A.4? And doesn't match the re-run of last year's model (2021.00.04_base_2022_code shown earlier).

andrew-edwards commented 2 years ago

@aaronmberger-nwfsc - what was your adapt_delta on your locally-run model? Chris used 0.95 for all (yes, Chris?). Chris - did you use 0.95 or 0.90 for the re-run of last year's? (I'll add it to the Google spreadsheet).

cgrandin commented 2 years ago

Yes, 0.95 for all models including the re-run of last year's base model.

andrew-edwards commented 2 years ago

Aha - but last year's had 0.90 yes?

cgrandin commented 2 years ago

Yes it did

cgrandin commented 2 years ago

Here is last year's using 0.95 and this year's model code and everything:

[image]

andrew-edwards commented 2 years ago

Yep, same as my earlier plot (36 mins ago).

cgrandin commented 2 years ago

Here is this year's base run at 0.95 with 12 CPUs only (instead of 16, which is what all the rest use)

[image]

andrew-edwards commented 2 years ago

Do you want to run that [last year's base model again] but with adapt_delta = 0.9?

cgrandin commented 2 years ago

And finally, this year's with 16 CPUs and 0.95 (what we want in the document). The number of CPUs doesn't seem to matter.

[image]

andrew-edwards commented 2 years ago

Yep, this matches the figure at the top of this Issue. So I think we need to try adapt_delta = 0.90 again.
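For reference, adapt_delta is passed to adnuts through the control list; something like this (model name and path are placeholders, and the actual run scripts may differ):

fit <- adnuts::sample_admb(model = "ss", path = "models/2022.01.10_base/mcmc",  # placeholders
                           iter = 2000, chains = 8,
                           control = list(adapt_delta = 0.90))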

cgrandin commented 2 years ago

OK I can run that now

aaronmberger-nwfsc commented 2 years ago

I'm still checking, but I think this has something to do with altering the type of output we want in our report files. For example, we now output 124 more derived quantities related to dynamic B0 and dynamic recruitment, as well as a few refinements to selectivity - although that just removed some parameters that were fixed, so we shouldn't have been reporting them anyway. If the dynamic B0 quantities are included, I can see that being the issue we are seeing. I'm going to dig into the details of plot_mcmc_param_stats() to find out. At the end of the day, we really only need to report the performance of active parameters in these diagnostics (see the sketch below); I don't see a reason to report derived quantities, since those are a function of the active parameters.
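To be concrete about "active parameters only", a rough sketch of what I have in mind (the name patterns for the derived outputs are guesses, not the actual fix):

derived_patterns <- c("Dyn_Bzero", "SSB_", "Bratio_")   # guessed labels for derived quantities
keep <- !grepl(paste(derived_patterns, collapse = "|"), base.model$post_names)
active_names <- base.model$post_names[keep]             # restrict the diagnostics to active parameters only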

@kellijohnson-NOAA thoughts? - as you helped with the report file adjustments at the bottom of the control file...

andrew-edwards commented 2 years ago

Thanks. And from Slack on 13th Jan Chris said: "I started running all the models in parallel last night (late) so they are still going. I forgot to set adapt_delta = 0.9 for the bridge models and sensitivities so they are running with adapt_delta = 0.95." Which explains why the very first bridging model shows the spurious behaviour.

aaronmberger-nwfsc commented 2 years ago

So it may not be an issue per se, just that we are plotting things we don't need to... still looking...

And this change in the report file happened exactly when the differences start to appear.

andrew-edwards commented 2 years ago

@aaronmberger-nwfsc - yes, that's another change since last year. Chris's run will tell us whether it's adapt_delta or not (if it is, we'd have to re-run stuff, boo). And your investigations will figure out whether we're just including too many things in the plotting - the actual plotting functions seem to work fine. We're also doing the extra forecast year, but I don't think that would necessarily be it.

cgrandin commented 2 years ago

@aaronmberger-nwfsc I looked more closely at the code you wrote above to generate the initial plot that's different than what we have in the document:

my_mcmc <- SSgetMCMC(dir = "models/2022.01.10_base/mcmc/sso", verbose = TRUE, writecsv = TRUE,
                     csv1 = "keyposteriors.csv", csv2 = "nuisanceposteriors.csv",
                     keystrings = c("SSB_2019", "Bratio_2019", "SR_LN(R0)",
                                    "SR_BH_steep", "RecrDev_2014", "EffN_mult"),
                     nuisancestrings = c("Objective_function", "SSB_", "Bratio_",
                                         "InitAge", "RecrDev"))
pdf("...\SummaryDiagnostics.pdf")
mcmc.nuisance(dir = "...\models\", run = "mymodel\", printstats = TRUE)
dev.off()

You create an object called my_mcmc which is then never used anywhere. So the mcmc.nuisance() function is the only thing run here to produce the plot.

I ran the same thing on the current base model: mcmc.nuisance("models", "2022.01.10_base/mcmc/sso", printstats = TRUE) and got this:

[image]

So they match perfectly.

cgrandin commented 2 years ago

So you need to run build_rds("your-model-run-dir") and then

model <- load_models("your-model-run-dir")
plot_mcmc_param_stats(model)

andrew-edwards commented 2 years ago

@cgrandin - no point in that, I think, as we've ruled out differences in the plotting code. Your previous post just confirmed it.

andrew-edwards commented 2 years ago

@aaronmberger-nwfsc plot_mcmc_param_stats uses parm_nm <- model$post_names. Here are the differences in post_names between last year's base model and this year's:

> length(base.model$post_names)
[1] 251
> length(last.yr.base.model$post_names)
[1] 243
> setdiff(base.model$post_names, last.yr.base.model$post_names)
 [1] "NatM_uniform_Fem_GP_1"            "Main_RecrDev_2019"               
 [3] "Main_RecrDev_2020"                "Late_RecrDev_2021"               
 [5] "ForeRecr_2024"                    "ForeRecr_2025"                   
 [7] "Q_extraSD_Age1_Survey(3)"         "AgeSel_P3_Fishery(1)_DEVadd_2021"
 [9] "AgeSel_P4_Fishery(1)_DEVadd_2021" "AgeSel_P5_Fishery(1)_DEVadd_2021"
[11] "AgeSel_P6_Fishery(1)_DEVadd_2021" "AgeSel_P7_Fishery(1)_DEVadd_2021"

You mentioned 124 extra derived outputs, but they don't seem to be included here.

andrew-edwards commented 2 years ago

Ooh, but 10 of the extra ones do get added in Chris's re-run of last year's base model:

> setdiff(base.model$post_names, last.yr.base.this.yr.code$post_names)
 [1] "Main_RecrDev_2019"                "Main_RecrDev_2020"               
 [3] "Late_RecrDev_2021"                "ForeRecr_2025"                   
 [5] "Q_extraSD_Age1_Survey(3)"         "AgeSel_P3_Fishery(1)_DEVadd_2021"
 [7] "AgeSel_P4_Fishery(1)_DEVadd_2021" "AgeSel_P5_Fishery(1)_DEVadd_2021"
 [9] "AgeSel_P6_Fishery(1)_DEVadd_2021" "AgeSel_P7_Fishery(1)_DEVadd_2021"

cgrandin commented 2 years ago

I think it's just because we used 0.9 last year and 0.95 this year. I'm running this year's base at 0.9 to find out for sure.

cgrandin commented 2 years ago

@andrew-edwards some of those are the same as last year, just with the year incremented by one, i.e. replace the 2021s with 2020 and they will be in last year's list.

aaronmberger-nwfsc commented 2 years ago

0.95 this year will result in fewer divergences than the same model run with 0.90, so the effective N should increase; but we are seeing decreases in effective N, so I don't think it's adapt_delta. Maybe I'll be proven wrong...

cgrandin commented 2 years ago

We should know after dinner today :)

andrew-edwards commented 2 years ago

I think I should have done:

> setdiff(last.yr.base.model$post_names, last.yr.base.this.yr.code$post_names)  # in last year's base, but not in this year's re-run of the base
[1] "NatM_p_1_Fem_GP_1"
> setdiff(last.yr.base.this.yr.code$post_names, last.yr.base.model$post_names)   # vice versa
[1] "NatM_uniform_Fem_GP_1" "ForeRecr_2024"        

But taking those out in a hack of plot_mcmc_param_stats() doesn't fix the plots.

cgrandin commented 2 years ago

Oh, I didn't realize NatM_p_1_Fem_GP_1 changed to NatM_uniform_Fem_GP_1. Thanks. That will probably cause some issues with natural mortality plots

cgrandin commented 2 years ago

This is this year's model run with adapt_delta = 0.9

[image]

cgrandin commented 2 years ago

So the only thing left is that something changed in the SS code

aaronmberger-nwfsc commented 2 years ago

Interesting: I took the posteriors.sso and derived_posteriors.sso from our AWS base model and compared them to the same files from the base model I ran locally (these are the two versions that give very different diagnostic results), and I got nearly identical median results (identical for all practical purposes, given they used different numbers of chains). [image] [image]
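The comparison amounts to something like this (paths are placeholders; the .sso posterior files are whitespace-delimited with a header row):

aws    <- read.table("aws/posteriors.sso",   header = TRUE)   # placeholder paths
local  <- read.table("local/posteriors.sso", header = TRUE)
shared <- intersect(names(aws), names(local))
round(sapply(shared, function(p) median(aws[[p]]) - median(local[[p]])), 4)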

Why would 'identical' results give such different diagnostics???

cgrandin commented 2 years ago

I think maybe we should start by comparing last year's model RDS file and last year's model run with this year's code, since they are different but should be identical.

I'm debugging the plotting function with those in an attempt to understand

cgrandin commented 2 years ago

But that is interesting, and doesn't make immediate sense!

cgrandin commented 2 years ago

These are the effNs from last year's model run with this year's code:

  [1] 3699 1348 8000 6552 8000 8000 8000 8000 8000 8000 8000 8000 8000 8000 8000 8000 8000 7354 7128 7777 7272 7385 6464 6149 3593
 [26] 4857 6778 1907 2819 3315 1697 3959 1822 3678 1421 4100 3269 1199 4519 3183 4088 1106 4519 3963 1193 1783 4186 1152 2538 4520
 [51] 1144 1131 1532 1304 1717 1266  976 3507 1112 4616 1032 3889 1068 1131 4574 1085 1906 1200 3178 1608 3564 1622 4176 2483 3887
 [76] 6875 8000 8000 8000 8000 8000 8000 7096 6484 6441 7373 4364 3530 8000 8000 7955 8000 8000 7678 8000 8000 8000 8000 8000 8000
[101] 8000 8000 8000 8000 8000 8000 8000 8000 8000 8000 8000 8000 8000 8000 7444 8000 8000 8000 8000 8000 8000 6949 8000 8000 8000
[126] 8000 8000 8000 8000 8000 8000 8000 7873 8000 8000 8000 8000 8000 8000 7953 8000 8000 8000 8000 7267 7431 8000 7078 8000 7184
[151] 8000 8000 7696 8000 8000 8000 8000 8000 8000 8000 8000 8000 7182 7901 6970 8000 8000 6825 8000 8000 8000 8000 8000 8000 8000
[176] 7523 6557 6611 7231 7555 6993 8000 8000 6562 8000 8000 8000 8000 8000 8000 8000 8000 7288 7526 6427 7492 7340 6134 7551 8000
[201] 8000 8000 8000 6900 8000 8000 6448 6770 6764 6481 7116 7896 8000 8000 8000 8000 7696 8000 8000 8000 8000 8000 8000 6858 7975
[226] 7694 7772 7410 8000 7979 8000 8000 8000 8000 8000 8000 7578 7036 7186 6862 7817 7888 6303 8000

And last year's model with last year's code:

  [1] 4571 3534 2124 8250 8250 8250 7555 8250 5939 5022 8250 7258 8034 8250 8250 7040 5349 5263 6796 6218 5442 3339 3761 2356 2280
 [26] 2350 3766 4632 3582 2586 4080 3786 4568 3000 3636 4514 5351 3589 3297 3636 4043 3403 4653 3930 3614 4170 3614 3621 3849 3823
 [51] 3568 3957 4492 4353 5345 4815 3893 4074 4300 4495 3897 2279 3965 4341 4374 3989 5499 4292 5175 4650 3695 4322 4162 4869 6535
 [76] 5101 8250 8250 8127 8250 7598 5857 7158 8250 7973 8250 8250 8250 7421 8250 8250 7166 5353 7793 4582 8250 8250 7314 7173 5117
[101] 8250 7364 6470 8250 8250 8250 6007 8250 5234 7753 8250 7594 7498 6742 7669 8250 8250 6757 8149 8250 7510 7375 8250 8250 8250
[126] 6620 6947 7178 6760 6425 6316 7453 7508 6526 8250 7040 7458 8250 5646 7438 8250 8050 7148 6493 8250 6412 6963 6782 5210 8250
[151] 7896 8250 6828 6398 8250 8250 6987 8250 6742 8250 8250 8250 6888 8250 8250 5734 4333 8046 8250 7699 7245 8250 7777 8250 8250
[176] 7601 5149 5253 5265 6973 8046 5969 6582 8250 7112 8250 8250 6702 8250 8250 8250 8250 6883 8250 8250 6965 4001 5820 8250 7760
[201] 6323 8250 6857 8250 8250 7751 6526 4959 7334 6881 8039 5753 7033 8250 7172 6500 8250 8250 6784 8250 8250 8240 6110 8250 7540
[226] 8250 8250 4526 8250 7927 8250 8250 6440 8250 8250 8250 7504 8250 7506 8250 8050 8250 6717

Almost every 8000 this year was 8250 last year