nmfs-ost / ss3-source-code

The source code for Stock Synthesis (SS3).
https://nmfs-ost.github.io/ss3-website/
Creative Commons Zero v1.0 Universal

Fix reference to par files in est workflow to ss3.par #564

Closed: e-perl-NOAA closed 6 months ago

e-perl-NOAA commented 6 months ago

Concisely describe what has been changed/addressed in the pull request.

Quick fix to the workflow to update the par file name to the new format (ss3.par).

What tests have been done?

Where are the relevant files?

What tests/review still need to be done?

Is there an input change for users to Stock Synthesis?

None.

Additional information (optional).

e-perl-NOAA commented 6 months ago

@iantaylor-NOAA I noticed this test failing when working on my admb-docker-attempt2 branch after changing the run-ss3-with-est workflow to use ss3.par. It's failing because one of the models (Kelp Greenling 2015) has some values that have changed. I'm not quite sure why those values would have changed. Do you have any ideas?

e-perl-NOAA commented 6 months ago

The commented-out push rules are there to trigger the run-ss3-with-est workflow, because I noticed it failing on a different branch I was working on and wanted to see if it would fail here too, which unfortunately it did because of the Kelp Greenling model. They will be restored before this is ready to be merged.

iantaylor-NOAA commented 6 months ago

@e-perl-NOAA thanks for flagging the failed test. I just looked at the changes to the Kelp Greenling test model in the test_failed.csv file in the artifact from the failed test.

None of these changes seem like a big deal. I think the failure is just due to imperfect rules for pass/fail. For instance, Catch_like went down from 0.0127685 to 0.0003390, which is a large percent change, but both values are tiny. The gradient also improved from small to tiny. I think the most notable change is the decrease in some of the values related to population scale, such as R0 and ForeCatch_2024, which went down from 81.1 to 76.6. That's still only about a 5% change, so I wouldn't worry about it at this stage since all the other models passed.
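The actual pass/fail logic lives in the ss3-test-models workflow and is not shown in this thread, but the point about imperfect rules can be sketched: a pure percent-change criterion flags tiny quantities like Catch_like, whereas combining a relative tolerance with an absolute floor does not. The thresholds below are illustrative assumptions, not the test's real settings.

```python
def changed_significantly(old, new, rel_tol=0.1, abs_tol=0.05):
    """Flag a change only if it exceeds BOTH a relative tolerance
    (scaled by the larger magnitude) and an absolute floor.

    A percent-change-only rule would flag 0.0127685 -> 0.0003390
    (a ~97% drop) even though both values are negligible; the
    absolute floor abs_tol suppresses that false alarm.
    """
    threshold = max(rel_tol * max(abs(old), abs(new)), abs_tol)
    return abs(new - old) > threshold

# Values reported in this thread:
print(changed_significantly(0.0127685, 0.0003390))  # Catch_like: not flagged
print(changed_significantly(81.1, 76.6))            # ForeCatch_2024, ~5%: not flagged
```

With these illustrative tolerances, neither of the changes Ian describes would trip the test, which is consistent with his reading that the rules, not the model, are the marginal part.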

The unique characteristics of the kelp greenling model, as listed in the README for ss3-test-models are "Maturity option 6 (read length-maturity), Mirrored selectivity, input variance adjustment". I'm not aware of any of the changes between 3.30.22.0 and 3.30.22.1 that would have impacted any of those things, but @Rick-Methot-NOAA might have additional thoughts.

Rick-Methot-NOAA commented 6 months ago

I will look at kelp greenling. That change in ForeCatch would raise eyebrows.

e-perl-NOAA commented 6 months ago

@iantaylor-NOAA thanks for taking a look at the artifact. I looked at it when it first failed on the other branch I was working on, and most of the changes (on that branch and this one) didn't seem that significant, but I just wanted to be sure. There weren't any flags/failed tests/significant changes in this workflow previously as v3.30.22.1 was developed, so it is a little strange that it is failing/there are significant changes now.

Rick-Methot-NOAA commented 6 months ago

For kelp greenling: I ran locally with 3.30.22 and 3.30.22.1. The .22 run gets logL 1305.49 in more iterations than .22.1, which gets a better logL of 1305.29. Taking the ss3.par from the new model and pasting it into ss.par for .22 maintains an identical result to the new model. So it seems the old model was getting off track and converged in the wrong spot. It would be informative to run jitter on kelp greenling with 3.30.22.00 to see if it sometimes finds the better answer found by 3.30.22.1. Rick

e-perl-NOAA commented 6 months ago

jitter_results.xlsx — I ran 10 jitters of the v3.30.22 model with the v3.30.22 exe; jitter3 gets a LogL of 1309.28, and the ForeCatch for that jitter is also the same as the one in v3.30.22.1.
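A quick way to read a set of jitter runs like this is to find the best (smallest) total negative log-likelihood and count how many starts reached it. A minimal sketch, assuming logL values transcribed by hand; the dictionary below mixes the two values reported in this thread with placeholder entries and is not the actual contents of jitter_results.xlsx:

```python
# Lower logL is better: SS3 minimizes a negative log-likelihood.
# Placeholder values except 1305.49 (base run) and 1309.28 (jitter3).
jitter_logl = {
    "jitter1": 1305.49,
    "jitter2": 1305.49,
    "jitter3": 1309.28,
}

def summarize(logl, tol=0.01):
    """Return the best (smallest) logL and the runs within tol of it."""
    best = min(logl.values())
    at_best = [name for name, v in logl.items() if v - best <= tol]
    return best, at_best

best, at_best = summarize(jitter_logl)
print(best, at_best)
```

If only a minority of starts land on the best solution, that points to the multiple-minima behavior Rick suspects rather than a one-off convergence hiccup.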

Rick-Methot-NOAA commented 6 months ago

Thanks. The deltas between jitters are remarkably unstable. This could be because jitter is a very crude tool that can misbehave if some parameters have unrealistically wide bounds, or could mean that this application has some fundamental indeterminacy.
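One of the failure modes Rick names, unrealistically wide parameter bounds, can be screened for mechanically: flag any estimate sitting very close to either bound of its allowed range. A minimal sketch with hypothetical parameter names and bounds (not taken from the kelp greenling control file):

```python
def near_bound(est, lo, hi, frac=0.01):
    """True if the estimate is within frac of the bound range from either bound."""
    span = hi - lo
    return min(est - lo, hi - est) < frac * span

# Hypothetical parameter table: (name, estimate, lower bound, upper bound)
pars = [
    ("SR_LN(R0)", 8.2, 3.0, 15.0),
    ("NatM_p_1_Fem", 0.299, 0.05, 0.30),
]
flagged = [name for name, est, lo, hi in pars if near_bound(est, lo, hi)]
print(flagged)
```

Parameters flagged this way are candidates for tighter, more realistic bounds, which tends to make jitter behave better.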

e-perl-NOAA commented 6 months ago

Okay, so I just tried updating it with the re-run Kelp Greenling files and it's still failing (the values are still different, but they are now what the values were before). It seems like that model produces two sets of values and flip-flops between the two.

Rick-Methot-NOAA commented 6 months ago

:( That's definitely a problem for a test model! Does this have unique features beyond the maturity function used?

e-perl-NOAA commented 6 months ago

I'm not sure right now; I'll have to look once I'm out of my meetings this afternoon. However, it is also the only model with changes to the forecast file when all the models are re-run (https://github.com/nmfs-ost/ss3-test-models/pull/59/files#). I don't know if that might give us a clue.

iantaylor-NOAA commented 6 months ago

Other features listed in the test-models README are input variance adjustments, which are found in lots of other models, and mirrored selectivity, which a few other models also have, including at least BigSkate_2019 and tagging_mirrored_sel. It seems like we should remove it from the set and either add maturity option 6 to a different model or add a more stable model that includes it.

e-perl-NOAA commented 6 months ago

@Rick-Methot-NOAA what do you think about Ian's suggestion? Maybe we could have a Simple_with_Length_Maturity test model?

Rick-Methot-NOAA commented 6 months ago

Length maturity is not the kind of feature that will interact with other model features at all, so does not provide a strong test.

Let's just drop greenling and, once again, try to get me to finish the strong test model that has areas and seasons.

e-perl-NOAA commented 6 months ago

Okay, sounds good!