nmfs-ost / ss3-source-code

The source code for Stock Synthesis (SS3).
https://nmfs-ost.github.io/ss3-website/
Creative Commons Zero v1.0 Universal

[Bug]: string error in bootstrap data file causes intermittent model exit #534

Closed: iantaylor-NOAA closed this issue 10 months ago

iantaylor-NOAA commented 10 months ago

Describe the bug

I've recently discovered multiple models (including multiple variations of simple_small in r4ss and the 2021 spiny dogfish assessment) exiting the run early when using 3.30.22. The echoinput file ends with "Begin writing bootstrap data file(s)" instead of continuing on to "Write starter.ss_new file".

I don't think this is related to the bootstrap data itself for two reasons:

  1. The most recent case exited just prior to writing "1 #_spawn_month" in data_boot_001.ss (see screenshot below), as opposed to exiting while writing randomly generated data. There's nothing between the two lines where it exited that could explain the exit: https://github.com/nmfs-ost/ss3-source-code/blob/797760ff1ae10554a4f7729b643ab275125981c0/SS_write_ssnew.tpl#L81-L82
  2. Running the model again with the random seed from the model with the incomplete data file produced a complete data_boot_001.ss.

To Reproduce

Running the attached files with the 3.30.22 executable on Windows sometimes reproduces the error and sometimes doesn't. simple_small_biasadjQ_varySE.zip

I ran the command ss3 -nohess -stopph 0 again and again. The first 5 runs quit after

Begin writing *.ss_new output files ...

and the 6th time it completed:

Begin writing *.ss_new output files ... Finished writing *.ss_new output files

!!  Run has completed  !!
!!  See warning.sso for 2 notes

Finished running model 'ss3' after 0.68 s.
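
The repeated runs described above could be automated with a short script. The following is a minimal Python sketch, not part of the original report. It assumes the ss3 executable is on the PATH, that the script is launched from the model directory, and that the "Run has completed" console message is a reliable sign of a full run. The function name count_completed_runs is hypothetical.

```python
import subprocess


def count_completed_runs(cmd, n_runs, marker="Run has completed", cwd=None):
    """Run cmd n_runs times and report, per run, whether marker appeared in stdout.

    cmd is a list such as ["ss3", "-nohess", "-stopph", "0"] (assumed to be on PATH).
    Returns a list of booleans, one per run.
    """
    results = []
    for _ in range(n_runs):
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        # A run that exits early never prints the completion banner.
        results.append(marker in proc.stdout)
    return results


# Example call (commented out because it needs ss3 and the model files):
# outcomes = count_completed_runs(["ss3", "-nohess", "-stopph", "0"], n_runs=10)
# print(sum(outcomes), "of", len(outcomes), "runs completed")
```

Recording one boolean per run, rather than only a total, keeps the order of successes and failures visible, which matters if the failures come in a repeating pattern.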

Expected behavior

Finish writing bootstrap data files and remaining .ss_new files.

Screenshots

Screenshot shows matching random seeds in the completed and incomplete bootstrap data files.
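
The seed comparison shown in the screenshot could also be done in code. This is a minimal Python sketch, assuming the seed appears in a header line of the form quoted later in this thread ("#_bootstrap file: 1  irand_seed: 1877620999 first rand#: -1.39387"). The names read_boot_seed and seeds_match are hypothetical helpers, not part of SS3 or r4ss.

```python
import re

# Matches the seed in a header line such as:
# "#_bootstrap file: 1  irand_seed: 1877620999 first rand#: -1.39387"
SEED_PATTERN = re.compile(r"irand_seed:\s*(-?\d+)")


def read_boot_seed(text):
    """Return the first irand_seed value found in text, or None if absent."""
    match = SEED_PATTERN.search(text)
    return int(match.group(1)) if match else None


def seeds_match(text_a, text_b):
    """True only if both texts contain a seed and the two seeds are equal."""
    seed_a = read_boot_seed(text_a)
    seed_b = read_boot_seed(text_b)
    return seed_a is not None and seed_a == seed_b
```

In practice the two arguments would be the contents of the complete and the truncated data_boot_001.ss files, read with open(path).read().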

Which OS are you seeing the problem on?

Windows

Which version of SS3 are you seeing the problem on?

3.30.22

Additional Context

No response

Rick-Methot-NOAA commented 10 months ago

Totally bizarre. I get 5-6 good runs, then a failure, then repeat that cycle.

Changed random seed to 1677620645 and now get complete boot data and failure in dynamic Bzero.

Change seed again and get stop in another location.

The hypothesis is that we have an uninitialized variable.

Rick-Methot-NOAA commented 10 months ago

@e-gugliotti-NOAA Can you check the compile report that has full warnings to see if it says anything about uninitialized variables?

e-perl-NOAA commented 10 months ago

yes, I can check

e-perl-NOAA commented 10 months ago

I didn't see anything there or when running the derivative checker.

Rick-Methot-NOAA commented 10 months ago

@e-gugliotti-NOAA please send me the warnings file so I can keep digging into this

e-perl-NOAA commented 10 months ago

Here is a zip file with all the output files from a run that stopped: ss3_issue.zip

Rick-Methot-NOAA commented 10 months ago

I see that when I run locally. I am looking for compile messages like an old message below that are displayed during the build of the exe.

ss.cpp: In member function ‘dvector model_parameters::process_comps(int, int, dvector&, dvector&, const dvector&, dvector&, dvector&)’:
ss.cpp:25924:48: warning: ‘temp2’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   more_comp_info(18)+=square(temp2-temp);
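
Messages like the one above can be searched for in a saved build log. The following Python sketch is an illustration only. It assumes GCC-style diagnostics of the form "file:line:col: warning: message [-Wflag]" and looks only for the -Wuninitialized and -Wmaybe-uninitialized flags. The function name find_uninitialized_warnings is hypothetical.

```python
import re

# GCC-style diagnostic: file:line:col: warning: message [-Wmaybe-uninitialized]
UNINIT_PATTERN = re.compile(
    r"(?P<file>[\w./\\-]+):(?P<line>\d+):(?P<col>\d+): warning: "
    r"(?P<msg>.*?) \[-W(?:maybe-)?uninitialized\]"
)


def find_uninitialized_warnings(log_text):
    """Return (file, line, column, message) tuples for uninitialized-variable warnings."""
    return [
        (m.group("file"), int(m.group("line")), int(m.group("col")), m.group("msg"))
        for m in UNINIT_PATTERN.finditer(log_text)
    ]
```

An empty result only means the compiler did not flag anything. It does not rule out an uninitialized variable, since these warnings depend on optimization level and on what the compiler can prove.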

iantaylor-NOAA commented 10 months ago

@Rick-Methot-NOAA, Are you thinking of the files associated with the build-ss3-warnings github action?

This is the most recent one run on main https://github.com/nmfs-ost/ss3-source-code/actions/runs/7093011442 where I've downloaded the artifact at the bottom and attached here: warnings_ss.txt.

Unfortunately, there is nothing that would help us find an uninitialized variable. It looks like the warnings are all related to these lines where an extra set of {} would be helpful but perhaps not strictly necessary: https://github.com/nmfs-ost/ss3-source-code/blob/797760ff1ae10554a4f7729b643ab275125981c0/SS_write_report.tpl#L1386-L1391

Lastly, it looks like the build-ss3-warnings github action is passing even when that new warning appeared because of an out-of-date threshold for the number of warnings as discussed in this old issue that seems to have gotten buried: https://github.com/nmfs-ost/ss3-source-code/issues/452.
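
To illustrate why an out-of-date threshold can hide a new warning, here is a minimal Python sketch. The actual logic of the build-ss3-warnings action is not shown in this thread, so the pass rule below (count at or below a fixed threshold) and the names count_warnings and check_passes are assumptions made for illustration.

```python
def count_warnings(log_text):
    """Count lines in a compiler log that contain a 'warning:' diagnostic."""
    return sum(1 for line in log_text.splitlines() if "warning:" in line)


def check_passes(log_text, threshold):
    """Hypothetical pass rule: pass while the warning count is at or below threshold."""
    return count_warnings(log_text) <= threshold


# Made-up logs: the second has one more warning than the first.
old_log = "a.cpp:1:1: warning: x\na.cpp:2:1: warning: y\n"
new_log = old_log + "a.cpp:3:1: warning: z\n"

# With a stale threshold of 5, both logs pass and the new warning goes unnoticed.
stale_results = (check_passes(old_log, 5), check_passes(new_log, 5))
# With the threshold set to the known count of 2, the new warning fails the check.
tight_results = (check_passes(old_log, 2), check_passes(new_log, 2))
```

Setting the threshold equal to the current known count means any additional warning fails the check, at the cost of having to update the threshold whenever a warning is fixed.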

Rick-Methot-NOAA commented 10 months ago

Thanks. I saw that warning locally after I turned on the -Wall compile flag. Fixing it does not solve the problem. I had been compiling locally using the canned ADMB routine but have switched to the verbose invocation so I can change more of the flags.

Rick-Methot-NOAA commented 10 months ago

I think the problem first appears here:

C:\Users\Richard.Methot\Documents\GitHub\StockSynthesis_git\stock-synthesis> git checkout c3c49e3af2dbe94ec20a5b7a660bbe2ae183eadb
HEAD is now at c3c49e3 Update SS_write_ssnew.tpl

This commit changed the strings used to name the data.ssnew files and to differentiate between bootstrap files and other files, so it is logical that it could be the cause of the problem.

The bootstrap file header looks like this:

#_Start_time: Mon Dec 11 16:12:14 2023
#_bootdata:_3
#C data file for simple example
#_bootstrap file: 1  irand_seed: 1877620999 first rand#: -1.39387

I suspect the problem is with the creation of the 2nd line, which seems unnecessary and erroneously reports the overall count, not the boot count. OK to delete? The overflow leak causing the crash goes away when I do. @kellijohnson-NOAA