nmfs-ost / ss3-source-code

The source code for Stock Synthesis (SS3).
https://nmfs-ost.github.io/ss3-website/
Creative Commons Zero v1.0 Universal

Re-running each MCMC saved sample to get a report file in Hake assessment #70

Closed: k-doering-NOAA closed this issue 2 years ago

k-doering-NOAA commented 4 years ago

Imported from redmine, Issue #75754. Opened by @k-doering-NOAA on 2020-03-03. Status when imported: New

This first came up in #75174, but is a separate topic from the original issue being discussed there.

Kelli noted:

Thanks Ian for thinking of the hake assessment. We do in fact re-run each saved sample with bias adjustment ramp = 1.0 to generate a report file that can later be read in. Does anyone know how other Bayesian assessments written in ADMB get around this issue? Also, what other stocks turn on MCMC within an SS model?

and Rick responded:

Kelli and Ian, I think we need a new issue for this topic of re-running posteriors to get a full report. I had no idea that was being done. Is there more that can be added to posteriors.sso to obviate that need?

k-doering-NOAA commented 4 years ago

comment from @kellijohnson-NOAA on 2020-03-03: I am uncertain the exact values that we are after but currently unavailable in the posteriors file. I will look and document here.

And, to answer Kathryn's question in the previous thread: we re-run the model for each MCMC draw to reproduce results for which a full report file was not saved, which is why we turn off bias adjustment; we are trying to match MCMC conditions while using no estimation in the MLE framework. Basically, we just want a copy of the Report.sso file for each MCMC draw that was saved.

k-doering-NOAA commented 4 years ago

comment from @iantaylor-NOAA on 2020-03-03: Here's a long illustrated answer to the question of what quantities are used in the hake assessment that aren't in the standard MCMC output.

My memory is that the approach of re-running the model to get a full Report.sso file (and later CompReport.sso as well) for each of the MCMC samples started 4 or 5 years ago as a way to get expected values from the survey to make this figure:

[figure: survey index uncertainty; re-attached below as hake_survey_uncertainty.png]

However, once we had all those report files, they allowed calculation of the medians and uncertainty intervals for the expected age comp values, for each year and data source, that went into this figure (the intervals are too tiny to see in almost all cases, except for instance the 1980 cohort in 1983 and 1986, but the medians are more representative of the MCMC results than the MLE proportions):

[figure: age comp fits; re-attached below as hake_age_comp_uncertainty.png]

And also the selectivity by age and year for both fishery and survey, which allows representation of the uncertainty in the time-varying fishery selectivity (which is much larger than the uncertainty in the baseline selectivity shown for 1990 in the figure below):

[figure: selectivity uncertainty; re-attached below as hake_selectivity_uncertainty.png]

The index values are now available under derived quantities, but those were only 25 values per MCMC sample anyway. The expected proportions at age by year and fleet amount to about 900 values per MCMC sample. Selectivity by age and year is about 200 unique values, given that selectivity is assumed constant from age 6 onward.

There may be other values I'm forgetting or that have been added since my time on the hake team; once you have access to everything, it's easy to think of additional diagnostics that would be good to convert from MLE to MCMC.

As with the other issue about MCMC output (#75174), solving this problem for hake, which is a relatively simple model, would be easier than generalizing to the needs of other assessments.

One strategy that I've been pondering is the option to re-run these models simultaneously in the cloud, extract the quantities of interest there as well, and download an additional summary file containing just what's needed. Teresa A'mar also looked into calling the write_bigoutput() function within the mceval phase to create all these files automatically, without an external R script to facilitate it, which would be more efficient (though still lead to lots of big files).
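
A rough sketch of that second idea, assuming ADMB's mceval_phase() flag is checked inside the model's evaluation code (the counter is illustrative; write_bigoutput() is the SS3 function named above, and this is not actual SS3 code):

```cpp
// Inside the model's evaluation code: on each -mceval call, emit the
// full report directly, so no external R script needs to drive re-runs.
if (mceval_phase())
{
  mceval_counter++;   // illustrative counter of saved samples processed
  write_bigoutput();  // SS3's big-report writer, as referenced above
}
```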


k-doering-NOAA commented 4 years ago

comment from @iantaylor-NOAA on 2020-03-03: I tried including images by replying by email rather than filling in text on the VLab redmine site, but one got lost and the other two don't seem to have appeared in the email notification. So now I'm trying again using the redmine form, attaching them as PNG files, and referencing them within the text (by enclosing each filename in exclamation marks).

Figure showing index uncertainty from MCMC results: !hake_survey_uncertainty.png!

Figure showing age comp uncertainty from MCMC: !hake_age_comp_uncertainty.png!

Figure showing selectivity uncertainty from MCMC: !hake_selectivity_uncertainty.png!

k-doering-NOAA commented 4 years ago

comment from @kellijohnson-NOAA on 2020-03-03: Thanks Ian for the list. I am not keen on the only available option being to run something in the cloud. I think it should be accessible on individual desktops, with the added benefit that someone could run it in the cloud if they want to set that up.

k-doering-NOAA commented 4 years ago

comment from @iantaylor-NOAA on 2020-03-03: Kelli, could you speak to roughly how long it takes to get the extra Report files and extract the info from them? My memory is that it's less than an hour, certainly much faster than running the MCMC in the first place, although that obviously depends on how many samples you're using. It seems possible that even if the samples used to calculate reference points, quotas, etc. are based on a larger MCMC sample, the figures could just use fewer than 1000 samples as a way to speed things up. -Ian


k-doering-NOAA commented 4 years ago

comment from @kellijohnson-NOAA on 2020-03-03: I just had Chris check and it was 35 minutes for 2000 samples.

k-doering-NOAA commented 4 years ago

comment from @RickMethot on 2020-03-05: Good exploration so far. Here are some other ideas:

  1. Issue #46175 would provide for custom control on which parts of report.sso are generated. This could allow for quicker reporting of just what you need.
  2. Turning on the writing of those reports during MCEVAL should be easy.
  3. Writing those reports in append mode (like the data.ss_new file) would also be easy (see the sketch after this list). Or creating a way to write each as a named file.
  4. An alternative could be to augment the cumreport.sso which writes in append mode for each run.
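
A minimal sketch of idea 3, assuming a standard ofstream for the report stream (the function name, counter, and sample header are illustrative, not SS3 code):

```cpp
#include <fstream>

// Append one report per mceval call to a single report.sso file,
// mimicking how data.ss_new concatenates multiple outputs.
void append_mceval_report(int mceval_counter)
{
  std::ofstream rep("report.sso", std::ios::app);  // open in append mode
  rep << "#_mceval_sample " << mceval_counter << "\n";
  // ... write the selected report sections here ...
}
```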
Rick-Methot-NOAA commented 3 years ago

@kellijohnson-NOAA We need to decide if it is better to find a way to write each MCEVAL's complete, but customized, report.sso to a separate folder or to create a new report function with newly specified output that is called only during MCEVAL

kellijohnson-NOAA commented 3 years ago

Infrastructure-wise, I am guessing that just writing Report.sso for every iteration would be the easiest, but I am unsure if this is limiting time- and size-wise?

Rick-Methot-NOAA commented 3 years ago

Time and space are not insignificant. But customized selection of individual elements of report.sso is already an SS feature. It seems feasible to create the capability so that a different (leaner) set of elements gets written during MCEVAL.

Rick-Methot-NOAA commented 3 years ago

The mkdir() function in C++ will create sub-directories. We need to investigate whether or not it is cross-platform (macOS and Linux). @nschindler-noaa

kellijohnson-NOAA commented 3 years ago

Are you thinking of suggesting that users specify which sections of report.sso are written and if no specifications are given then the full report file would be written?

Rick-Methot-NOAA commented 3 years ago

Correct. SS3 now has the capability to specify (in starter) which reports get written. We could leverage that capability into user-specified selection of which reports get written in the mceval phase. I suggest that mceval default to just the slim set of reports, which users could then augment to their desired set. I am looking into how to direct each set of reports to a new sub-directory; cross-platform capability may be tougher.
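
A rough sketch of that selection logic, with every name hypothetical (nothing here is existing SS3 code):

```cpp
#include <vector>

// Hypothetical per-section selection flags, as if read from the starter
// file; each entry enables one section of report.sso.
std::vector<bool> report_select;        // full set used for the MLE report
std::vector<bool> mceval_report_select; // leaner default set for mceval

void write_report_section(int s);       // hypothetical per-section writer

void write_selected_reports(bool in_mceval)
{
  const std::vector<bool>& sel =
      in_mceval ? mceval_report_select : report_select;
  for (int s = 0; s < (int)sel.size(); s++)
    if (sel[s])
      write_report_section(s);
}
```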


k-doering-NOAA commented 3 years ago

Hmmm, why would creating new folders not work on Linux or Mac? If need be, we can test using GitHub Actions and/or use the Mac mini.

nschindler-noaa commented 3 years ago

Okay, the syntax for creating new folders is mkdir("geeksforgeeks", 0777), as seen in the following code:

```cpp
#include <sys/types.h>
#include <sys/stat.h>  // mkdir
#include <cstring>     // strerror
#include <cerrno>      // errno
#include <iostream>
using namespace std;

if (mkdir("geeksforgeeks", 0777) == -1)
    cerr << "Error : " << strerror(errno) << endl;
else
    cout << "Directory created";
```
Rick-Methot-NOAA commented 3 years ago

Thanks Neal. Then we need to change to that as the output directory. I found the code below; perhaps we could modify from there:

```cpp
#ifdef _WIN32
#include <direct.h>
// MSDN recommends against using the getcwd & chdir names
#define cwd _getcwd
#define cd _chdir
#else
#include "unistd.h"
#define cwd getcwd
#define cd chdir
#endif

#include <iostream>

char buf[4096]; // never know how much is needed

int main(int argc, char** argv)
{
  if (argc > 1)
  {
    std::cout << "CWD: " << cwd(buf, sizeof buf) << std::endl;

    // Change working directory and test for success
    if (0 == cd(argv[1]))
    {
      std::cout << "CWD changed to: " << cwd(buf, sizeof buf) << std::endl;
    }
  }
  else
  {
    std::cout << "No directory provided" << std::endl;
  }
  return 0;
}
```

Rick-Methot-NOAA commented 3 years ago

OK. Cycling back to the main question before starting any coding: is it better or easier to implement the MCEVAL reports as:

a) a specific report name, like report_mcev.sso, in the current folder
b) report.sso in a new subfolder named mcev
c) report.sso written in append mode (like data.ss_new), with a separator named mcev****

In all cases, a user-specified subset of all report.sso elements could be implemented. Inclusion of reports other than report.sso should also be considered.

nschindler-noaa commented 3 years ago

And my question is this: Is it easier to handle the results as a single file, separate files in the same directory, or separate files in separate directories?

nschindler-noaa commented 3 years ago

This is good. Here, I've added a mkdir capability as well. Please note that if someone is building on Windows with gcc, they need the Linux definitions. If the directory already exists, the mkdir does nothing.

Sorry, in the code, mkdir should be mkd and chdir should be cd; the listing below uses the defines. (I had set up the silly defines and then didn't even use them!)

```cpp
#ifdef _WIN32
#include <direct.h>
// MSDN recommends against using the getcwd & chdir names
#define cwd _getcwd
#define mkd _mkdir
#define cd _chdir
#else
#include <sys/stat.h>  // POSIX mkdir takes a mode argument
#include "unistd.h"
#define cwd getcwd
#define mkd(dir) mkdir((dir), 0777)
#define cd chdir
#endif

#include <iostream>

char buf[4096]; // never know how much is needed

int main(int argc, char* argv[])
{
  if (argc > 1)
  {
    std::cout << "Current directory: " << cwd(buf, sizeof buf) << std::endl;

    // Make new directory (does nothing if it already exists)
    if (0 == mkd(argv[1]))
      std::cout << "Made directory: " << argv[1] << std::endl;

    // Change working directory and test for success
    if (0 == cd(argv[1]))
    {
      std::cout << "Current directory changed to: " << cwd(buf, sizeof buf) << std::endl;
    }
  }
  else
  {
    std::cout << "No directory provided" << std::endl;
  }
}
```

Rick-Methot-NOAA commented 3 years ago

I will do an experiment with the mkdir/chdir code to be sure that a chdir during a run does not totally mess with what ADMB is expecting. But I am not yet 100% convinced that this is the best approach among the three I outlined above.

iantaylor-NOAA commented 3 years ago

I don't want to put all this progress on mkdir to waste, but maybe it would be easier to just write Report_mceval1.sso, Report_mceval2.sso, etc. I commented above (https://github.com/nmfs-stock-synthesis/stock-synthesis/issues/70#issuecomment-722534271) that CompReport.sso is also being used from each MCMC sample to get posterior medians and intervals of the fit to the age comps, but if it's only those two files needed from each MCMC sample, it seems reasonable to write them all to the same directory.
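
A minimal sketch of that naming scheme, assuming a counter for the mceval samples is available (function and variable names are illustrative, not SS3 code):

```cpp
#include <cstdio>
#include <fstream>

// Open a numbered report file for the current mceval sample,
// e.g. Report_mceval1.sso, Report_mceval2.sso, ...
std::ofstream open_mceval_report(int mceval_counter)
{
  char fname[64];
  std::snprintf(fname, sizeof fname, "Report_mceval%d.sso", mceval_counter);
  return std::ofstream(fname);
}
```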

As for what gets included, it seems reasonable to use whatever the user specifies for the "detailed output" setting in the starter file and apply that both to the main Report.sso associated with the MLE and any extra MCMC-related files. If a user wanted less detail in the MCMC versions, they could just change the starter file setting and re-run the -mceval step.

I'm assuming the hake team could adapt to whatever format gets created as long as all the info they use is present, but @kellijohnson-NOAA would know better.

Rick-Methot-NOAA commented 3 years ago

Thanks Ian. I do see this as a viable option. I worry that chdir() will not work with ADMB during a run because ADMB is writing other items in append mode for each mceval call (i.e., the write to posteriors.sso, which is in the original directory). Regarding the need for compreport.sso, let's start a separate conversation on this. I think the r4ss processing of compreport.sso is redundant with what SS3 is doing and reporting in the fit_age_comp section of report.sso. We can tweak that write to fit_age_comp to make it do exactly what is needed.

iantaylor-NOAA commented 3 years ago

Hi Rick, that's reasonable, to not worry about CompReport.sso for now. The image below is from the beginning of this thread on VLab (https://vlab.noaa.gov/redmine/issues/75754#note-3; the images didn't get transferred to github).

I think getting either the posterior medians or the 95% intervals requires one row of output per age bin, as in CompReport.sso, rather than one row per age comp observation, as in FIT_AGE_COMPS. However, the intervals are trivially small in almost all cases, and my memory is that the posterior medians are so close to the MLE that I'm guessing it would be fine to skip this step and not bother with CompReport.sso files from all the mceval steps. This figure based on CompReport.sso was originally created only because those files were already available from the inefficient status-quo workflow used for hake.

[figure: hake_age_comp_uncertainty]

Rick-Methot-NOAA commented 3 years ago

@nschindler-noaa Hi Neal, can you write the C++ code to check whether the subdirectory ssnew exists in the current directory? I have already gotten SS3 to write the data*.ss_new files into that directory if it exists (see below), but if that directory has not been created by the user, I want the files to go into the current directory. Something like: if (subdirectory ssnew exists) subdir = "./ssnew"; else subdir = "";


```cpp
for (Nudat=1; Nudat<=N_nudata; Nudat++)
{
  if (Nudat==1)
  {
    report1.open("./ssnew/data_echo.ss_new");
    .....
  }
  else if (Nudat==2)
  {
    report1.open("./ssnew/data_expval.ss_new");
    .....
  }
  else
  {
    sprintf(anystring, "%d", Nudat-2);
    anystring2 = "./ssnew/databoot" + anystring + ".ss_new";
    report1.open(anystring2);
```

Rick-Methot-NOAA commented 3 years ago

Apologies Neal, I had some time and found something that works. It claims to be cross-platform.

```cpp
#include <sys/types.h>
#include <sys/stat.h>

struct stat pathinfo;
adstring pathname;
if (stat("./ssnew", &pathinfo) != 0)
{
  pathname = "";
}
else
{
  pathname = "./ssnew/";
}

for (Nudat=1; Nudat<=N_nudata; Nudat++)
{
  if (Nudat==1)
  {
    anystring = pathname + "data_echo.ss_new";
    report1.open(anystring);
    ...
```

Rick-Methot-NOAA commented 3 years ago

Now that the protocols have been worked out, we can do the same for the .sso files: if ./sso exists, then write there, else write to the current directory. The logic to create report.mce*** can also mirror the ssnew approach. r4ss, SSMSE, and ss3sim programmers will need to create a conditional read of data.ss_new for older versions of SS3, or the new filenames for SS3 >= 3.30.19. Does that seem OK?
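
Following the ./ssnew pattern above, a minimal sketch of the same check applied to the .sso files (this is a fragment in the style of the snippet above; adstring is ADMB's string type, and the variable names are illustrative):

```cpp
#include <sys/types.h>
#include <sys/stat.h>

// Write .sso files into ./sso when that subdirectory exists,
// otherwise fall back to the current directory.
struct stat ssoinfo;
adstring sso_path;
if (stat("./sso", &ssoinfo) != 0)
  sso_path = "";
else
  sso_path = "./sso/";
```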

Rick-Methot-NOAA commented 3 years ago

In order to get report and compreport files for each mceval run, I propose the following starter-file setting:

0.08 # MCMC output detail: integer part (0=default; 1=adds objective function components; 2=also writes report.sso and compreport.sso for each mceval iteration) and decimal part (added to SR_LN(R0) on the first call to mcmc)
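
A hedged sketch of how that combined value could be decoded (variable names are hypothetical; only the 0/1/2 meaning comes from the proposal above):

```cpp
// Proposed starter value, e.g. 2.08: integer part selects MCMC output
// detail, decimal part is added to SR_LN(R0) on the first mcmc call.
double mcmc_output_detail = 2.08;  // hypothetical example value
int    detail_level = (int)mcmc_output_detail;            // -> 2
double r0_offset    = mcmc_output_detail - detail_level;  // -> 0.08
bool   write_reports_each_mceval = (detail_level == 2);   // per-mceval report.sso & compreport.sso
```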

k-doering-NOAA commented 2 years ago

Discussions about subfolders and file names should now take place in #226; this issue should now be used only to discuss the original topic of re-running each saved MCMC sample to get a report file.

kellijohnson-NOAA commented 2 years ago

@Rick-Methot-NOAA, regarding files needed for hake: Chris Grandin reminded me today that we also use compreport.sso for the Composition Database, in addition to report.sso.