Closed: @k-doering-NOAA closed this issue 2 years ago
comment from @kellijohnson-NOAA on 2020-03-03: I am uncertain the exact values that we are after but currently unavailable in the posteriors file. I will look and document here.
And, to answer Kathryn's question in the previous thread ... we re-run the model for each MCMC draw to mimic the results that would have been found but a full report file was not saved, which is why we turn off bias adjustment because we are trying to match MCMC conditions but using no estimation in the MLE framework. Basically, we just want a copy of the Report.sso file for each MCMC draw that was saved.
comment from @iantaylor-NOAA on 2020-03-03: Here's a long illustrated answer to the question of what quantities are used in the hake assessment that aren't in the standard MCMC output.
My memory is that the approach of re-running the model to get a full Report.sso file (and later CompReport.sso as well) for each of the MCMC samples started 4 or 5 years ago as a way to get expected values from the survey to make this figure: [image: image.png] Once we had all those report files, they allowed calculation of the median and uncertainty intervals for the expected age-comp values for each year and data source that went into this figure (the intervals are too tiny to see in almost all cases, except for instance the 1980 cohort in 1983 and 1986, but the medians are more representative of the MCMC results than the MLE proportions). [image: image.png]
And also the selectivity by age and year for both fishery and survey which allows representation of the uncertainty in the time-varying fishery selectivity (which is much larger than the uncertainty in the baseline selectivity shown in 1990 in the figure below): [image: image.png]
The index values are already available under derived quantities, but those were only 25 values per MCMC sample anyway. The expected proportions at age by year and fleet are about 900 values per MCMC sample. Selectivity by age and year is about 200 unique values, given that selectivity is assumed constant from age 6 onward.
There may be other values I'm forgetting or that have been added since my time on the hake team because once you have access to everything, it's easy to think of additional diagnostics that would be good to convert from MLE to MCMC.
As with the other issue about MCMC output (#75174), solving this problem for hake, which is a relatively simple model, would be easier than generalizing to the needs of other assessments.
One strategy I've been pondering is the option to re-run these models simultaneously in the cloud, extract the quantities of interest there as well, and download an additional summary file containing just what's needed. Teresa A'mar also looked into calling the write_bigoutput() function within the mceval phase to create all these files automatically, without an external R script, which would be more efficient (though still leading to lots of big files).
comment from @iantaylor-NOAA on 2020-03-03: I tried including images by replying by email rather than filling in text on the VLab redmine site, but one got lost and the other two don't seem to have appeared in the email notification, so now I'm trying again by using the redmine form and attaching them as PNG files and referencing within the text (by including filename within two exclamation marks).
Figure showing index uncertainty from MCMC results: !hake_survey_uncertainty.png!
Figure showing age comp uncertainty from MCMC: !hake_age_comp_uncertainty.png!
Figure showing selectivity uncertainty from MCMC: !hake_selectivity_uncertainty.png!
comment from @kellijohnson-NOAA on 2020-03-03: Thanks Ian for the list. I am not super keen on the only available option being to run something in the cloud. I think it should be accessible on individual desktops, with the added benefit that someone could run it in the cloud if they want to set that up.
comment from @iantaylor-NOAA on 2020-03-03: Kelli, Could you speak to roughly how long it takes to get the extra Report files and extract the info from them? My memory is that it's less than an hour, certainly much faster than running the MCMC in the first place, although that obviously depends on how many samples you're using. It seems possible that even if the samples used to calculate reference points, quotas, etc. are based on a larger MCMC sample, the figures could just use fewer than 1000 samples as a way to speed things up. -Ian
comment from @kellijohnson-NOAA on 2020-03-03: I just had Chris check and it was 35 minutes for 2000 samples.
comment from @RickMethot on 2020-03-05: Good exploration so far. Here are some other ideas:
@kellijohnson-NOAA We need to decide if it is better to find a way to write each MCEVAL's complete, but customized, report.sso to a separate folder or to create a new report function with newly specified output that is called only during MCEVAL
Infrastructure-wise, I am guessing that just writing Report.sso for every iteration would be the easiest, but I am unsure if this is limiting time- and size-wise?
Time and space are not insignificant. But customized selection of individual elements of report.sso is already an SS feature. It seems feasible to create the capability so that a different (leaner) set of elements gets written during MCEVAL.
the mkdir() function in C++ will create sub-directories. Need to investigate whether or not it is cross-platform (macOS and Linux) @nschindler-noaa
Are you thinking of suggesting that users specify which sections of report.sso are written and if no specifications are given then the full report file would be written?
Correct. SS3 now has capability to specify (in starter) which reports get written. We could leverage that capability into user-specified selection of which reports get written in mceval phase. Suggest that mceval defaults to just the slim set of reports and users could augment to their desired set. Looking into how to direct each set of reports to new sub-directory. Cross-platform capability may be tougher.
Hmmm, why would creating new folders not work on Linux or Mac? If need be, we can test using GitHub Actions and/or use the Mac mini.
Okay. The syntax for creating new folders is mkdir("geeksforgeeks", 0777), as seen in the following code:

```cpp
#include <iostream>
#include <cerrno>
#include <cstring>
#include <sys/stat.h>  // mkdir (POSIX)
using namespace std;

int main() {
  if (mkdir("geeksforgeeks", 0777) == -1)
    cerr << "Error : " << strerror(errno) << endl;
  else
    cout << "Directory created";
  return 0;
}
```
Thanks Neal. Then need to change to that as the output directory. I found the code below and perhaps could modify from there:

```cpp
#include <iostream>
#include <unistd.h>  // getcwd, chdir (POSIX)
// MSDN recommends against using the getcwd & chdir names on Windows
// (there they are _getcwd/_chdir in <direct.h>), hence the aliases.
#define cwd getcwd
#define cd  chdir

char buf[4096]; // never know how much is needed

int main(int argc, char** argv) {
  if (argc > 1) {
    std::cout << "CWD: " << cwd(buf, sizeof buf) << std::endl;
    // Change working directory and test for success
    if (0 == cd(argv[1])) {
      std::cout << "CWD changed to: " << cwd(buf, sizeof buf) << std::endl;
    }
  } else {
    std::cout << "No directory provided" << std::endl;
  }
  return 0;
}
```
OK. Cycling back to the main question before starting any coding: Is it better or easier to implement the MCEVAL reports as:
a) specific report name, like report_mcev.sso, in current folder
b) report.sso in new subfolder named mcev
c) report.sso written in append mode (like data.ss_new) with a separator labeled mcev****
In all cases, a user-specified subset of all report.sso elements could be implemented. Inclusion of reports other than report.sso should also be considered.
And my question is this: Is it easier to handle the results as a single file, separate files in the same directory, or separate files in separate directories?
This is good. Here, I've added a mkdir capability as well. Please note that if someone is building on Windows with gcc, they need the Linux definitions. If the directory already exists, mkdir does nothing.
Sorry, in the code, mkdir should be mkd and chdir should be cd. (I set up the silly defines and don't even use them!)
```cpp
#include <iostream>
#include <unistd.h>    // getcwd, chdir (POSIX)
#include <sys/stat.h>  // mkdir
// MSDN recommends against using the getcwd & chdir names on Windows

char buf[4096]; // never know how much is needed

int main(int argc, char* argv[]) {
  if (argc > 1) {
    std::cout << "Current directory: " << getcwd(buf, sizeof buf) << std::endl;
    // Make new directory (fails harmlessly if it already exists)
    if (0 == mkdir(argv[1], 0777))
      std::cout << "Made directory: " << argv[1] << std::endl;
    // Change working directory and test for success
    if (0 == chdir(argv[1])) {
      std::cout << "Current directory changed to: " << getcwd(buf, sizeof buf) << std::endl;
    }
  } else {
    std::cout << "No directory provided" << std::endl;
  }
  return 0;
}
```
I will do an experiment with the mkdir/chdir code to be sure that a chdir during a run does not totally mess with what ADMB is expecting. But I'm not yet 100% convinced that this is the best approach among the three I outlined above.
I don't want to put all this progress on mkdir to waste, but maybe it would be easier to just write Report_mceval1.sso, Report_mceval2.sso, etc. I commented above (https://github.com/nmfs-stock-synthesis/stock-synthesis/issues/70#issuecomment-722534271) that CompReport.sso is also being used from each MCMC sample to compute posterior medians and intervals of the fit to the age comps, but if only those two files are needed from each MCMC sample, it seems reasonable to write them all to the same directory.
As for what gets included, it seems reasonable to use whatever the user specifies for the "detailed output" setting in the starter file and apply that both to the main Report.sso associated with the MLE and any extra MCMC-related files. If a user wanted less detail in the MCMC versions, they could just change the starter file setting and re-run the -mceval step.
I'm assuming the hake team could adapt to whatever format gets created as long as all the info they use is present, but @kellijohnson-NOAA would know better.
Thanks Ian. I do see this as a viable option. I worry that chdir() will not work with ADMB during a run because ADMB is writing other items in append mode for each mceval call (i.e., the write to posteriors.sso, which is in the original directory). Regarding the need for compreport.sso: let's start a separate conversation on this. I think the r4ss processing of compreport.sso is redundant with what SS3 is doing and reporting in report.sso in the fit_age_comp section. We can tweak that write to fit_age_comp to make it do exactly what is needed.
Hi Rick, That's reasonable to not worry about CompReport.sso for now. The image below is from the beginning of this thread on VLab (https://vlab.noaa.gov/redmine/issues/75754#note-3, but the images didn't get transferred to github).
I think getting either the posterior medians or the 95% intervals requires one row of output per age bin, as in CompReport.sso, rather than one row per age-comp observation, as in FIT_AGE_COMPS. However, the intervals are trivially small in almost all cases, and my memory is that the posterior medians are so close to the MLE that I'm guessing it would be fine to skip this step and not bother with CompReport.sso files from all the mceval steps. This figure based on CompReport.sso was originally created only because those files were already available from the inefficient status-quo workflow used for hake.
@nschindler-noaa Hi Neal, can you write out the C++ code to check whether subdirectory ssnew exists in the current directory? I have already gotten SS3 to write the data*.ss_new files into that directory if it exists (see below). But if that directory has not been created by the user, I want the files to go into the current directory. Something like: if(subdirectory ssnew exists) subdir="./ssnew"; else subdir="";
```cpp
for (Nudat = 1; Nudat <= N_nudata; Nudat++) {
  if (Nudat == 1) {
    report1.open("./ssnew/data_echo.ss_new");
    .....
  } else if (Nudat == 2) {
    report1.open("./ssnew/data_expval.ss_new");
    .....
  } else {
    sprintf(anystring, "%d", Nudat - 2);
    anystring2 = "./ssnew/databoot" + anystring + ".ss_new";
    report1.open(anystring2);
```
Apologies Neal, I had some time and found something that works. Claims to be cross-platform.
```cpp
struct stat pathinfo;
adstring pathname;
if (stat("./ssnew", &pathinfo) != 0) {
  pathname = "";
} else {
  pathname = "./ssnew/";
}
for (Nudat = 1; Nudat <= N_nudata; Nudat++) {
  if (Nudat == 1) {
    anystring = pathname + "data_echo.ss_new";
    report1.open(anystring);
    ...
```
Now that the protocol has been worked out, we can do the same for the .sso files. If ./sso exists, then write there, else write to the current directory. The logic to create report.mce*** can mirror the ssnew approach. r4ss, SSMSE, and ss3sim programmers will need to create a conditional read: data.ss_new for older versions of SS3, or the new filenames for >=3.30.19. Does that seem OK?
In order to get report and compreport for each mceval run I propose to:
[ ] extend the existing starter file line (0.08 # MCMC output detail: integer part (0=default; 1=adds obj func components); decimal part (added to SR_LN(R0) on first call to mcmc)) by adding: integer=2 means that report.sso and compreport.sso will be written for each mceval iteration
[ ] then at bottom of ss_proced.tpl modify code near here:
```cpp
if (mceval_phase()) get_posteriors();
}  // end doing of the calculations
if (mceval_phase() || initial_params::mc_phase == 1) {
  No_Report = 1;  // flag to skip output reports after MCMC and MCEVAL
}
```
[ ] Note that Dynamic Bzero, the SPR profile, and global_MSY are only called in the FINAL section as the last action, so they will not appear in the mceval reports. It should be feasible to add Dynamic Bzero, but I'm not sure about the others.
Discussions about subfolders and file names should now take place in #226; this issue should now be used only to discuss the original topic of getting a report file for each MCMC sample.
@Rick-Methot-NOAA regarding files needed for hake: Chris Grandin reminded me today that we also use compreport.sso for the Composition Database, in addition to report.sso.
Imported from redmine, Issue #75754 Opened by @k-doering-NOAA on 2020-03-03 Status when imported: New
This first came up in #75174, but is a separate topic from the original issue being discussed there.
Kelli noted:
and Rick responded: