add test that runs mcmc regression test on a stock synthesis model

nmfs-ost / ss3-source-code

The source code for Stock Synthesis (SS3).

https://nmfs-ost.github.io/ss3-website/

Creative Commons Zero v1.0 Universal


add test that runs mcmc regression test on a stock synthesis model #455

Open k-doering-NOAA opened 2 years ago

k-doering-NOAA commented 2 years ago

Currently, all the regression tests of Stock Synthesis use MLE, but Stock Synthesis models are sometimes run using MCMC. Set up a test that runs MCMC and checks the outputs. This may not require creating a new test model; for example, the hake example already in ss-test-models may work well. A first step would be to use this test just to check that the I/O still works (the model runs and the output has no unexpected NAs or major outliers), and we could refine the test in the future.
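
A minimal sketch of that first-step I/O check, in Python. It assumes the MCMC output of interest can be read as a whitespace-delimited table with a header row; the function names, the column names in the usage note, and the 10-MAD outlier threshold are all hypothetical choices for illustration, not part of SS3 or its test suite.

```python
import math
import statistics


def parse_table(text):
    """Parse whitespace-delimited text with a header row into {column: [floats]}."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    columns = {name: [] for name in header}
    for row in rows:
        for name, token in zip(header, row):
            try:
                columns[name].append(float(token))
            except ValueError:
                # Tokens such as "NA" are treated as missing values.
                columns[name].append(math.nan)
    return columns


def find_problems(columns, max_mads=10.0):
    """Report non-finite values and values far from the column median.

    max_mads is an assumed threshold: a value counts as a major outlier when it
    lies more than max_mads median absolute deviations from the median.
    """
    problems = []
    for name, values in columns.items():
        finite = [v for v in values if math.isfinite(v)]
        if len(finite) < len(values):
            problems.append(f"{name}: {len(values) - len(finite)} non-finite value(s)")
        if len(finite) < 3:
            continue
        med = statistics.median(finite)
        mad = statistics.median(abs(v - med) for v in finite)
        if mad == 0:
            continue  # constant column, no spread to judge outliers against
        extreme = [v for v in finite if abs(v - med) > max_mads * mad]
        if extreme:
            problems.append(f"{name}: {len(extreme)} value(s) beyond {max_mads} MADs")
    return problems
```

A test could call `find_problems(parse_table(text))` on the text of an output file and fail if the returned list is not empty.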

Rick-Methot-NOAA commented 2 years ago

Model runtime will be a major issue with MCMC. I think the run needs to be just long enough to test the I/O features. Run with an input random seed so results are replicable.

k-doering-NOAA commented 2 years ago

Thanks @Rick-Methot-NOAA, I forgot that point. I think GitHub Actions has a runtime limit of 6 hours? This may have changed since I last looked, though.

Rick-Methot-NOAA commented 2 years ago

An MCMC run that just tests I/O can be much shorter than 6 hours, especially for hake. You will need to create a special routine to test comparability with past models. Perhaps do something like this: do 1000 MCMC draws, then in MCEVAL get output for every 10th draw. Then run the comparison using the 3rd (or any specific numbered) MCEVAL sample each time.
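
A rough Python sketch of that routine. The flags `-mcmc`, `-mcsave`, `-mcseed`, and `-mceval` are standard ADMB command-line options; the executable name `ss3`, the seed value, the table layout, and the tolerance are assumptions made for illustration, and the helper names are hypothetical.

```python
import math


def build_commands(exe="ss3", draws=1000, save_every=10, seed=123):
    """Return two command lines: the seeded MCMC run, then the MCEVAL pass."""
    mcmc = [exe, "-mcmc", str(draws), "-mcsave", str(save_every), "-mcseed", str(seed)]
    mceval = [exe, "-mceval"]
    return mcmc, mceval


def nth_sample(table_text, n):
    """Return the n-th (1-based) data row of a whitespace-delimited table as a dict."""
    lines = [ln.split() for ln in table_text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    if n < 1 or n > len(rows):
        raise IndexError(f"sample {n} requested but only {len(rows)} available")
    return {name: float(tok) for name, tok in zip(header, rows[n - 1])}


def differences(sample, reference, rel_tol=1e-6):
    """Return names of reference quantities that are missing or differ beyond rel_tol."""
    return [
        name
        for name, ref in reference.items()
        if name not in sample or not math.isclose(sample[name], ref, rel_tol=rel_tol)
    ]
```

Each command list could be passed to `subprocess.run(cmd, cwd=model_dir, check=True)`, after which `nth_sample(text, 3)` would pull the 3rd saved MCEVAL sample for comparison against stored reference values.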

iantaylor-NOAA commented 2 years ago

A test of MCMC (using adnuts) was added by @k-doering-NOAA back in November 2021 (https://github.com/nmfs-stock-synthesis/workflows/blob/main/.github/workflows/run-ss3-mcmc.yml). It doesn't really compare the output to a reference value, but I don't think that level of testing is needed at this stage.

k-doering-NOAA commented 2 years ago

That MCMC job just checks that the sampling is repeatable, and hence that the NUTS algorithm in ADMB is compatible with SS3; I think this issue should be left open since its goal is different (a regression test to check model results).

Rick-Methot-NOAA commented 2 years ago

I think an MCMC-NUTS test is reasonable to include. @k-doering-NOAA Why does the issue need to remain open if the current test is performing as expected?

k-doering-NOAA commented 2 years ago

The test that is set up now is different from the one this issue describes.

e-perl-NOAA commented 1 year ago

@k-doering-NOAA it looks like you marked this done in your "KD tasks" project, and it appears that this is indeed done. Can you confirm?

k-doering-NOAA commented 1 year ago

No, this was not done - it's possible I marked it off on my project board realizing I was not going to have time for it before changing jobs!

Rick-Methot-NOAA commented 1 year ago

I think it is a good idea to have such a test, and I volunteer hake as the subject.

iantaylor-NOAA commented 1 year ago

I like the idea of an mcmc test too.

I suspect that even with a set random seed, small changes in SS3 will make it difficult to get a precise match in a short chain, so we would want either a low bar for similarity in a short chain or a more precise test for similarity in the summary statistics of a longer chain.

Perhaps @kellijohnson-NOAA could look at MCMC for a recent hake model (which could replace Hake_2018 in the set in https://github.com/nmfs-stock-synthesis/test-models/tree/main/models if that would be useful) and propose a length of chain (and even commands to start the adnuts run) that balances duration with enough convergence to be robust to small changes in the model. I'm guessing that we could just compare the median for one or two quantities of interest as well as some measure of the variability of their posterior distributions.
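
A minimal Python sketch of that kind of summary-statistic comparison, using the median plus the interquartile range as the measure of variability. The function names and the tolerances (5% on the median, 25% on the spread) are hypothetical placeholders; suitable values would depend on the chain length chosen.

```python
import statistics


def summarize(draws):
    """Median and interquartile range of the posterior draws for one quantity."""
    q1, _, q3 = statistics.quantiles(draws, n=4)
    return {"median": statistics.median(draws), "iqr": q3 - q1}


def within_tolerance(new, ref, rel_tol):
    """True when new is within rel_tol (as a fraction of ref) of ref."""
    return abs(new - ref) <= rel_tol * abs(ref)


def compare_posteriors(new_draws, ref_draws, median_tol=0.05, spread_tol=0.25):
    """Compare a new chain with a reference chain on median and spread."""
    new, ref = summarize(new_draws), summarize(ref_draws)
    return {
        "median_ok": within_tolerance(new["median"], ref["median"], median_tol),
        "spread_ok": within_tolerance(new["iqr"], ref["iqr"], spread_tol),
    }
```

The spread tolerance is looser than the median tolerance here because the spread estimated from a short chain tends to be noisier than its center.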

Rick-Methot-NOAA commented 1 year ago

A test of convergence would be great, but I would be satisfied with (1) a test of I/O; (2) MCEVAL output from the Nth call.

iantaylor-NOAA commented 1 year ago

There is already an MCMC test that checks that the NUTS algorithm in ADMB is compatible with SS3. Here's the script that it uses: https://github.com/nmfs-stock-synthesis/stock-synthesis/blob/main/.github/workflows/run-ss3-mcmc.yml#L68-L94 I think that could be expanded to add MCEVAL and a check of the output, neither of which is included yet. I suppose if the chain is short and a small change in SS3 leads to a failed test, we can figure that out when the time comes and extend the chain or otherwise modify the test.

I don't think it's fair to @e-gugliotti-NOAA to put this all on her given the complexity of running MCMC, and the NWFSC folks are busy with assessments right now, so maybe we can transfer this issue to the stock-synthesis repository and put off further work on it until the fall. I just assigned myself, and if @kellijohnson-NOAA is interested, she could assign herself too.

Rick-Methot-NOAA commented 1 year ago

Agreed. No rush on this one. I am deep in trying to concoct tests with time-varying biology to figure out the consequences of the a,b formulation for spawner-recruitment.

e-perl-NOAA commented 1 year ago

@iantaylor-NOAA, I would also like to assist on this one just to learn a bit more about running MCMC and MCEVAL.