nmfs-ost / ss3-source-code

The source code for Stock Synthesis (SS3).
https://nmfs-ost.github.io/ss3-website/
Creative Commons Zero v1.0 Universal

Make datum-specific -logL available for leave-one-out #67

Open k-doering-NOAA opened 4 years ago

k-doering-NOAA commented 4 years ago

Imported from redmine, Issue #75174
Opened by @RickMethot on 2020-02-20
Status when imported: New

The initial inquiry from Cole is pasted below. I am moving it into the VLAB issue system for easier tracking towards implementation.


I'm exploring some options for Bayesian model selection/validation. Important components in the literature are WAIC and leave-one-out cross-validation (LOO). Both approximate out-of-sample predictive error and would be really powerful tools for assessment scientists to have.

Without getting into the stats too much, what we need on the ss.tpl side is a vector of likelihood contributions for each datum. In a regression model this is trivial because you just have yobs and yhat, so loglike = dnorm(yobs, yhat, sd, log = TRUE) is a vector representing what we want. Then for each posterior sample we calculate and store that vector, and can pass it into R packages that calculate WAIC/LOO. Here is an example in the Stan language, which we can easily adapt to ADMB (I can share this if you're interested).
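For concreteness, a minimal R sketch of that pointwise log-likelihood matrix and how it feeds the loo package (the data and "posterior" draws below are made-up placeholders; in practice the draws would come from the model's MCMC output):

```r
# Pointwise log-likelihood matrix: rows = posterior draws, columns = data
set.seed(1)
n <- 50
x <- runif(n)
yobs <- 2 + 3 * x + rnorm(n, sd = 0.5)              # fake regression data
post <- data.frame(b0 = rnorm(500, 2, 0.1),         # fake "posterior" draws
                   b1 = rnorm(500, 3, 0.1),
                   sigma = runif(500, 0.4, 0.6))

log_lik <- matrix(NA_real_, nrow = nrow(post), ncol = n)
for (s in seq_len(nrow(post))) {
  yhat <- post$b0[s] + post$b1[s] * x
  log_lik[s, ] <- dnorm(yobs, mean = yhat, sd = post$sigma[s], log = TRUE)
}

loo::waic(log_lik)   # WAIC from the pointwise log-likelihoods
loo::loo(log_lik)    # PSIS-LOO from the same matrix
```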

It's trickier for assessments because of (1) the different data types and (2), for SS, the general nature of the model. So for indices a datum would be a year (ll1 = dlnorm), and for comps it would be a whole year of all ages/lengths (ll2 = dmultinom). I think we can just concatenate those together like c(ll1, ll2) [sorry for the horrific abuse of notation], or for more complex models with multiple fleets something like c(ll1, ll2, ..., lln). Then write that to a loglikes_poster.sso file that gets populated during mceval() and can be read into R. The end result is a matrix of log-likelihoods with one row per posterior draw and one column per datum.
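If SS wrote such a file (the name loglikes_poster.sso and its layout are hypothetical at this point), the R side would be short; loo expects log-likelihoods, so the sign would need flipping if SS writes NLLs instead:

```r
# One row per mceval() draw, one column per datum, header row naming the columns
ll <- as.matrix(read.table("loglikes_poster.sso", header = TRUE))
loo::loo(ll)    # use loo::loo(-ll) if the file holds negative log-likelihoods
loo::waic(ll)
```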

My question for you is whether there is an easy way to do this in SS. I can obviously hack a tpl for a specific model, but ideally there would be a general solution.

k-doering-NOAA commented 4 years ago

comment from @iantaylor-NOAA on 2020-03-03: This request seems connected to closed issue #16273 which discusses the reporting of fleet-specific likelihood components in MCMC (which may require additional refinement) as well as the idea of automatically generating the full report file for each sample from MCMC (which would satisfy Cole's request in an inefficient way).

@kelli.johnson may have additional ideas stemming from the Pacific Hake assessment which continues to re-run the model without estimation for every parameter vector in posteriors.sso, which is a slow process.

k-doering-NOAA commented 4 years ago

comment from @RickMethot on 2020-03-03: Thanks Ian. I think what Cole wants is the info that is found now in report.sso in tables like Fit_LenComp and Index_2. But imagine a single vector that had an entry for each of >1000 CAAL observations.

Cole - what would you imagine the indexing of this vector to look like? It seems unwieldy. You would need to know the complex structure of the data in order to identify which element you were tweaking.


k-doering-NOAA commented 4 years ago

comment from @kellijohnson-NOAA on 2020-03-03: Thanks Ian for thinking of the hake assessment. We do in fact re-run each saved sample with bias adjustment ramp = 1.0 to generate a report file that can later be read in. Does anyone know how other Bayesian assessments written in ADMB get around this issue? Also, what other stocks turn on MCMC within an SS model?

k-doering-NOAA commented 4 years ago

comment from @RickMethot on 2020-03-03: Kelli and Ian, I think we need a new issue for this topic of re-running posteriors to get full report. I had no idea that was being done. Is there more that can be added to posteriors.sso to obviate that need?


k-doering-NOAA commented 4 years ago

comment from @k-doering-NOAA on 2020-03-03: @cole.monnahan, tagging you in case you want to provide feedback on this issue.

k-doering-NOAA commented 4 years ago

comment from cole.monnahan on 2020-03-03, replying to @k-doering-NOAA's tag and @kellijohnson-NOAA's question above:


Kelli: Doesn't SS turn off bias adjustment during MCMC anyway? Not sure I understand the question. I don't think other assessments are doing bias adjustment so it's not an issue.

Rick/Ian: The definition of what a single "datum" is is complicated. Currently I am thinking that, for example, each year of comp data would be a single datum, because it gets evaluated to a single NLL and the data are all dependent within that vector of samples. Same with the indices: it would be by year. Currently we assume that all these components are independent (summing the NLLs), but really the index values and ages/lengths come from the same vessel and the same data. I think this is fine for now. Basically, to guide thinking about how to structure the data, we have to ask ourselves the question "how do you do leave-one-out cross-validation?"
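As a generic illustration of "one datum = one year's NLL" (these are textbook density calls, not SS's exact likelihood forms; all values are made up):

```r
# One fleet, one year: an index observation and a vector of comp observations
obs_index <- 1200; exp_index <- 1000; sigma <- 0.2
obs_comp  <- c(5, 12, 20, 9, 4)                # observed numbers at age
exp_prop  <- c(0.1, 0.25, 0.35, 0.2, 0.1)      # expected proportions at age

nll_index_year <- -dlnorm(obs_index, meanlog = log(exp_index), sdlog = sigma, log = TRUE)
nll_comp_year  <- -dmultinom(obs_comp, prob = exp_prop, log = TRUE)
# each of these would be one entry in the per-datum NLL vector
```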

What I think it looks like is the following. Assume Ift, Aft and Lft are the index, age and length data for a model. We ignore priors/penalties/hyperdistributions and only consider these data. Then we create a file where each column is a datum and each row is an MCMC iteration or CV replicate.

In pseudocode: for (f in fleets) for (t in years) loo << NLL(Ift(f,t)) << ","; [repeat for Aft and Lft]

where NLL(Ift(f,t)) represents the NLL calculated for Ift(f,t). So you end up with a big file where the first block of columns represents the indices, the second the ages, and the third the lengths. This is pretty manageable because it's indexed by year, so maybe it would be a few hundred columns for a data-rich model?

Given the generic nature of SS, I think what you'd have to do is write this out for all possible data components, even those not used in the current model. Then, when pulled into R, it would be trivial to drop columns with all 0's (unused data). Or maybe there is a more elegant solution, like testing whether each component is initialized.
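Dropping the unused columns (and splitting the blocks back apart) would only be a couple of lines in R; the data-type prefixes in the column names below are an assumed labelling scheme, not an existing SS convention:

```r
ll <- as.matrix(read.table("loglikes_poster.sso", header = TRUE))
ll <- ll[, colSums(ll != 0) > 0, drop = FALSE]                  # drop all-zero (unused) data
ll_index <- ll[, grep("^Index_", colnames(ll)), drop = FALSE]   # index block
ll_age   <- ll[, grep("^Age_",   colnames(ll)), drop = FALSE]   # age-comp block
ll_len   <- ll[, grep("^Len_",   colnames(ll)), drop = FALSE]   # length-comp block
```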

If done, there would be several advantages. In the Bayesian context this is equivalent to LOO for model selection (https://mc-stan.org/loo/articles/index.html), which could be useful and is what I'd like to try. But it also provides a means for frequentist model validation and diagnostics. You can compare these values at the MLE across models (same data, obviously) and see how structural changes affect the fit to the data much more thoroughly than the component totals allow. You can also identify high-leverage data. You could also run a likelihood profile, keep track of all of these components, and see how different years are affected, because the NLL components are broken down to a finer resolution. It's possible to detect conflict within a data set (e.g., early years of survey ages want M = 0.1, later years want M = 0.2) that seems impossible to see without this fine-level information.

It's on my medium-term list to try hacking an SS tpl to see if I can get it to work and test it a bit, but I'm happy to coordinate with the group if there is interest in trying this more formally. Or maybe we'll try it first on some of our custom-built ADMB assessments and see how it comes out.

My biggest concern is that the likelihoods are not right and so can't be interpreted statistically like this (the same reason AIC fails).

k-doering-NOAA commented 4 years ago

comment from @RickMethot on 2020-03-03: Thanks Cole. I can see how this could enable such detailed investigations.

For MCEVAL, we certainly want this file to be written to in append mode. For runs within a set of structured alternatives, it might also be useful to write in append mode, which wouldn't prevent someone from doing each run in a separate directory.

There will need to be a header row (or rows) with rather detailed information about the contents of each column. That will be the hardest part of creating this output. Needed: fleet, time, data_type, discard/retained, length_bin (for conditional age-at-length), size comp method (for generalized size comp), age_error method, and other things I have forgotten.
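One way to make that indexing concrete is to encode the fields into each column name; the pattern below is purely hypothetical, just to show how the pieces listed above could be combined:

```r
# Hypothetical column-name builder for the header row (example values only)
flt <- 1; yr <- 2015; part <- 0; lbin <- 24; ageerr <- 1
col_name <- paste("CAAL",                    # data_type (conditional age-at-length)
                  paste0("flt", flt),        # fleet
                  paste0("yr", yr),          # time
                  paste0("part", part),      # discard/retained partition
                  paste0("lbin", lbin),      # length bin (for CAAL)
                  paste0("ageerr", ageerr),  # ageing-error method
                  sep = "_")
col_name
#> [1] "CAAL_flt1_yr2015_part0_lbin24_ageerr1"
```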

Also, mean length (or weight) at age is a data type, but the logL for each age is not stored, only the total across all ages. So this contrasts with conditional age-at-length where there is a logL for each length bin.

details, details,


k-doering-NOAA commented 4 years ago

comment from @iantaylor-NOAA on 2020-03-03: I think a good way to move forward with this would be to test the approach on a simple model, with the new output for just a subset of the data types. For example, with the hake assessment in SS or a non-SS model at AFSC like the Bering Sea Pollock assessment, you would only need to worry about a small subset of the potential data types: no length comps included, just one fishing fleet, etc.

That would allow you to explore the utility of the method without a big effort to generalize the output for all of the SS data types. If the method works well, and more models start moving to MCMC thanks to the efficiency of your adnuts contribution, then we could make that bigger effort in the future.

I know Max Cardinale has used MCMC with SS, at least as an additional exploration if not for a production assessment, but I'm not aware of any others within the U.S. other than Pacific Hake.

k-doering-NOAA commented 4 years ago

comment from cole.monnahan on 2020-03-03, replying to @iantaylor-NOAA's comment above:


I acknowledge it's not an easy task, but I think it has a lot of potential and so wanted to get the idea floating out there. I agree with Ian that a few test cases on existing models are a good way forward. I plan on doing that with the SS model I inherited, and since I know the data structure I can modify the TPL to do what I need specifically for this case.

k-doering-NOAA commented 4 years ago

comment from @RickMethot on 2020-03-03: I concur.


Rick-Methot-NOAA commented 3 years ago

What about doing this as an R wrapper (sketched after the steps below)?

  1. run SS with all data, get total logL
  2. manipulate data file with R to change a datum's fleet code to -fleet
  3. re-run SS from .par file and get the new logL
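
A rough sketch of that wrapper using r4ss, looping over index (CPUE) observations only. The function and element names (SS_readdat, SS_writedat, SS_output, likelihoods_used) are r4ss as I recall them; the file path, executable name, and column names are assumptions, and the starter file is assumed to be set to read initial values from ss.par and skip estimation:

```r
library(r4ss)

dat  <- SS_readdat("data.ss")
base <- SS_output(getwd(), covar = FALSE, verbose = FALSE, printstats = FALSE)
total_nll <- base$likelihoods_used["TOTAL", "values"]    # step 1: total logL with all data

datum_nll <- numeric(nrow(dat$CPUE))
for (i in seq_len(nrow(dat$CPUE))) {
  tmp <- dat
  tmp$CPUE$index[i] <- -abs(tmp$CPUE$index[i])           # step 2: negative fleet code drops datum i
  SS_writedat(tmp, "data.ss", overwrite = TRUE)
  system("ss -nohess")                                   # step 3: re-run from ss.par, no estimation
  rep <- SS_output(getwd(), covar = FALSE, verbose = FALSE, printstats = FALSE)
  datum_nll[i] <- total_nll - rep$likelihoods_used["TOTAL", "values"]
}
SS_writedat(dat, "data.ss", overwrite = TRUE)            # restore the original data file
```

The difference in total negLL at fixed parameters is then the contribution of that single datum, as discussed in the follow-up below.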
k-doering-NOAA commented 3 years ago

Tagging @Cole-Monnahan-NOAA yet again. Looks like this issue was last discussed around March 2020, so I imagine circumstances may have changed since it was originally opened!

Cole-Monnahan-NOAA commented 3 years ago

@Rick-Methot-NOAA then you'd take the difference in negLL to get the component for that single datum, then loop through all data points? That's certainly one way to do it and since there's no optimization it shouldn't be too bad. I would think that'd be a good initial way to test this on a single model.