nmfs-ost / ss3-source-code

The source code for Stock Synthesis (SS3).
https://nmfs-ost.github.io/ss3-website/
Creative Commons Zero v1.0 Universal
38 stars 16 forks source link

Multivariate-Tweedie for age/length compositional likelihood #266

Open James-Thorson-NOAA opened 2 years ago

James-Thorson-NOAA commented 2 years ago

I'm leading a paper with Tim Miller and Brian Stock, introducing a new "multivariate-Tweedie" likelihood for fitting age/length composition data. It is essentially a more flexible alternative to the Dirichlet-multinomial, and appears to have some useful pros and cons relative to the DM. It internally relies upon a dtweedie likeliood, which Johnoel is adding the ADMB-13, so it should be feasible to implement in SS3 by this summer.

Does anyone have sufficient time and interest to work with me to get it implemented (both in logical code, and with a thoughtful user-interface) in SS3?

[Edit: task list below added by Ian on March 24.] To-do list for MV Tweedie likelihood based on comments above:

iantaylor-NOAA commented 2 years ago

I don't think I can do this without help from others on the SS3 team, but am happy to be involved.

zandjyo commented 2 years ago

I am all in support of anything that gets us away from the art involved in choosing input sample sizes and data weighting. Probably not much help on the logical code side of things, but could put some thought into user-interface and diagnostics once I get my head around what you are doing.

Rick-Methot-NOAA commented 2 years ago

Jim, The interface should be a straightforward extension of current interface for selecting DM. I can assure that will happen.

James-Thorson-NOAA commented 2 years ago

yeah, it'll have two parameters instead of one ... if someone could make a dev branch or something with those named parameters, I could do a PR with the added code. Or is there some other preferred way to proceed ... perhaps instead I could just providing the code-snippet for the likelihood in this Issue thread?

Rick-Methot-NOAA commented 2 years ago

branch MV-Tweedie-complike has been created and you can do PR to it. search in read_data and in SS_objfun for text = MV_Tweedie to get started adding code

Rick-Methot-NOAA commented 2 years ago

I did not intend to fast-track this. I just wanted there to be a branch on which development could happen. Only git collision I foresee is with the request to rename the D-M parameters.

James-Thorson-NOAA commented 2 years ago

By read_data I assume you mean "SS_readdata_330.tpl" ... I just opened this and can't easily make sense of it. So a few questions:

  1. Would it still be helpful if I can make the changes in SS_objfun but not SS_readdata_330?
  2. If yes to Q1, could someone provide me with syntax to extract the parameters from the various inputs, e.g., analogous to how the DM does is using lines:
        if(Comp_Err_L(f)==1) dirichlet_Parm=mfexp(selparm(Comp_Err_Parm_Start+Comp_Err_L2(f)))*nsamp_l(f,i);
        if(Comp_Err_L(f)==2) dirichlet_Parm=mfexp(selparm(Comp_Err_Parm_Start+Comp_Err_L2(f)));

...?

Rick-Methot-NOAA commented 2 years ago

Jim, If you can start by working in SS_objfun.tpl. When you push that, I can see what parameters need to be created in SS_readdata330.tpl, and then named in ss_readcontrol330.tpl. Everything else should be handled automatically (reporting and write_ssnew).

iantaylor-NOAA commented 2 years ago

The potential conflict noted by Rick is with issue #185 where there's a proposal to add information about which fleets use the parameter to the label, such as "ln(DM_theta)_Age_P7(2-3-4)" instead of the status-quo "ln(DM_theta)_7" for the 7th DM parameter. However, that change is low priority so can wait until MV-Tweedie is developed in parallel with some similar naming convention.

Is it possible, and would it ever be a good idea, to share one of these weighting parameters among a mix of length and age data? If so, we should dial back the proposal to add "Age" to the label.

James-Thorson-NOAA commented 2 years ago

OK, so I just did a PR ... there's plenty still to do, and the likelihood calculation no doubt would benefit from some extra eyes from people who better understand the standard pitfalls. In particular dtweedie is probably not vectorized so I added a loop across the vector of comps. Also, it obviously needs help extracting parameters correctly (two for mvtweedie compared with one for DM).

https://github.com/nmfs-stock-synthesis/stock-synthesis/pull/278/files

iantaylor-NOAA commented 2 years ago

Jim, nice job getting this together. In addition to the additional work that Jim has noted, we will need to install the dev branch of ADMB to get the dtweedie function https://github.com/admb-project/admb/blob/dev/src/linad99/dtweedie.cpp (which indeed seems to not be vectorized).

I'm too tired from a week of workshop to do any real work on this but I'll make a to-do list below to keep track of the next steps.

@k-doering-NOAA, it's probably a good idea to be testing ADMB 13 prior to it's release for 3.30.19 to make sure there are no compile errors or changes in model results. Would it be possible to add a github action to do so?

[edit: task list moved to top comment on this issue so it triggers a counter and also doesn't get buried]

Rick-Methot-NOAA commented 2 years ago

Nice work Ian and Jim.
I will work on the readdata and readcontrol aspects. For #219 , I intend to look into modularizing this convoluted section of code that has lots of repeated elements.

iantaylor-NOAA commented 2 years ago

Rick, that sounds great.

Question on the workflow for a change like this (which I just assigned to the 3.30.20 milestone): there's a pull request associated with this issue (#277 from MV-Tweedie-complike into main). We could either leave that open until all this is done, or just create the PR once everything is in place.

@k-doering-NOAA, does leaving the PR open trigger more checks, or do we get those for the branch regardless? An open PR also gives people notifications when changes are pushed to that branch, which could be either good or bad depending on how many notifications we all get.

Rick-Methot-NOAA commented 2 years ago

Perhaps I erred in doing it this way. I thought that use of a PR was the right way to initiate collaboration for a new feature branch. I did it mostly to give Jim something to work with and did not intend to push (sic) this issue to top of the priority stack for everyone, especially because we do not even have tweedie in our ADMB version yet.

Rick-Methot-NOAA commented 2 years ago

Didn't realize that Jim was not yet part of the SS team, so just invited him into NOAA_contributors. Looked at the parameter I/O. This will be a bit complicated because current logic of counting parameters is completely dependent on one parameter and does not generalize easily to 2 parameters. so, this will take a bit of time. And when doing so, anticipate how to include generalized sizecomp DM and MVT options.

iantaylor-NOAA commented 2 years ago

Good solution to add Jim to the contributors who can push changes within the organization. I think we can probably close the pull request and all contribute to the new branch until it's ready, then open a new PR. @k-doering-NOAA may have additional ideas on the optimal workflow when she logs in tomorrow.

k-doering-NOAA commented 2 years ago

Wow, lots of action on this topic, this is great! A few thoughts.

  1. Agree it would be good to test ADMB 13 separately. Since it is not out yet, I don't think we should be using it for the 3.30.19 release, but I can set up a job that runs the dev version of admb for testing out MV tweedie changes.
  2. No need to leave the PR open, as I think all of the jobs should run when pushing to the branch anyway. We could close it, or mark the PR as "draft" to indicate it is not yet done. Either way works!
k-doering-NOAA commented 2 years ago

Just wanted to say, setting up the GHA to test out admb dev is on my to do list! However, we are working on release 3.30.19, so that will come first. I hope to get to the admb dev github action by next week.

k-doering-NOAA commented 2 years ago

Just wanted to say, setting up the GHA to test out admb dev is on my to do list! However, we are working on release 3.30.19, so that will come first. I hope to get to the admb dev github action by next week.

I did try this and was not able to successfully compile admb on github actions. It may be easiest to wait until there is a release for admb 13. In the meantime, I did compile it locally on my windows machine, so perhaps just compiling locally will be ok?

Rick-Methot-NOAA commented 2 years ago

But we still could not continue development of MVT because any commit would trigger a gha which would fail when it tried to compile the MVT command.

k-doering-NOAA commented 2 years ago

I'm confused, could the failing jobs on the branch just be ignored for now? I think we would want them to pass before merging the branch into main, but I'm hoping ADMB 13 will be out before then, so I can modify our testing suite to use it.

Rick-Methot-NOAA commented 2 years ago

What I mean is that we cannot proceed with development of a branch that includes a call like Do_MVT(). Even though we could compile SS3 locally using ADMB13, compiles using gha will fail because the do_MVT() routine does not exist in ADMB 12.3

k-doering-NOAA commented 2 years ago

I get that the do_MVT() routine doesn't exist in ADMB 12.3, so all the github actions would fail. I guess I still don't understand why that would impede development?

Rick-Methot-NOAA commented 2 years ago

I suppose we still could do all MVT development locally, but we created gha partly so we could test frequently

k-doering-NOAA commented 2 years ago

Ah, ok, I see the concern. Maybe it would be best to wait on this until ADMB 13 (or its prerelease) comes out

James-Thorson-NOAA commented 2 years ago

Hi all,

I don't know where this landed, but I saw that ADMB-13 is now released, and also the MVTW paper is now accepted @ ICESJMS.

Is there anything I can do, or perhaps you could tell me when you've started switching over to ADMB-13?

Jim

Rick-Methot-NOAA commented 2 years ago

There is a branch with a PR in which a placeholder to Tweedie is implemented, as well as several changes to how D-M is implemented. We have installed ADMB13 locally and tested SS3 with it. All looks good. I'll get back to you in a week with an update on when we are ready to move to next stage of implementing Tweedie.

James-Thorson-NOAA commented 2 years ago

Kathryn or Ian,

can either of you (or someone else) point me at the SS3 code snippet where the logical code for the mvtweedie would be added (presumably by me)?

I'm happy to take a stab at this in the next couple weeks.

Jim

iantaylor-NOAA commented 2 years ago

Hi @James-Thorson-NOAA, The length comp likelihood for the MV-tweedie is implemented here (thanks to you): https://github.com/nmfs-stock-synthesis/stock-synthesis/blob/main/SS_objfunc.tpl#L376-L401. There are placeholders in that file for the age and generalized size-comp likelihoods. I'm assuming that @Rick-Methot-NOAA is planning to move the likelihood into a generalized function for all data types in this section: https://github.com/nmfs-stock-synthesis/stock-synthesis/blob/main/SS_miscfxn.tpl#L106-L122 (where the multinomial and Dirichlet-multinomial are now located).

I'm not sure what other changes are needed to implement this. I checked off the items for the data and control file pieces in the checklist at the top of this issue as it looks like they are complete.

Rick-Methot-NOAA commented 2 years ago

Ian covered the new situation well. Suggest creating a new branch if you work on the full Tweedie implementation. Adding Elizabeth for awareness. @e-gugliotti-NOAA

James-Thorson-NOAA commented 2 years ago

Sorry, who's supposed to do something next?

Rick -- It looks like the mvtweedie likelihood is there for length-comps (except needing to uncomment the calculations). are you planning to move likelihood calls to some other place (where the code isn't duplicated)?

Ian -- are you willing to do a quick test of the uncommented likelihood for a length-comp model? I don't think I know how to compile the current SS3 code.

Rick-Methot-NOAA commented 2 years ago

The Tweedie code will move into ss_miscfxn.tpl, similar to the function: Comp_logL_Dirichlet()

Rick-Methot-NOAA commented 1 year ago

I will finish the implementation, then pass to Jim for testing.

The code is in main and I will work from there.

The FUNCTION in ss_miscfxn.tpl is: FUNCTION dvariable Comp_logL_Dirichlet(const double& Nsamp, const dvariable& dirichlet_Parm, const dvector& obs_comp, const dvar_vector& exp_comp) { dvariable logL; logL = sum(gammln(Nsamp obs_comp + dirichlet_Parm exp_comp)) - sum(gammln(dirichlet_Parm * exp_comp)); return (logL); }

James-Thorson-NOAA commented 1 year ago

OK thanks Rick!

PS: I posted a comment where I was confused about where to find files. I then deleted the comment because I realized it was due to working on my own fork which was out-of-date. I'm definitely struggling a bit to sort through the different file structures across branches and forks :0