Open James-Thorson-NOAA opened 2 years ago
I don't think I can do this without help from others on the SS3 team, but am happy to be involved.
I am all in support of anything that gets us away from the art involved in choosing input sample sizes and data weighting. Probably not much help on the logical code side of things, but could put some thought into user-interface and diagnostics once I get my head around what you are doing.
Jim, The interface should be a straightforward extension of current interface for selecting DM. I can assure that will happen.
yeah, it'll have two parameters instead of one ... if someone could make a dev branch or something with those named parameters, I could do a PR with the added code. Or is there some other preferred way to proceed ... perhaps instead I could just providing the code-snippet for the likelihood in this Issue thread?
branch MV-Tweedie-complike has been created and you can do PR to it. search in read_data and in SS_objfun for text = MV_Tweedie to get started adding code
I did not intend to fast-track this. I just wanted there to be a branch on which development could happen. Only git collision I foresee is with the request to rename the D-M parameters.
By read_data
I assume you mean "SS_readdata_330.tpl" ... I just opened this and can't easily make sense of it. So a few questions:
SS_objfun
but not SS_readdata_330
? if(Comp_Err_L(f)==1) dirichlet_Parm=mfexp(selparm(Comp_Err_Parm_Start+Comp_Err_L2(f)))*nsamp_l(f,i);
if(Comp_Err_L(f)==2) dirichlet_Parm=mfexp(selparm(Comp_Err_Parm_Start+Comp_Err_L2(f)));
...?
Jim, If you can start by working in SS_objfun.tpl. When you push that, I can see what parameters need to be created in SS_readdata330.tpl, and then named in ss_readcontrol330.tpl. Everything else should be handled automatically (reporting and write_ssnew).
The potential conflict noted by Rick is with issue #185 where there's a proposal to add information about which fleets use the parameter to the label, such as "ln(DM_theta)_Age_P7(2-3-4)" instead of the status-quo "ln(DM_theta)_7" for the 7th DM parameter. However, that change is low priority so can wait until MV-Tweedie is developed in parallel with some similar naming convention.
Is it possible, and would it ever be a good idea, to share one of these weighting parameters among a mix of length and age data? If so, we should dial back the proposal to add "Age" to the label.
OK, so I just did a PR ... there's plenty still to do, and the likelihood calculation no doubt would benefit from some extra eyes from people who better understand the standard pitfalls. In particular dtweedie
is probably not vectorized so I added a loop across the vector of comps. Also, it obviously needs help extracting parameters correctly (two for mvtweedie compared with one for DM).
https://github.com/nmfs-stock-synthesis/stock-synthesis/pull/278/files
Jim, nice job getting this together. In addition to the additional work that Jim has noted, we will need to install the dev
branch of ADMB to get the dtweedie
function https://github.com/admb-project/admb/blob/dev/src/linad99/dtweedie.cpp (which indeed seems to not be vectorized).
I'm too tired from a week of workshop to do any real work on this but I'll make a to-do list below to keep track of the next steps.
@k-doering-NOAA, it's probably a good idea to be testing ADMB 13 prior to it's release for 3.30.19 to make sure there are no compile errors or changes in model results. Would it be possible to add a github action to do so?
[edit: task list moved to top comment on this issue so it triggers a counter and also doesn't get buried]
Nice work Ian and Jim.
I will work on the readdata and readcontrol aspects.
For #219 , I intend to look into modularizing this convoluted section of code that has lots of repeated elements.
Rick, that sounds great.
Question on the workflow for a change like this (which I just assigned to the 3.30.20 milestone): there's a pull request associated with this issue (#277 from MV-Tweedie-complike
into main
). We could either leave that open until all this is done, or just create the PR once everything is in place.
@k-doering-NOAA, does leaving the PR open trigger more checks, or do we get those for the branch regardless? An open PR also gives people notifications when changes are pushed to that branch, which could be either good or bad depending on how many notifications we all get.
Perhaps I erred in doing it this way. I thought that use of a PR was the right way to initiate collaboration for a new feature branch. I did it mostly to give Jim something to work with and did not intend to push (sic) this issue to top of the priority stack for everyone, especially because we do not even have tweedie in our ADMB version yet.
Didn't realize that Jim was not yet part of the SS team, so just invited him into NOAA_contributors. Looked at the parameter I/O. This will be a bit complicated because current logic of counting parameters is completely dependent on one parameter and does not generalize easily to 2 parameters. so, this will take a bit of time. And when doing so, anticipate how to include generalized sizecomp DM and MVT options.
Good solution to add Jim to the contributors who can push changes within the organization. I think we can probably close the pull request and all contribute to the new branch until it's ready, then open a new PR. @k-doering-NOAA may have additional ideas on the optimal workflow when she logs in tomorrow.
Wow, lots of action on this topic, this is great! A few thoughts.
Just wanted to say, setting up the GHA to test out admb dev is on my to do list! However, we are working on release 3.30.19, so that will come first. I hope to get to the admb dev github action by next week.
Just wanted to say, setting up the GHA to test out admb dev is on my to do list! However, we are working on release 3.30.19, so that will come first. I hope to get to the admb dev github action by next week.
I did try this and was not able to successfully compile admb on github actions. It may be easiest to wait until there is a release for admb 13. In the meantime, I did compile it locally on my windows machine, so perhaps just compiling locally will be ok?
But we still could not continue development of MVT because any commit would trigger a gha which would fail when it tried to compile the MVT command.
I'm confused, could the failing jobs on the branch just be ignored for now? I think we would want them to pass before merging the branch into main, but I'm hoping ADMB 13 will be out before then, so I can modify our testing suite to use it.
What I mean is that we cannot proceed with development of a branch that includes a call like Do_MVT(). Even though we could compile SS3 locally using ADMB13, compiles using gha will fail because the do_MVT() routine does not exist in ADMB 12.3
I get that the do_MVT() routine doesn't exist in ADMB 12.3, so all the github actions would fail. I guess I still don't understand why that would impede development?
I suppose we still could do all MVT development locally, but we created gha partly so we could test frequently
Ah, ok, I see the concern. Maybe it would be best to wait on this until ADMB 13 (or its prerelease) comes out
Hi all,
I don't know where this landed, but I saw that ADMB-13 is now released, and also the MVTW paper is now accepted @ ICESJMS.
Is there anything I can do, or perhaps you could tell me when you've started switching over to ADMB-13?
Jim
There is a branch with a PR in which a placeholder to Tweedie is implemented, as well as several changes to how D-M is implemented. We have installed ADMB13 locally and tested SS3 with it. All looks good. I'll get back to you in a week with an update on when we are ready to move to next stage of implementing Tweedie.
Kathryn or Ian,
can either of you (or someone else) point me at the SS3 code snippet where the logical code for the mvtweedie would be added (presumably by me)?
I'm happy to take a stab at this in the next couple weeks.
Jim
Hi @James-Thorson-NOAA, The length comp likelihood for the MV-tweedie is implemented here (thanks to you): https://github.com/nmfs-stock-synthesis/stock-synthesis/blob/main/SS_objfunc.tpl#L376-L401. There are placeholders in that file for the age and generalized size-comp likelihoods. I'm assuming that @Rick-Methot-NOAA is planning to move the likelihood into a generalized function for all data types in this section: https://github.com/nmfs-stock-synthesis/stock-synthesis/blob/main/SS_miscfxn.tpl#L106-L122 (where the multinomial and Dirichlet-multinomial are now located).
I'm not sure what other changes are needed to implement this. I checked off the items for the data and control file pieces in the checklist at the top of this issue as it looks like they are complete.
Ian covered the new situation well. Suggest creating a new branch if you work on the full Tweedie implementation. Adding Elizabeth for awareness. @e-gugliotti-NOAA
Sorry, who's supposed to do something next?
Rick -- It looks like the mvtweedie likelihood is there for length-comps (except needing to uncomment the calculations). are you planning to move likelihood calls to some other place (where the code isn't duplicated)?
Ian -- are you willing to do a quick test of the uncommented likelihood for a length-comp model? I don't think I know how to compile the current SS3 code.
The Tweedie code will move into ss_miscfxn.tpl, similar to the function: Comp_logL_Dirichlet()
I will finish the implementation, then pass to Jim for testing.
The code is in main and I will work from there.
The FUNCTION in ss_miscfxn.tpl is: FUNCTION dvariable Comp_logL_Dirichlet(const double& Nsamp, const dvariable& dirichlet_Parm, const dvector& obs_comp, const dvar_vector& exp_comp) { dvariable logL; logL = sum(gammln(Nsamp obs_comp + dirichlet_Parm exp_comp)) - sum(gammln(dirichlet_Parm * exp_comp)); return (logL); }
OK thanks Rick!
PS: I posted a comment where I was confused about where to find files. I then deleted the comment because I realized it was due to working on my own fork which was out-of-date. I'm definitely struggling a bit to sort through the different file structures across branches and forks :0
I'm leading a paper with Tim Miller and Brian Stock, introducing a new "multivariate-Tweedie" likelihood for fitting age/length composition data. It is essentially a more flexible alternative to the Dirichlet-multinomial, and appears to have some useful pros and cons relative to the DM. It internally relies upon a
dtweedie
likeliood, which Johnoel is adding the ADMB-13, so it should be feasible to implement in SS3 by this summer.Does anyone have sufficient time and interest to work with me to get it implemented (both in logical code, and with a thoughtful user-interface) in SS3?
[Edit: task list below added by Ian on March 24.] To-do list for MV Tweedie likelihood based on comments above:
SS_objfun.tpl
and create a pull request (#278)dtweedie
function: https://github.com/admb-project/admb/issues/234SS_objfun.tpl
by @James-Thorson-NOAASS_readdata_330.tpl
SS_readcontrol_330.tpl
(two for mvtweedie compared with one for DM)