likelihood profiles

kellijohnson-NOAA commented 3 years ago

I am making a csv file with the parameter names, low, high, and step sizes for likelihood profiles, and I am wondering how we profile over M when there are separate female and male M parameters. Is it standard to do a profile for each parameter, or should they both be turned on at the same time?

melissahaltuch-NOAA commented 3 years ago

I have usually just profiled over female M, and let male remain estimated.

kellijohnson-NOAA commented 3 years ago

Was that because you parameterized male as an offset or even when they are non-linked parameters?

melissahaltuch-NOAA commented 3 years ago

I've done this with and without male as an offset. Even if they are explicitly linked as parameters I expect that female and male M are pretty highly correlated. I think that we explored alternatives in M parameterization in the petrale model years ago and didn't see much difference, for what it's worth.

iantaylor-NOAA commented 3 years ago

Good idea to add a CSV file. Here are some options for 2-sex M profiles:

1. Profile female M and leave male M estimated: the approach described by @melissahaltuch-NOAA above. I agree that it will likely work fine even without an offset because of correlation among the parameters: fixing M for one sex provides a lot of information about the other sex (which could be tested by seeing how stable the ratio of the two Ms is across the profile values).
2. Profile over a grid of both parameters: see https://github.com/r4ss/r4ss/issues/224 for info on how to do this, but with slow models like ours, I think it would take too long.
3. Profile over female M with male M set at each point to a fixed ratio relative to female M (either via an offset or a second vector provided as in https://github.com/r4ss/r4ss/issues/224 again). However, this approach prevents you from seeing how the best-fit ratio might change across M values.

I would vote for starting with Option 1 as suggested by @melissahaltuch-NOAA and exploring other approaches as needed. I'm also happy keeping males and females as independent parameters rather than using offsets.
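The ratio check mentioned under option 1 and the second vector mentioned under option 3 can be sketched as below. This is a minimal Python illustration, not r4ss or nwfscDiag code. The M values and the 0.9 ratio are made up for the example and are not lingcod results.

```python
# Hypothetical profile output: the fixed female M at each step and the
# male M that the model estimated at that step (made-up numbers).
female_m = [0.15, 0.20, 0.25, 0.30]
male_m_est = [0.14, 0.18, 0.22, 0.27]

# Option 1 check: how stable is the male:female ratio across the profile?
ratios = [m / f for m, f in zip(male_m_est, female_m)]
ratio_spread = max(ratios) - min(ratios)

# Option 3: a second vector with male M held at a fixed ratio to female M.
fixed_ratio = 0.9  # assumed ratio, for illustration only
male_m_fixed = [round(f * fixed_ratio, 4) for f in female_m]

for f, r in zip(female_m, ratios):
    print(f"female M = {f:.2f}, male:female ratio = {r:.3f}")
print(f"ratio spread across profile = {ratio_spread:.3f}")
print("male M vector for option 3:", male_m_fixed)
```

A small spread would support the idea that fixing female M largely determines male M. A large spread would be a reason to look at options 2 or 3.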

kellijohnson-NOAA commented 3 years ago

great, i will set it up for (1)

kellijohnson-NOAA commented 3 years ago

You can now, at least in theory, run all the diagnostics available in nwfscDiags using run_investigatemodel(). (1) Edit the par values that are run here, if you need to. (2) See the example for how to run, just change basemodelname b/c I cheated and had it in my workspace.

Thank you to @chantelwetzel-noaa for making these functions.

I am going to leave this issue open until we decide on profile values.

kellijohnson-NOAA commented 3 years ago

Profiling is currently happening. The regularization further confirmed that M is correlated for the N model more so than for the S model.

iantaylor-NOAA commented 3 years ago

The nwfscDiag package has been a huge help for lingcod diagnostics. Thank you @chantelwetzel-NOAA

However, some profiles are showing some jerky patterns (fig below from 2021.s.012.004) suggesting points in the profile are not converging properly. I see that the nwfscDiag package has a rerun_profile_vals() function but I'm not sure how it's used.

There's a long discussion of how to deal with this stuff at https://github.com/nwfsc-assess/nwfscDiag/issues/3 but it's not obvious to me what the best steps are for Lingcod:

1. use options within nwfscDiag to get better convergence (not sure how rerun_profile_vals() connects to the other tools, probably because I'm not focused enough to figure it out on my own),
2. re-run profiles outside the package and then connect them to the package for purposes of summarizing and plotting, or
3. do both the profiles and figures with custom code.

Luckily @chantelwetzel-noaa has offered to provide some input on this.

[Figure: piner_panel_SR_LN(R0)]

chantelwetzel-noaa commented 3 years ago

I would suggest trying the rerun_profile_vals() as the first step. I admit the documentation of this function is likely sparse since I created it a bit on the fly during assessment season. The function will pull the original profile run results, rerun select values, and then recreate all of the profile output. You can call this function as:

rerun_profile_vals(mydir = file.path(mydir, base_name), para_name = "Size_DblN_peak_OR_Recreational(2)", run_num = c(1,6), data_file_nm = "2021_or_copper.dat")

where mydir is the general model directory, base_name is the base model name located in the mydir folder, para_name is the parameter that you want to run select values again for, run_num corresponds to the run number from the original profile of the value that needs to be rerun, and data_file_nm is the name of the data file.

Identifying the correct run_num can be a bit tricky since the profiling in the package is done going down from the base model, returning to the base, and then stepping up from the base model parameter. The best way to identify the correct run number is to identify the non-converged parameter values and then use the numbered Report file to identify the correct corresponding run number.

Unfortunately, there may be times when you still fail to get a converged profile run for select values even after re-running with this function. At that point it may be best to attempt to get the specific runs to converge by hand.

iantaylor-NOAA commented 3 years ago

Thank you @chantelwetzel-noaa. All this makes sense and is very helpful. We'll let you know how it turns out. A sparsely documented function is much more useful than no function at all.

chantelwetzel-noaa commented 3 years ago

I think an important addition to the package would be a function that can easily be called by a user to summarize all runs in a specific profile folder. This would be useful for the situations where a poor poor assessor is stuck running select profiles by hand. This could be fairly easy to create so please keep me posted on how the rerun function does for lingcod.

chantelwetzel-noaa commented 3 years ago

I just remembered a trick for easily figuring out the run number. The csv file that is created post profile run called "profile_your parameter_results.csv" provides the run numbering for each value along the profile along with the likelihood values. "Bad" runs that need to be re-run can be easily identified by looking here.
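One way to scan such a results file for candidate "bad" runs is sketched below in Python. The column names, run numbers, and likelihood values are all hypothetical stand-ins, and the real csv written by the package will differ. The rule used here (flag an interior point whose likelihood sits above both of its neighbours) is only a rough heuristic for the jerky patterns discussed above, not a convergence test.

```python
import csv
import io

# Hypothetical stand-in for a profile results csv; real column names and
# values will differ.
results_csv = """run,value,total_likelihood
1,0.20,1502.1
2,0.18,1503.4
3,0.16,1509.9
4,0.14,1507.2
5,0.22,1501.8
6,0.24,1502.6
"""

rows = [
    {"run": int(r["run"]), "value": float(r["value"]),
     "nll": float(r["total_likelihood"])}
    for r in csv.DictReader(io.StringIO(results_csv))
]

# Run numbers follow the order the profile was run in, not the parameter
# value, so sort by value before comparing neighbours.
rows.sort(key=lambda r: r["value"])

# Heuristic: flag interior points whose likelihood is above both neighbours.
suspect_runs = [
    mid["run"]
    for lo, mid, hi in zip(rows, rows[1:], rows[2:])
    if mid["nll"] > lo["nll"] and mid["nll"] > hi["nll"]
]
print("runs worth re-checking:", suspect_runs)
```

Any run flagged this way would still need to be checked by eye before passing its number to run_num.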

iantaylor-NOAA commented 3 years ago

@kellijohnson-NOAA, how should we provide profile results for inclusion in the document? Commit them within the folder where they were created and point to it, write a function to move them into a central location, or what?

Unfortunately I didn't do the math last night to multiply the number of steps in our finer-scale profiles by the 50 minutes it takes 2021.n.015.004 to run with Hessian, so I only have an M profile so far (shown below). The good news is that there don't seem to be convergence issues and the length comps and indices which have the most information about M are in agreement (and both say M should be high). The dome-shaped signal for the age data is a little confusing, but profiles of other parameters may help interpret that. The large number of years and large sample sizes for Rec_WA ages explains the influence that source has but the dome-shaped pattern is a little puzzling.

The large number of fleets means that the profile figures are a little crowded so we may also want to create custom figures for the final report with 2 panels per page or shrink the margins or something to get more space in each fig.

[Figure: piner_panel_NatM_uniform_Fem_GP_1]

kellijohnson-NOAA commented 3 years ago

The most sustainable way is to just commit the csv and png files within the folder they live in, because the writeup knows what folder the base model is in and the sensitivity names are just appended to that. This way we won't make a mistake in forgetting to copy things, and I think that there is utility in being able to see results of profiles for certain models whether or not they eventually became the base model.

iantaylor-NOAA commented 3 years ago

Great, that makes sense. I'm taking a break now but will commit the figs in a bit.

kellijohnson-NOAA commented 3 years ago

Three lines in command window

git add .\2021.s.014.001_esth_profile_*\*.png
git add .\2021.s.014.001_esth_profile_*\*.csv
git commit -m "Sensitivity figures and csvs"

Just did this for the southern model in 280c8cc

kellijohnson-NOAA commented 3 years ago

Fixed profiles are in f1c4766

iantaylor-NOAA commented 3 years ago

I added a row to diagpars.csv in commit f8a63b27f9c9fe73a76ab1c57a3db946fa886bae to add likelihood profiles for L_at_Amax_Fem. The range (100 to 114 by 2 = 8 total steps) should cover what we've seen in recent model estimates for north and south but may need to be refined once we see the results.

This profile seems useful to add for our base models because it would be useful to know what data sources are informing growth in the north and south and also that parameter was fixed for the north model in the 2017 assessment.

@kellijohnson-NOAA, thanks for volunteering to run this additional profile for the base models in addition to a sensitivity as noted in #118.
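The step count for the L_at_Amax_Fem range given above (100 to 114 by 2) can be checked with a short Python sketch. The low, high, and step values come from the comment. The 50 minutes per run is borrowed from the earlier timing quoted for 2021.n.015.004 with Hessian, so the total time is only a rough assumption for other models.

```python
# Profile range as stated in the thread: low, high, and step size.
low, high, step = 100, 114, 2

# Inclusive grid, so the number of steps is (high - low) / step + 1.
grid = list(range(low, high + step, step))
n_steps = len(grid)

# Assumption: about 50 minutes per run, the figure quoted earlier for one model.
minutes_per_run = 50
total_minutes = n_steps * minutes_per_run

print(grid)
print(f"{n_steps} steps, roughly {total_minutes / 60:.1f} hours if run in sequence")
```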

kellijohnson-NOAA commented 3 years ago

Profiles added with 1d8371f

pfmc-assessments / lingcod

likelihood profiles #72