Build hydradata input object for Georges Bank data for mskeyruns.

sgaichas commented 1 year ago

@gavinfay input files have been added with 967751c6cf4c2268cfe815658a90add5bfbc9fed and the model now runs but produces nans. I've fixed several issues with the input data but it probably needs fresh eyes now if you have some time.

sgaichas commented 1 year ago

here are the current input survey and fishery time series. before we even look at lengths, I wonder if the really high catches first thing combined with some relatively low surveys are an issue?

plotdat-1

plotdat-2

sgaichas commented 1 year ago

@gavinfay I can work on this today. The growthprob_phi calculation is producing nans so I'll start there.

sgaichas commented 1 year ago

I changed to growth_type 3 instead of 4, which sets growthprob_phi to 0 if a length bin exceeds the vonB_Linf entered. No more nans but due to the mismatch between estimated Linf (the mean max length, not the max length) and observed max length, many species do not make it to the highest size bin (probability of leaving bin 4 or even 3 in a 5 bin model = 0).

We may want to consider adjustments so that there is a nonzero probability of growing into the largest length bins given this constraint. A hack would be increasing vonB_Linf to the max observed length but this will change growth from the observed fit if K isn't altered too. On the other hand, a curve through 5 bins isn't much of a curve...

gavinfay commented 1 year ago

Thanks @sgaichas. Yes, those early high catches from the foreign fleet are problematic. The surplus production over that period is incompatible with the survey time series. In part this is due to stock dynamics being driven by exploitation pattern not just on Georges Bank for some of our stocks (e.g. the pelagics, also silver hake and spiny dogs). In previous fits to MS-PROD, we got around this (also known as sticking one's fingers in one ears and saying 'lalalala') by removing the initial few years from the dataset. It might make sense to do that here, otherwise I suspect we will have a series of weird recruitment patterns estimated.... (or we allow for additional mortality but hope to have something simple to start with...). Re growth, ah I knew our bandaid would rear its head again. One option could be to estimate the Linf and k, because we have information in the size comp data, but probably troublesome as you say to estimate a growth curve with only 5 size bins.

sgaichas commented 1 year ago

Well @gavinfay I am definitely missing something. I've fixed all the data errors I can find, but I cannot get a run where anything survives past the first timestep, even with catch set to 0 and the isprey matrix all 0.

sgaichas commented 1 year ago

OK, new input files starting in 1978 and with 0 replacing -999 in comps data (c001e535acd2a1ccf791cde36861777506ba6e47) and associated R list object (17d29c226ba7f3e9e2fc3ad7ccb537c6568b102a) are up. This may make life easier:

sgaichas commented 1 year ago

@gavinfay on the growth issue, I can confirm that the simulated dataset also produces 0 growthprob_phi values for lengthbins < max for many species. It is especially pronounced in the 10 bin model, with 0 probability of growing into the next bin for as many as the 5 largest bins for a couple of species!

gavinfay commented 1 year ago

@sgaichas Recruits are not being populated because t % Nstepsyr is always 0 when there is only 1 time step per year, thus is always the last time step of the year and no first time step. Seems like there is a lot of places where this condition is used, so should/could we force the GB data to have more than 1 time step per year?

sgaichas commented 1 year ago

Aha! Thanks for finding that. An option for forcing multiple steps per year with the data might be reverting to the original model bins, which were smaller for small sizes with one very large bin at the end. Faster growth through that small bin made multiple steps per year. The disadvantage of this approach is there will rarely be length data in the smallest bins, but maybe that is ok as long as there is some length data in something other than the last big bin. This will leave that bug in place though. Could we add Nstepsyr == 1 || in front of the t % Nstepsyr == 1 && yrct <= Nyrs or would that break? I haven't thought it all the way through obviously... It would be or the whole previous conditional

gavinfay commented 1 year ago

Hmm. I think adding the OR statement sounds like a decent fix for now. Will try and see what happens.

BTW: Stock Synthesis now has a predator module!

gavinfay commented 1 year ago

The OR statement seemed to do the trick. Noticing that the InpN sample sizes for the comp data are huge. These look like they are Nfish for the fishery size comps, not sure about the surveys (looks like it could be Ntow in some case?). Can the fishery comps be the Number of samples instead? Setting all to 100 for time being.

sgaichas commented 1 year ago

Great!

Yes, those comps inpN probably are Nfish for all; I will make an mskeyrun issue for those and see how quickly we can fix them. Setting to 100 sounds like a reasonable temporary fix.

The other data correction not currently included is to apply a discard mortality rate (< the implied 100%). Catches for dogfish and skates are currently overestimated because I haven't corrected for this. I will add an issue to mskeyrun for that as well. If those two species are getting killed off too quickly scaling the catch down a bit would be reasonable.

gavinfay commented 1 year ago

Aha. Will keep that in mind. The values for the survey indices are very large also, are these estimates of total biomass? I would have expected them to be stratified mean kg per tow/hectare, but if they are meant to be total biomass estimates that helps think about catchabilities.

sgaichas commented 1 year ago

They have been input as total biomass, but mskeyrun data is in mean kg/tow. So I can do whichever is easiest. I started with total B thinking it would be equivalent to how the simulated data was set up.

sgaichas commented 1 year ago

@gavinfay we are in the process of adding ntows and ntrips for comps data. We are still doing a review, but they are likely to result in inpNs lower than 100. Survey mean n tows sampling GB lengths for the time series:

# A tibble: 20 × 3
# Groups:   LongName [10]
   LongName            SEASON meanntows
   <chr>               <chr>      <dbl>
 1 Atlantic cod        FALL       20.9 
 2 Atlantic cod        SPRING     40.5 
 3 Atlantic herring    FALL       11.2 
 4 Atlantic herring    SPRING     26.8 
 5 Atlantic mackerel   FALL        8.98
 6 Atlantic mackerel   SPRING     13.5 
 7 Goosefish           FALL       15.5 
 8 Goosefish           SPRING      9.21
 9 Haddock             FALL       28.8 
10 Haddock             SPRING     34.3 
11 Silver hake         FALL       54.1 
12 Silver hake         SPRING     33.8 
13 Spiny dogfish       FALL       41.9 
14 Spiny dogfish       SPRING     30.4 
15 Winter flounder     FALL       29.3 
16 Winter flounder     SPRING     27.9 
17 Winter skate        FALL       46.9 
18 Winter skate        SPRING     42.3 
19 Yellowtail flounder FALL       31.8 
20 Yellowtail flounder SPRING     36.4

Mean number of fishery trips sampled for the time series (herring number under investigation, and winter skate may be all skates):

# A tibble: 10 × 2
   LongName            meanNtrip
   <chr>                   <dbl>
 1 Atlantic cod             74.0
 2 Atlantic herring          1  
 3 Atlantic mackerel        27.7
 4 Goosefish                29.4
 5 Haddock                  56.8
 6 Silver hake              78.0
 7 Spiny dogfish            27.7
 8 Winter flounder          79.1
 9 Winter skate            239. 
10 Yellowtail flounder      62.3

gavinfay commented 1 year ago

OK, thanks! (yes 100 seems like a large sample size in general)

I was running into some strange behavior, where there was no prediction of catch in the first year, and I think I have satisfied myself this is due to the order of operations meaning that N() for a given time step is actually the N at the end of the time step (it is deprecated by that time step's mortality). Thus, within a loop that goes from t=2, the first calculation of catch is for t=2 and thus there is no mortality (of any kind) during time step 1 (the initial population size). I "think" this is ameliorated when there is more than 1 time step per yr but the order of operations is a little different than what I was expecting, being used to thinking about N being the population size at the beginning of the time step (i.e. requires last time step's mortality to calculate it). I might be misreading things, but I think I need to think about how to do this for the first time step. It's possibly a simple fix of looping from t=1 but forcing the initial N update to take from the same time step instead of the previous time step in that first timestep. (i.e. don't run calc_update_N() in time t=1).

sgaichas commented 1 year ago

Good catch--your logic sounds right to me. I cannot remember whether there was any logic to how I originally coded the order of operations. Probably there wasn't...

gavinfay commented 1 year ago

@sgaichas Somehow indicator_fishery_q got its fleets switched for the GB data so that fleet 1 was assigned as the pelagic fleet even though these data are in fleet 2 in the time series.

sgaichas commented 1 year ago

Thank you for catching that! I will fix

sgaichas commented 1 year ago

Fleet ordering for the indicator_fishery_q input has been fixed in the code that makes the data object here (reordering of select statement) https://github.com/thefaylab/hydradata/commit/c4833957e3b5721d3239125f643f8cf2e2943560

Before rebuilding the full data object I am waiting for final review on our updated ntows and ntrips sample sizes for comps.

sgaichas commented 1 year ago

Updated inputs fix the fishery order in indicator_fishery_q, have inpN based on ntows (survey) and ntrips (fishery) on Georges Bank, and specify recdev vector length Nyrs-1. 265092a9196f52dc0792b2aca98ce4e1b09f9444

A correction to fishery length comp proportions is they are now based on numbers at length, prior to this they had been proportions resulting from expansion to catch biomass at length.

gavinfay commented 1 year ago

@sgaichas, are the survey time series data adjusted for Albatross/Bigelow differences (e.g. via calibration coefficients) during index generation?

sgaichas commented 1 year ago

Correct, this time series is adjusted, and is in Albatross units. I also have them separate if you want to try fitting separately.

gavinfay commented 1 year ago

Great! Maybe at some point yes I think that would be good to separate and try it. Not going to prioritize that for now though. Thank you!

sgaichas commented 1 year ago

@gavinfay latest push includes .dat .pin and -ts.dat files with 10 fleets where species=fleet; filename GB-input/hydra_sim_GB_5bin_1978_10F

I have spot checked data but have not tried a run yet.

note that this also has fishery length comps with no borrowing across years. so for instance goosefish has no early length data now.

we are still double checking survey length comps, for now have not changed.

sgaichas commented 1 year ago

This may run better: bc3cc89999b10396d0a32dff347ac772fbbb4c77 reformats for current .tpl with M1, oF and oFdev phases added to .dat and M1 matrix removed, and ln_annM1, ln_otherFood and otherFooddevs added to .pin file

gavinfay commented 1 year ago

Commit 8ac9233 adds working potentially estimable vulnerability for each prey item to each predator. Parameterized as relative to other food, and is a vector logit_vuln that is length equal to the sum of isprey matrix (for GB ~ 35). High value (e.g. 10) and no estimation (vuln_phase in .dat file) preserves the same behavior as prior. Initial values for logit_vuln in pin file occur following F_devs and before fishsel_pars.

sgaichas commented 10 months ago

@gavinfay @emilylil Updated data with simplified 3 fleet structure and new inputs for Nov 2023 tpl built in hydradata d0aa2a43aea61b70dc6c9fdfb48fd8173ac59a65 and pushed here 8c75c9772a325bf21d1509d7e870971dffdae341 with new filenames GB-input/hydra_sim_GB_new5bin_1978_3fleets Runs on my computer, but let me know if something looks off.

thefaylab / hydra_sim

Build hydradata input object for Georges Bank data for mskeyruns. #22