questions/comments when calculating comp sample sizes

brianlangseth-NOAA commented 3 years ago

All of these are minor housekeeping items that I noticed while compiling data sample sizes.

Comm comps

@iantaylor-NOAA I see in #69 we are using unexpanded comp data for commercial fleets. I note that there are multiple renditions of the commercial fleets: lenCompN_comm as combined for fixed and trawl gear, and then lenCompN_FG and lenCompN_TW. Both appear to be based on Ntows. Can you confirm all are the same? Can you alert me to where you pull in unexpanded sample sizes within the comps for commercial gear?
I note that the current models (14.001) have 2021 data in them for the comps. Again, super minor, and I expect SS ignores them given our endyr is 2020.

Slight differences with last (2017) assessment

I also note that in calculating sample sizes for the survey, I used the Stewart and Hamel bootstrapping approach whereas the previous model used Ntows as sample size. Likely a trivial change from the last base, but probably resulting in more data weighting applied to our survey comps than before. I was going to note this in our issue for changes from the previous model but could not find it. Thus documenting here for now.
The other change from the last assessment regarding sample size is that there are very slight differences in sample sizes in the debHist data. These differ in six years (by at most 48 fish in one year). The reason is that the previous assessment kept only fish designated as "kept", whereas we kept all fish and we are excluding samples collected north of 40°10'.

brianlangseth-NOAA commented 3 years ago

TOR (pg 42) says sample sizes are needed by state for fishery dependent data. I'm unsure why this would be needed for commercial samples where we combine across states within the north model. Thoughts? Should I revise to incorporate state specific FG and TW sample sizes?

kellijohnson-NOAA commented 3 years ago

wrt the TOR, this is because each state has its own sampling program and typically the comps are expanded at the state level. If we do not provide them with the state-specific sampling number then they have no way to make informative decisions regarding recommendations for future sampling ... I think.

melissahaltuch-NOAA commented 3 years ago

That is correct Kelli


iantaylor-NOAA commented 3 years ago

Good questions, @brianlangseth-NOAA. Here are a few answers

lenCompN_comm is the original expanded data.frame which is no longer used and lenCompN_FG and lenCompN_TW are the newer ones. However, as discussed in #69, the sample size calculation from the unexpanded code doesn't come through so the sample size (number of trips) is taken from the expanded table. See code at https://github.com/iantaylor-NOAA/Lingcod_2021/blob/70457d7b3dbefd80e337d25bf988034f161b049e/R/add_data.R#L333-L347
yes 2021 data would be automatically ignored by SS. If a future catch-only projection uses fixed catches in the forecast and retains 2020 as the end year (that's the approach used in 2019), then it would not be impacted by these either. But we might want to remove those comps from our final model anyway.
state-specific sample size totals would need to be recalculated from data-raw/lingcod_PacFIN_BDS.R
the Stewart and Hamel bootstrapping approach (hopefully accurately described in surveycomp.Rmd, link below) applies a single multiplier to the number of tows. I believe that as long as the data weighting isn't hitting an assumed upper bound of 1.0 then any scalar multiplier on sample sizes like this should be completely confounded with the weighting and make zero difference to the model results (although we could easily test this). Link to description: https://github.com/iantaylor-NOAA/Lingcod_2021/blob/097f2ae8386460e4a5305a0cbff1300cc145aef6/doc/surveycomp.Rmd#L20-L25
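
The confounding argument in the last point can be checked on a toy example. This is a minimal sketch, assuming a multinomial composition likelihood in which the effective sample size is the input sample size times a weight. The helper `comp_nll` and all numbers (including the multiplier) are hypothetical and not taken from the lingcod models.

```r
# Hypothetical toy example: multinomial negative log-likelihood (up to a
# constant) with effective sample size = weight * input sample size.
comp_nll <- function(obs, expected, n_input, weight) {
  n_eff <- weight * n_input
  -n_eff * sum(obs * log(expected))
}

obs <- c(0.20, 0.50, 0.30)       # observed proportions (made up)
expected <- c(0.25, 0.45, 0.30)  # model-expected proportions (made up)
ntows <- 40                      # input sample size as number of tows
multiplier <- 2.5                # arbitrary scalar on Ntows (illustrative only)

# Weight applied to the Ntows input, and the equivalent weight for the
# scaled input (both stay below the assumed upper bound of 1.0)
w_tows <- 0.8
w_scaled <- w_tows / multiplier

nll_tows <- comp_nll(obs, expected, ntows, w_tows)
nll_scaled <- comp_nll(obs, expected, ntows * multiplier, w_scaled)

# The two likelihood contributions agree because weight * n_input is unchanged
all.equal(nll_tows, nll_scaled)
```

The equivalence only holds while the weight is free to move; if a weight were capped at 1.0, the two inputs could give different effective sample sizes.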

brianlangseth-NOAA commented 3 years ago

@iantaylor-NOAA I think I have it figured out. You use the unexpanded comp data (which is in numbers of fish and available from _FG and _TW) but obtain sample sizes, which are in number of tows, from _comm? I had thought that because the expansion process adjusts both comp data and sample sizes, we should use both if using expansion and neither if using unexpanded. Is it common to use unexpanded comps but expanded sample sizes? I ask only to learn, and am happy to discuss more once deadlines are past.

iantaylor-NOAA commented 3 years ago

I'm not very experienced with pacfin comps. The direction I got from @chantelwetzel-NOAA was that using nwfscSurvey::UnexpandedLFs.fn() (as implemented in the code linked below) would produce the number of observed fish as it knows nothing about the number of tows. However, I think number of tows is a better choice for input sample sizes, regardless of how many fish got sampled per tow, so I attempted to get the number of tows from the expanded comps.

I was assuming that the number of tows (or trips?) was the actual number of tows (or trips?) and not modified by the expansion. If the expansion modifies that value, then it would be good to figure out a better way to get the raw number of tows or trips (but not today).

https://github.com/iantaylor-NOAA/Lingcod_2021/blob/4567d392b28029bb60c0d140ca217eb49a268c1a/data-raw/lingcod_PacFIN_BDS.R#L313-L341

chantelwetzel-noaa commented 3 years ago

The number of trips produced by the writeComps function within PacFIN.Utilities is not impacted by the expansion and should align with the number of trips actually sampled in the data. Let me know if you have issues with the values being output by the writeComps function and I can dig in to show exactly how it is calculated.

brianlangseth-NOAA commented 3 years ago

Reopening: Before, I was assuming the number of trips is needed only for commercial comps, but the TOR does not make that distinction. I do not know how to derive the number of trips from the recreational sample data. How might this be done, assuming it actually needs to be done?

chantelwetzel-noaa commented 3 years ago

I remember hearing a way that someone calculated number of trips for recreational data but I can't remember who said that or how it was done. I don't think you would get dinged if the number of trips for recreational fishing were missing from the document.

In regards to calculating commercial trips and samples by year for each fleet and state I have dug into the bds data to find a relatively easy way of getting this information. I think the easiest approach would be to provide the total samples for length or age by year rather than reporting them broken down by sex. I have confirmed that this approach gives you the same values as the writeComps function for trips and samples by gear. Here is the example code for lengths:

temp = bds.pacfin.n[!is.na(bds.pacfin.n$lengthcm), ]
fish = aggregate(SEX ~ fishyr + fleet + state, temp, FUN = function(x) { length(x) } )

trips = aggregate(SAMPLE_NO~fishyr + fleet + state, temp, FUN = function(x) { length(unique(x)) } )

then the information by year, state, and gear just needs to be broken out into a data frame. I have confirmed that the values for trips and fish match what you get if you were to create the length comps as:

bds.pacfin.n.exp$SEX = "U"
comps.n <- PacFIN.Utilities::getComps(Pdata = bds.pacfin.n.exp, Comps = "LEN")

lenCompN_comm <- PacFIN.Utilities::writeComps(inComps = comps.n,
  fname = "data/lenCompN_comm_all_unesexed.csv", lbins = info_bins$length,
  sum1 = TRUE, partition = 2, digits = 3, dummybins = FALSE)

I have only done the checking for the north area but am assuming/hoping that the results would also match for the south.
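
For the step of breaking the counts out into a data frame, a minimal sketch in base R follows. The toy `bds` data frame is a hypothetical stand-in for bds.pacfin.n (only the column names are taken from the snippet above), and the `ntrips`/`nfish` column names and the wide layout are assumptions about what the table might look like.

```r
# Hypothetical stand-in for bds.pacfin.n; values are made up.
bds <- data.frame(
  fishyr    = c(2001, 2001, 2001, 2001, 2002, 2002),
  fleet     = c("TW", "TW", "TW", "FG", "TW", "TW"),
  state     = c("WA", "WA", "OR", "OR", "WA", "WA"),
  SAMPLE_NO = c("s1", "s1", "s2", "s3", "s4", "s5"),
  SEX       = c("F", "M", "F", "U", "M", "F"),
  lengthcm  = c(60, 55, 70, 48, NA, 66)
)

temp <- bds[!is.na(bds$lengthcm), ]

# Number of fish and number of unique samples (trips) per year, fleet, state
fish <- aggregate(SEX ~ fishyr + fleet + state, temp, FUN = length)
trips <- aggregate(SAMPLE_NO ~ fishyr + fleet + state, temp,
                   FUN = function(x) length(unique(x)))

# Combine into one long table with descriptive column names
samples <- merge(trips, fish, by = c("fishyr", "fleet", "state"))
names(samples)[names(samples) == "SAMPLE_NO"] <- "ntrips"
names(samples)[names(samples) == "SEX"] <- "nfish"

# One row per year, one pair of columns per fleet/state combination
samples$group <- paste(samples$fleet, samples$state, sep = "_")
wide <- reshape(samples[, c("fishyr", "group", "ntrips", "nfish")],
                idvar = "fishyr", timevar = "group", direction = "wide")
```

The long table (`samples`) is probably the easier one to check against writeComps output; the wide table is closer to what a document table by year would need.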

brianlangseth-NOAA commented 3 years ago

@andi-stephens-NOAA I notice that the sample size in lingcod_discard_comps.R coming from (Mendocino_Lincod_2021_WCGOP_Comps.xls) is by north and south and not by state. According to the TOR we need to provide sample sizes of composition data by nfish and ntrips and specified by state. I see in your .xls the number of fish, but is there a way to separate these out by state?

brianlangseth-NOAA commented 3 years ago

@chantelwetzel-noaa Thank you for the time resolving the state breakdown issue. I will explore the south and confirm sample sizes are comparable.

I do not plan to include ntrips for rec data, or for non-expanded comp data.

brianlangseth-NOAA commented 3 years ago

@chantelwetzel-noaa I cannot reproduce your ability to match ntows, even in the north. 9e8659e is my attempt. I am getting a different number of tows in 1987 for FG. This has been discouraging, and I'm afraid my ability to resolve it is fading. The table currently has ntows and nfish for all but rec fleets, HKL, and Lam (which were not expanded, and I do not know how to get ntows from these). Thus, serviceable but not complete. I welcome help to get over the hump - though happy to move forward on other things

brianlangseth-NOAA commented 3 years ago

@chantelwetzel-noaa Good news, I can reproduce your ability to match ntows in the north if I do not exclude NA lengths, and I can get the south to behave too. There do appear to be issues with years where lengthcm values for a fleet are all NA. If these are removed then overall ntows can be matched to output from unsexed expansions. Unsure whether this issue is also contributing to challenges when separating by sex.
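
One possible reading of the NA-length finding, as a hedged sketch: count trips from all records, but first drop any year/fleet stratum in which every lengthcm is NA. The toy data and the helper `count_trips` are hypothetical; the real filtering in the repository may differ.

```r
# Hypothetical records; 1988 FG has lengths that are all NA.
bds <- data.frame(
  fishyr    = c(1987, 1987, 1987, 1988, 1988),
  fleet     = c("FG", "FG", "FG", "FG", "FG"),
  SAMPLE_NO = c("a", "a", "b", "c", "d"),
  lengthcm  = c(62, 58, NA, NA, NA)
)

# Flag year/fleet strata in which every length is missing
all_na <- aggregate(lengthcm ~ fishyr + fleet, bds,
                    FUN = function(x) all(is.na(x)), na.action = na.pass)
names(all_na)[3] <- "all_na"

# Number of unique samples (trips) per year and fleet
count_trips <- function(d) {
  aggregate(SAMPLE_NO ~ fishyr + fleet, d,
            FUN = function(x) length(unique(x)))
}

# Option 1: exclude every record with an NA length
trips_drop_na <- count_trips(bds[!is.na(bds$lengthcm), ])

# Option 2: keep NA lengths, but remove strata where all lengths are NA
keep <- merge(bds, all_na[!all_na$all_na, c("fishyr", "fleet")])
trips_keep_na <- count_trips(keep)
```

In this made-up case the two options give different trip counts for 1987 FG, because sample "b" has only an NA length.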

brianlangseth-NOAA commented 3 years ago

Should we need to update sample size tables for "Ntows" for fisheries independent data, John Harms, in an email response to me on 7/29, suggests there are three ways to allocate for HKL:

Hook level - basically zeroes and ones: year, site, drop, angler and hook
Drop level (15 hooks): year, site, and drop
Site level (75 hooks): year, site

Owen suggested I use sites for purposes of the sample size tables.

For Lam research, Laurel Lam, in email response to me on 7/29, states only one trip was done on any day, so aggregating by day is sufficient to obtain Ntrips.
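
As a rough illustration of the site-level and day-level counts described above, the sketch below uses made-up records. The data frames `hkl` and `lam` and their column names are hypothetical and do not reflect the actual survey data structures.

```r
# Hypothetical hook-and-line records: one row per fish
hkl <- data.frame(
  year = c(2015, 2015, 2015, 2016, 2016),
  site = c("A", "A", "B", "A", "A"),
  drop = c(1, 2, 1, 1, 1)
)

# Site level: number of unique sites sampled per year
hkl_nsite <- aggregate(site ~ year, hkl,
                       FUN = function(x) length(unique(x)))
names(hkl_nsite)[2] <- "Nsite"

# Drop level, for comparison: unique site/drop combinations per year
hkl_drops <- unique(hkl[, c("year", "site", "drop")])
hkl_ndrop <- aggregate(drop ~ year, hkl_drops, FUN = length)
names(hkl_ndrop)[2] <- "Ndrop"

# Hypothetical Lam research records: one row per fish, with sampling day
lam <- data.frame(
  day = c("2016-05-01", "2016-05-01", "2016-05-03", "2017-06-10"),
  stringsAsFactors = FALSE
)
lam$year <- as.integer(substr(lam$day, 1, 4))

# One trip per day, so unique days per year give Ntrips
lam_ntrips <- aggregate(day ~ year, lam,
                        FUN = function(x) length(unique(x)))
names(lam_ntrips)[2] <- "Ntrips"
```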

kellijohnson-NOAA commented 3 years ago

If we aren't using tow for anything for Hook and Line then I don't think that we need to add anything. I am assuming we use trip level information for the sample size now.

brianlangseth-NOAA commented 3 years ago

@kellijohnson-NOAA - Ntow is in reference to the imprecise language from the TOR above. I used Nsite for the table.

kellijohnson-NOAA commented 3 years ago

Thanks @brianlangseth-NOAA for the explanation and I am willing to say that we don't have to have it :)

pfmc-assessments / lingcod

questions/comments when calculating comp sample sizes #89