DATA 3 Survey Data

brianlangseth-NOAA commented 3 years ago

Issues listed in the data spreadsheet for fishery independent surveys

- [x] Indices via VAST (Kelli)
- [x] Lengths (Brian). Data placed on drive (Lingcod2021 > data > survey).
  - [x] Code for surveys accessible on data warehouse (WCGBTS, triennial, AFSC slope, NWFSC slope)
  - [x] Code from John Field (H&L)
- [x] Ages (Brian). Data placed on drive (Lingcod2021 > data > survey).
  - [x] Code for surveys accessible on data warehouse (WCGBTS, triennial, AFSC slope, NWFSC slope)
  - [x] Updated Laurel's H&L ages onto H&L scripts

Surveys on the table include:

WCGBTS
Triennial
NWFSC H&L (John Wallace has done standardization in the past, EJ and Tanya working on a new method for Vermilion, revisit this later)
Ages and lengths from Laurel's thesis (check for link to/overlap with CCFRP)

Maybe:

ODFW survey(s) (Ali Whitman will be sharing data via Google Drive)
CCFRP California Collaborative Fisheries Research Program/Project since 2007 (Melissa Monk has done a basic standardization for Gopher and talked about a more complex analysis for Lingcod & Vermilion)

Not on the table:

Slope surveys (too deep for Lingcod)
IPHC Survey (depths & hook sizes determined in 2017 to not be right)
WDFW H&L survey (Theresa Tsou says they will not be able to provide this year and we don't have time to work on it anyway)

kellijohnson-NOAA commented 3 years ago

[x] @brianlangseth-NOAA can you double check the pulled survey ages against this file provided by Patrick that summarizes what ages should be available? He just wanted to ensure that all ages are available to us. Thanks -Kel

brianlangseth-NOAA commented 3 years ago

@kellijohnson-NOAA I updated your file with two columns. The first column is the number of non-NA ages I get when pulling from the data warehouse, based on

bio.WCGBTS <- nwfscSurvey::PullBio.fn(Name = utils_name("common"), SurveyName = "NWFSC.Combo")
table(bio.WCGBTS$Year, is.na(bio.WCGBTS$Age))

The second column is the difference. It is highlighted in yellow.

Thus, we have the new data in; however, we are missing small numbers of samples in nearly every year. Perhaps those have non-standard length, weight, or age, in which case PullBio.fn would exclude them? Either that or something weird is going on.
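
For reference, the year-by-year comparison described above can be sketched in base R. This is a minimal illustration on toy data, not the actual spreadsheet calculation: the `bio` and `expected` data frames and all of their values are hypothetical stand-ins for the warehouse pull and for Patrick's summary file.

```r
# Toy stand-in for the warehouse pull (hypothetical values)
bio <- data.frame(
  Year = c(2003, 2003, 2003, 2004, 2004, 2005),
  Age  = c(2, NA, 5, 3, 4, NA)
)

# Hypothetical number of ages that should be available per year
expected <- data.frame(Year = c(2003, 2004, 2005), n_expected = c(3, 2, 1))

# Count non-NA ages per year; years with only NA ages are kept with a count of 0
n_pulled <- tapply(!is.na(bio$Age), bio$Year, sum)
pulled <- data.frame(
  Year = as.integer(names(n_pulled)),
  n_pulled = as.vector(n_pulled)
)

# Join to the expected counts and compute the shortfall by year
comparison <- merge(expected, pulled, by = "Year", all.x = TRUE)
comparison$n_pulled[is.na(comparison$n_pulled)] <- 0
comparison$difference <- comparison$n_expected - comparison$n_pulled
print(comparison)
```

A positive `difference` marks a year where fewer ages were pulled than expected.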

kellijohnson-NOAA commented 3 years ago

Yes, very weird. Here is a raw download from the api where the sample numbers match much more closely. Can you look at the differences?

FYI - this is the URL generated from nwfscSurvey::PullBio.fn, which indicates some filtering is being done

https://www.webapps.nwfsc.noaa.gov/data/api/v1/source/trawl.individual_fact/selection.json?filters=project=Groundfish%20Slope%20and%20Shelf%20Combination%20Survey,station_invalid=0,performance=Satisfactory,depth_ftm>=30,depth_ftm<=700,field_identified_taxonomy_dim$common_name|=[%22lingcod%22],year>=1980,year<=5000&variables=project,trawl_id,common_name,year,vessel,pass,tow,datetime_utc_iso,depth_m,weight_kg,ageing_laboratory_dim$laboratory,length_cm,width_cm,sex,age_years,otosag_id,latitude_dd,longitude_dd,standard_survey_age_indicator,standard_survey_length_or_width_indicator,standard_survey_weight_indicator,operation_dim$legacy_performance_code

kellijohnson-NOAA commented 3 years ago

I went in via browser() and looked at the full data pull without the filters. Here is a summary, which matches what you have within a tolerance of 2 ages for a given year.

All of the filters seem reasonable to me:

depth greater than or equal to 30 fathoms
depth less than or equal to 700 fathoms
performance == "Satisfactory"
station_invalid == 0

So, I think we are good!
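
A minimal sketch of what those four filters do, applied by hand to an unfiltered pull. The column names follow the fields in the URL above (`depth_ftm`, `performance`, `station_invalid`, `year`); the `raw` data frame and its values are hypothetical, and this is not the code used inside nwfscSurvey.

```r
# Toy stand-in for an unfiltered API download (hypothetical values)
raw <- data.frame(
  year = c(2010, 2010, 2010, 2011, 2011, 2011),
  depth_ftm = c(25, 100, 300, 50, 710, 200),
  performance = c("Satisfactory", "Satisfactory", "Unsatisfactory",
                  "Satisfactory", "Satisfactory", "Satisfactory"),
  station_invalid = c(0, 0, 0, 1, 0, 0)
)

# The four filters listed above, combined into one logical vector
keep <- raw$depth_ftm >= 30 &
  raw$depth_ftm <= 700 &
  raw$performance == "Satisfactory" &
  raw$station_invalid == 0

filtered <- raw[keep, ]

# Tally how many records each year loses to the filters
removed_by_year <- table(year = raw$year, removed = !keep)
print(removed_by_year)
```

The `removed_by_year` table gives the kind of per-year accounting that makes small mismatches with an external summary easier to explain.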

brianlangseth-NOAA commented 3 years ago

There were two main differences I could find. The first is that the API file includes samples with project name "Groundfish Triennial Shelf Survey".

The other is the field "target_station_design_dim.stn_invalid_for_trawl_date_whid": records with a non-zero value in this field are not included in the pull from the warehouse. I don't know what this field means, though. Moreover, for 213 of the 510 data points that are in the API file but not in the warehouse pull, this field is zero, so it's not a home run.

Of these 510 records, 253 have ages. That doesn't match the difference from what Patrick is providing, though.

I see your updated comment, so I'm fine moving forward with what is available from the data warehouse if you are.

kellijohnson-NOAA commented 3 years ago

The Triennial data were included just because of the year range that I selected, so that is fine.

I am guessing that the target_station_design... maps to station_invalid and @chantelwetzel-noaa is already filtering for that.

Patrick warned that a few years would be off by one or two and he didn't seem concerned about it, so yeah, we are good to go. I will probably make an issue in the nwfscSurvey package that we should potentially provide a summary of what is removed by year, or at least return the UrlText to the user so they can access the full data set if they want. Adding an attribute to the data of url would work so that a data frame can still be returned rather than a list.
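
The attribute idea can be sketched in a few lines of base R. The helper `attach_url()` and the example URL are hypothetical and only illustrate the mechanism; this is not existing nwfscSurvey behavior.

```r
# Hypothetical helper: store the query URL on the data frame as an attribute
attach_url <- function(data, url) {
  attr(data, "url") <- url
  data
}

# Toy pull and a placeholder URL (both hypothetical)
pull <- data.frame(Year = c(2003, 2004), Age = c(2, 3))
pull <- attach_url(pull, "https://example.org/api?filters=placeholder")

# The result is still a plain data frame, and the URL can be recovered later
is.data.frame(pull)
attr(pull, "url")
```

The appeal of an attribute over a list return is that existing code that expects a data frame keeps working unchanged.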

iantaylor-NOAA commented 3 years ago

I added more about the filtering to https://github.com/nwfsc-assess/nwfscSurvey/issues/45, but am fine with the filtering as it stands. We have many many samples for lingcod and not that many got filtered. For future assessments, if we're keeping these filters as they stand, we should have the age readers avoid reading any of those samples.

brianlangseth-NOAA commented 3 years ago

@iantaylor-NOAA and @kellijohnson-NOAA A few questions (as checkboxes) regarding the survey data.

Do we want to do CAAL (conditional age-at-length) for the hook and line ages? In the last assessment there were no ages for HKL, just lengths. @LaurelLam-NOAA provided those and I've done age comps.
- [x] Should I do CAAL for HKL as well?
The survey depth strata I'm using for the comps are slightly different from the last assessment. Both the combo and (as far as I can tell) the triennial used 55-183, 183-400, and 400-1280 in the past (see FRAM\Assessments\Archives\Lingcod\Lingcod_2017\Data\NWFSCSurvey\ExploreBioData\Lengths-StratumTowTallies.csv). Based on the survey design reports (https://www.webapps.nwfsc.noaa.gov/assets/25/8655_02272017_093722_TechMemo136.pdf), I'm using 55-183, 183-549, and 549-1280 for the combo, and 55-350 and 350-500 for the triennial.
- [x] Are you ok with this change? I would expect that, should we do a design-based index, the strata would need to match.
- [x] Ideally 366 would be used for the strata for the triennial, but 350 is included. I consider 350 to be appropriate, but would you like me to ask Curt to add 366 as an option? There are about 14 tows but no catch between 350 and 366 m.

iantaylor-NOAA commented 3 years ago

If possible, it's probably good to work up all age data as both CAAL and marginal ages so we can explore both options in the model (where CAAL data could have marginal data included with negative fleet numbers to exclude from the likelihood).
I suspect that the choice of 400 was based on the range of depths with observed Lingcod in WCGBTS. Running nwfscSurvey::PlotPresenceAbsence.fn(catch.WCGBTS) returns the figure below and the text "99.9% of positive hauls are shallower than 413." So under either stratification, the deepest stratum is essentially empty of Lingcod.

However, I don't really know what the best practices are for setting depth strata, but I suspect that there's little difference in setting the break at 400 vs 549.

I think 350 is just fine as an approximation for 366 in the triennial strata.

chantelwetzel-noaa commented 3 years ago

Chiming in from the peanut gallery. The strata area is used not only to create the design-based index, but also to expand all composition data from that stratum, which is why it can be important. Calculating a stratum area from 183-400 m vs. 183-549 m would result in quite distinct estimates, I suspect. If there is negligible lingcod biomass > 400 m, I would advocate cutting off the strata here rather than possibly over-expanding the observations in this stratum by selecting a stratification larger than the biomass range.
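
To make the over-expansion concern concrete, here is a deliberately simplified sketch in base R. It assumes that expanded numbers in a stratum are mean density times stratum area, which is only a rough caricature of the expansion in nwfscSurvey; the densities and areas are hypothetical numbers chosen for illustration.

```r
# Hypothetical mean densities (fish per km^2) by length bin in each stratum
density_shallow <- c(small = 4, large = 1)  # 55-183 m
density_deep    <- c(small = 1, large = 3)  # 183 m to the deep boundary

area_shallow <- 1000  # hypothetical stratum area in km^2

# Expand each stratum by its area, sum across strata, and normalize to a comp
expand_comp <- function(area_deep) {
  total <- density_shallow * area_shallow + density_deep * area_deep
  total / sum(total)
}

# Same observations, two hypothetical areas for the deep stratum
comp_400 <- expand_comp(area_deep = 500)   # boundary at 400 m
comp_549 <- expand_comp(area_deep = 1000)  # boundary at 549 m

print(round(rbind(comp_400, comp_549), 3))
```

With identical observations, the larger deep-stratum area gives the deep-stratum fish more weight in the combined composition, which is the over-expansion effect when the extra area holds little or no biomass.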

brianlangseth-NOAA commented 3 years ago

@chantelwetzel-noaa If we cut at X = 400 (or 549), would you recommend excluding the deep strata (X to 1280)? I'm currently including out to 1280 as it was also done in the past.

chantelwetzel-noaa commented 3 years ago

Based upon the figure @iantaylor-NOAA posted, there do not appear to be positive tows of lingcod at depths greater than 425 (based on the last available depth bin in the plot). You only need to have strata specified where your species of interest occurs, not the whole survey area. Any stratum with 0 observations (or fewer than 3, I think) will throw an error when you try to calculate the design-based index and the comps.
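
One quick way to check for sparse strata before running the index or comps is to tally positive tows per depth stratum. This is a base R sketch on toy data; the tow values are hypothetical, and the threshold of 3 is taken from the tentative remark above rather than from the package documentation.

```r
# Toy tow data (hypothetical values)
tows <- data.frame(
  Depth_m = c(60, 100, 150, 200, 300, 390, 410, 750),
  total_catch_wt_kg = c(5, 3, 2, 1, 1, 0, 0.5, 0.1)
)

# Candidate depth strata: 55-183, 183-400, 400-1280
breaks <- c(55, 183, 400, 1280)
tows$stratum <- cut(tows$Depth_m, breaks = breaks, right = FALSE, dig.lab = 4)

# Number of positive tows in each stratum
n_positive <- tapply(tows$total_catch_wt_kg > 0, tows$stratum, sum)
print(n_positive)

# Strata that fall below the assumed minimum of 3 positive tows
sparse <- names(n_positive)[n_positive < 3]
print(sparse)
```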

iantaylor-NOAA commented 3 years ago

Thank you for chiming in, @chantelwetzel-noaa with your wisdom about how this should be done.

The cutoff in the figure is based on rounding the 99.9% quantile of observations by depth. In the case of Lingcod, there are a tiny number of observations beyond 400 meters, where the extra deep ones could theoretically be cases of Lingcod that got stuck in the net from the previous tow or some other random event:

catch.WCGBTS$Depth_m[catch.WCGBTS$Depth_m > 400 & catch.WCGBTS$total_catch_wt_kg > 0]
[1] 410.0 416.8 418.0 408.0 750.3 674.9

Maybe that's enough to avoid getting an error if there were a 400-1280 stratum. However, based on the discussion here, it sounds like it would be better to go with just 2 strata: 55-183, 183-400.
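
For anyone wanting to reproduce the depth cutoff logic, here is a base R sketch on toy data. The column names `Depth_m` and `total_catch_wt_kg` follow the snippet above; the values are hypothetical, and the exact calculation inside PlotPresenceAbsence.fn may differ from this.

```r
# Toy catch data (hypothetical values)
catch <- data.frame(
  Depth_m = c(60, 120, 180, 250, 410, 750, 900),
  total_catch_wt_kg = c(5, 3, 2, 1, 0.5, 0.1, 0)
)

# Depths of hauls with positive catch
positive <- catch$Depth_m[catch$total_catch_wt_kg > 0]

# Positive hauls deeper than the proposed 400 m boundary
deep_positive <- positive[positive > 400]
print(deep_positive)

# 99.9% quantile of depth across positive hauls
cutoff <- quantile(positive, probs = 0.999)
print(cutoff)
```

With so few toy observations the quantile sits close to the deepest positive haul; with thousands of real hauls, a handful of deep outliers has far less influence on it.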

brianlangseth-NOAA commented 3 years ago

I updated the depth strata (400 for WCGBTS and 350 for triennial, based on these being more or less the lower depth bounds for catching lingcod). I tested the differences with what I had previously for the WCGBTS in the north only (bin values can be seen above) and the differences are quite small. I standardized the values for each gender to get actual comps and took the difference between the two depth strata scenarios. Differences between any individual comp were no more than 0.01, and when divided by the actual comp, the largest effects are about a doubling. These, however, were on the edges of the distributions, and so were a doubling of very small values (e.g. 0.001 to 0.002).
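
The comparison described above can be sketched as follows. The expanded values are hypothetical toy numbers picked to mimic the pattern reported (tiny absolute differences, with a doubling in a sparse tail bin); they are not the actual comps.

```r
# Hypothetical expanded values by length bin under the two stratifications
expanded_old <- c(1, 499, 400, 100)
expanded_new <- c(2, 498, 400, 100)

# Standardize each scenario so the comps sum to 1
comp_old <- expanded_old / sum(expanded_old)
comp_new <- expanded_new / sum(expanded_new)

# Absolute difference, and the ratio of new to old for each bin
abs_diff <- comp_new - comp_old
ratio <- comp_new / comp_old

print(max(abs(abs_diff)))
print(ratio)
```

Looking at both measures together guards against over-reading a large ratio that comes from a bin holding almost none of the distribution.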

kellijohnson-NOAA commented 3 years ago

@chantelwetzel-noaa are there recommendations for how to partition the coast into strata if you believe it is a single stock? For example, in our northern model should we keep the delineation for the WA-OR border that was previously used or would it be fine to just stratify the northern model using depth?

chantelwetzel-noaa commented 3 years ago

I was taught that stratification should be selected a priori based on expected or known changes in abundance by latitude or depth. If the expected density of a species in Oregon between certain depths is higher than what you would expect in Washington at those same depths, you would want to apply a stratification (assuming there are enough observations in each stratum). Way back in the day, I remember using a regression tree as a statistical approach to determine the best places to split the data for stratification. However, keep in mind this is slightly "naughty", since we should technically not select the stratification by looking at the data, but in reality I think we all look at the data to make these decisions.

The stratification of the survey data only matters for the length and age composition expansions (also the design-based index if used). Selecting a large strata area (by either latitude or depth) could wash out area-based differences in length or age observations and, conversely, selecting an area that is too small can over-expand limited observations. Selecting strata is about finding the right, 'goldilocks', size.

Keep in mind that strata matter more or less depending upon the species of interest. If you are expanding data for a well-sampled ubiquitous species (e.g., Dover sole), the stratification would likely matter less than when selecting strata for a species with fewer or very area-specific observations.

kellijohnson-NOAA commented 3 years ago

Thanks @chantelwetzel-noaa. I have no a priori reason to keep the WA-OR strata and I was only doing it to preserve backwards compatibility. But we have basically broken backwards compatibility in every other way, so we might as well break it here too. I do agree with keeping the strata south of Point Conception. Good work @brianlangseth-NOAA!

brianlangseth-NOAA commented 3 years ago

@chantelwetzel-noaa and I discussed that latitudinal strata can affect output. I'm impartial to one or the other, but it will be important to keep assumptions for indices and comps consistent. Consequently, I'm fine with keeping everything north of 40°10' as one latitudinal stratum.

brianlangseth-NOAA commented 3 years ago

I just updated the comps for the Combo and Triennial surveys to now be only for sexed (sex = 3) fish, combining unsexed into sex comps based on a sex ratio of 50:50 below age 1 (south) and age 2 (north) for age comps and below 40 cm for length comps, and data-informed sex ratios for ages and lengths greater than those values. I also updated the triennial depth strata to 55-183 and 183-350. A stratum break at 183 was used in the last assessment and is loosely based on looking at CPUE by depth for the triennial survey, which has a bit of a break at 183 m and appears to be more variable afterward.
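
The allocation of unsexed fish for length comps can be sketched in base R as below. This is an illustration of the rule as described (50:50 below 40 cm, observed ratio at or above), not the nwfscSurvey implementation; the `lens` data frame is hypothetical, and the observed ratio here is computed within each bin, which may not match how the package derives it.

```r
# Toy length tallies by sex (hypothetical values)
lens <- data.frame(
  length_cm = c(20, 30, 50, 60),
  female    = c(2, 4, 30, 45),
  male      = c(2, 6, 10, 15),
  unsexed   = c(10, 4, 8, 4)
)

threshold_cm <- 40

# Observed proportion female among sexed fish in each bin
# (would be NaN for a bin with no sexed fish; not handled in this sketch)
obs_ratio <- lens$female / (lens$female + lens$male)

# 50:50 below the threshold, data-informed ratio at or above it
prop_female <- ifelse(lens$length_cm < threshold_cm, 0.5, obs_ratio)

# Allocate unsexed fish to each sex
lens$female_adj <- lens$female + lens$unsexed * prop_female
lens$male_adj   <- lens$male + lens$unsexed * (1 - prop_female)
print(lens)
```

The total number of fish is unchanged by the allocation; only the split between sexes moves.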

iantaylor-NOAA commented 3 years ago

Thank you @brianlangseth-NOAA!

Here's a related question. Stock Synthesis allows us to specify a setting for length and age composition data: combM+F: males and females treated as combined gender below this bin number.

This is intended to smooth out noise in the sex ratios associated with small fish for which sex determination is more difficult. This is a different purpose from splitting up unsexed fish, but the choice, if used (I think it's not often used), can also be informed by the plots of observed sex ratio by age or length. While those plots (if you indeed used them) are fresh in your mind, would you suggest exploring the use of this option for the smallest individuals? That is, was there a lot of noise in the sex ratios for the small fish?

This is a low priority issue that can be explored very late in the modeling process.

kellijohnson-NOAA commented 3 years ago

@iantaylor-NOAA "WDFW H&L survey (Ian to contact Theresa Tsou)" from above. What is the status of this?

iantaylor-NOAA commented 3 years ago

I dropped the ball on that one. The 2017 assessment says "A WDFW hook and line survey includes 5-7 years of sampling but methods changed over time as this was a pilot study so these data are not used." I will email now.

brianlangseth-NOAA commented 3 years ago

@iantaylor-NOAA These figures are currently in the data > lenComps > WCGBTS > plots and the data > lenComps > Triennial > plots folders. It is very likely these are going to be moved into the figures folder in the near future.

As you can see, the sex ratios are highly variable, particularly for the WCGBTS, so I would recommend the exploration, as it seems very appropriate for our purpose. I do not know how much this will matter, however, not having much experience with doing it myself. I will add it as a sensitivity in #43.

brianlangseth-NOAA commented 3 years ago

In the write-up I noticed that the Hook and Line survey samples in the Cowcod Conservation Areas. I did not check this, and unless the nwfscDiag package somehow excludes these samples automatically, I did not specifically separate them. At this point, I don't think we can avoid a change in comps, but if the fits to the comps are odd, this may be an explanation.

iantaylor-NOAA commented 3 years ago

I'm fine with samples from the CCA. Those are part of the stock that's getting assessed. However, we could consider a block on selectivity associated with the year of entry to the CCA as a sensitivity. #43.

melissamonk-NOAA commented 3 years ago

@iantaylor-NOAA agreed. It's the same as CCFRP sampling in all of the MPAs.


pfmc-assessments / lingcod

DATA 3 Survey Data #21