Closed by brianlangseth-NOAA.
@kellijohnson-NOAA I updated your file with two columns. The first column is the number of non-NA ages I get when pulling from the data warehouse, based on:

```r
bio.WCGBTS <- nwfscSurvey::PullBio.fn(Name = utils_name("common"), SurveyName = "NWFSC.Combo")
table(bio.WCGBTS$Year, is.na(bio.WCGBTS$Age))
```
The second column is the difference; it is highlighted in yellow.
Thus, we have the new data in; however, we are missing a small number of samples in nearly every year. Perhaps those have a nonstandard length, weight, or age, in which case PullBio.fn would exclude them? Either that, or something weird is going on.
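The per-year comparison in the spreadsheet can be sketched in base R as below. This is a minimal sketch, not the code used for the spreadsheet: the helper names `count_by_year` and `compare_age_counts` are hypothetical, the example values are made up, and only the `Year` and `Age` column names follow the PullBio.fn output used above.

```r
# Number of non-NA ages per year, returned as a named integer vector.
# count_by_year() is a hypothetical helper for illustration only.
count_by_year <- function(dat) {
  tab <- table(dat$Year[!is.na(dat$Age)])
  setNames(as.integer(tab), names(tab))
}

# Align two pulls by year and return counts plus their difference (b - a).
compare_age_counts <- function(a, b) {
  na <- count_by_year(a)
  nb <- count_by_year(b)
  years <- sort(union(names(na), names(nb)))
  n_a <- unname(na[years])
  n_b <- unname(nb[years])
  n_a[is.na(n_a)] <- 0L   # a year absent from one pull counts as 0 ages
  n_b[is.na(n_b)] <- 0L
  data.frame(Year = as.integer(years), n_a = n_a, n_b = n_b,
             difference = n_b - n_a)
}

# Invented example data
warehouse <- data.frame(Year = c(2003, 2003, 2003, 2004, 2004),
                        Age  = c(3, NA, 5, 2, 4))
api <- data.frame(Year = c(2003, 2003, 2003, 2003, 2004, 2004, 2004),
                  Age  = c(3, 6, NA, 5, 2, 4, 7))
res <- compare_age_counts(warehouse, api)
res
```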
Yes, very weird. Here is a raw download from the API where the sample numbers match much more closely. Can you look at the differences?
FYI: this is the URL generated by nwfscSurvey::PullBio.fn, which shows that some filtering is being done:
https://www.webapps.nwfsc.noaa.gov/data/api/v1/source/trawl.individual_fact/selection.json?filters=project=Groundfish%20Slope%20and%20Shelf%20Combination%20Survey,station_invalid=0,performance=Satisfactory,depth_ftm>=30,depth_ftm<=700,field_identified_taxonomy_dim$common_name|=[%22lingcod%22],year>=1980,year<=5000&variables=project,trawl_id,common_name,year,vessel,pass,tow,datetime_utc_iso,depth_m,weight_kg,ageing_laboratory_dim$laboratory,length_cm,width_cm,sex,age_years,otosag_id,latitude_dd,longitude_dd,standard_survey_age_indicator,standard_survey_length_or_width_indicator,standard_survey_weight_indicator,operation_dim$legacy_performance_code
I went in via browser() and looked at the full data pull without the filters. Here is a summary, which matches what you have to within 2 ages in any given year. All of the filters seem reasonable to me, so I think we are good!
There were two main differences I could find. The first is that the API file includes samples with the project name "Groundfish Triennial Shelf Survey".
The other is the field "target_station_design_dim.stn_invalid_for_trawl_date_whid": records with a non-zero value in this field are not included in the pull from the warehouse. I don't know what this field means, though. Moreover, for 213 of the 510 records that are in the API file but not in the warehouse pull, this field is zero, so it's not a home run.
Of these 510 records, 253 have ages. That doesn't match the difference from what Patrick is providing, though.
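To reproduce this kind of check, one could drop every API record whose key also appears in the warehouse pull and then tabulate what is left. A minimal sketch, assuming a per-fish key column: `record_id` and `records_missing` are hypothetical, while `project` and `age_years` mirror variables requested in the URL above. The data are invented.

```r
# Keep API records whose key is not found in the warehouse pull.
# records_missing() and the record_id key are hypothetical.
records_missing <- function(api, warehouse, key = "record_id") {
  api[!(api[[key]] %in% warehouse[[key]]), , drop = FALSE]
}

api <- data.frame(
  record_id = 1:5,
  project   = c("Groundfish Slope and Shelf Combination Survey",
                "Groundfish Slope and Shelf Combination Survey",
                "Groundfish Triennial Shelf Survey",
                "Groundfish Slope and Shelf Combination Survey",
                "Groundfish Triennial Shelf Survey"),
  age_years = c(4, NA, 6, 3, NA)
)
warehouse <- data.frame(record_id = c(1L, 4L))

extra <- records_missing(api, warehouse)
table(extra$project)            # which projects the extra records come from
sum(!is.na(extra$age_years))    # how many of the extra records have ages
```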
I see your updated comment, so I'm fine moving forward with what is available from the data warehouse if you are.
The Triennial data were included just because of the year range that I selected, so that is fine.
I am guessing that the target_station_design... maps to station_invalid, and @chantelwetzel-noaa is already filtering for that.
Patrick warned that a few years would be off by one or two, and he didn't seem concerned about it, so yeah, we are good to go. I will probably make an issue in the nwfscSurvey package suggesting that we potentially provide a summary of what is removed by year, or at least return the UrlText to the user so they can access the full data set if they want. Adding the URL as an attribute of the data would work, so that a data frame can still be returned rather than a list.
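A minimal sketch of the attribute idea in base R, assuming a hypothetical wrapper `pull_with_url` (not part of nwfscSurvey): the result is still a plain data frame, and the URL rides along as an attribute.

```r
# Attach the request URL to a data frame without changing its class.
# pull_with_url() is a hypothetical helper, not an nwfscSurvey function.
pull_with_url <- function(dat, url) {
  attr(dat, "url") <- url
  dat
}

url <- "https://www.webapps.nwfsc.noaa.gov/data/api/v1/source/trawl.individual_fact/selection.json"
bio <- pull_with_url(data.frame(Year = c(2019, 2019), Age = c(3, NA)), url)

is.data.frame(bio)   # still a data frame, not a list
attr(bio, "url")     # the URL that produced the pull
```

One design caveat: some data-frame operations may not carry custom attributes through, so the URL is safest to read right after the pull.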
I added more about the filtering to https://github.com/nwfsc-assess/nwfscSurvey/issues/45, but am fine with the filtering as it stands. We have many many samples for lingcod and not that many got filtered. For future assessments, if we're keeping these filters as they stand, we should have the age readers avoid reading any of those samples.
@iantaylor-NOAA and @kellijohnson-NOAA A few questions (as checkboxes) regarding the survey data.
If possible, it's probably good to work up all age data as both CAAL and marginal ages so we can explore both options in the model (where CAAL data could have marginal data included with negative fleet numbers to exclude from the likelihood).
I suspect that the choice of 400 was based on the range of depths with observed Lingcod in the WCGBTS.
Running nwfscSurvey::PlotPresenceAbsence.fn(catch.WCGBTS) returns the figure below and the text "99.9% of positive hauls are shallower than 413." So under either stratification, the deepest stratum is essentially empty of Lingcod.
However, I don't really know what the best practices are for setting depth strata, but I suspect that there's little difference between setting the break at 400 vs. 549.
I think 350 is just fine as an approximation for 366 in the triennial strata.
Chiming in from the peanut gallery. The stratum area is used not only to create the design-based index but also to expand all composition data from that stratum, which is why it can be important. I suspect that calculating a stratum area for 183-400 m vs. 183-549 m would result in quite distinct estimates. If there is negligible lingcod biomass deeper than 400 m, I would advocate cutting off the stratum there rather than possibly over-expanding the observations in this stratum by selecting a stratum that extends beyond the biomass range.
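To illustrate why the stratum area matters for the expansions, here is a deliberately simplified weighting in base R. It is an assumption-laden sketch, not the nwfscSurvey calculation: each stratum's composition is weighted by area times mean density, the helper `expand_comps` is hypothetical, and the areas, densities, and proportions are made up. Holding mean density fixed, doubling the deep stratum's area pulls the combined composition toward that stratum's length distribution.

```r
# Simplified design-based weighting: scale each stratum's composition by
# area * mean density, sum across strata, and renormalize.
# expand_comps() is hypothetical and omits steps in the real expansion.
expand_comps <- function(prop, area_km2, mean_density) {
  w <- area_km2 * mean_density          # relative abundance in each stratum
  expanded <- sweep(prop, 1, w, `*`)    # scale each stratum (row) by its weight
  colSums(expanded) / sum(expanded)     # combine strata, renormalize to 1
}

# Rows = strata, columns = two length bins; each row sums to 1 (made-up values)
prop <- rbind(shallow = c(0.6, 0.4), deep = c(0.2, 0.8))
dens <- c(5, 1)

small_area <- expand_comps(prop, area_km2 = c(1000, 300), mean_density = dens)
large_area <- expand_comps(prop, area_km2 = c(1000, 600), mean_density = dens)
small_area
large_area
```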
@chantelwetzel-noaa If we cut at X = 400 (or 549), would you recommend excluding the deep stratum (X to 1280)? I'm currently including out to 1280, as was also done in the past.
Based upon the figure @iantaylor-NOAA posted, there do not appear to be positive tows of lingcod at depths greater than 425 m (based on the last available depth bin in the plot). You only need to have strata specified where your species of interest occurs, not the whole survey area. Any stratum with 0 observations (or fewer than 3, I think) will throw an error when you try to calculate the design-based index and the comps.
Thank you for chiming in, @chantelwetzel-noaa with your wisdom about how this should be done.
The cutoff in the figure is based on rounding the 99.9% quantile of observations by depth. In the case of Lingcod, there are a tiny number of observations beyond 400 meters, where the extra deep ones could theoretically be cases of Lingcod that got stuck in the net from the previous tow or some other random event:
```r
catch.WCGBTS$Depth_m[catch.WCGBTS$Depth_m > 400 & catch.WCGBTS$total_catch_wt_kg > 0]
## [1] 410.0 416.8 418.0 408.0 750.3 674.9
```
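The 99.9% cutoff mentioned above can be approximated with `quantile()` on the depths of positive hauls. A minimal sketch: `depth_cutoff` is hypothetical, and PlotPresenceAbsence.fn may use a different quantile type or rounding, so the value need not match the 413 reported above exactly.

```r
# Depth below which a given fraction (default 99.9%) of positive hauls occur.
# depth_cutoff() is a hypothetical helper; it uses R's default type-7 quantile.
depth_cutoff <- function(depth_m, catch_wt, prob = 0.999) {
  pos <- depth_m[catch_wt > 0]          # depths of hauls that caught the species
  unname(quantile(pos, probs = prob))
}

depth <- c(100, 200, 900)
wt    <- c(5, 2, 0)                     # the 900 m haul caught nothing
depth_cutoff(depth, wt)
```

With the real data, the call would be `depth_cutoff(catch.WCGBTS$Depth_m, catch.WCGBTS$total_catch_wt_kg)`.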
Maybe that's enough to avoid getting an error if there were a 400-1280 stratum. However, based on the discussion here, it sounds like it would be better to go with just 2 strata: 55-183, 183-400.
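Before running the expansions, one could check that each proposed stratum has enough positive tows. A minimal sketch using `cut()`: the helper `check_strata` is hypothetical, and the minimum of 3 positive tows comes from the "fewer than 3 I think" remark above rather than from the nwfscSurvey code. Tows outside the break range (here, deeper than 400 m) fall outside every stratum and are not counted.

```r
# Count positive tows per depth stratum and flag strata with too few.
# check_strata() is hypothetical; min_pos = 3 is an assumption.
check_strata <- function(depth_m, catch_wt, breaks = c(55, 183, 400), min_pos = 3) {
  # Strata [55,183] and (183,400]; tows outside the breaks become NA
  stratum <- cut(depth_m, breaks = breaks, include.lowest = TRUE)
  n_pos <- table(stratum[catch_wt > 0])   # NA (out-of-range) tows are dropped
  data.frame(stratum = names(n_pos),
             n_pos = as.integer(n_pos),
             too_few = as.integer(n_pos) < min_pos)
}

depth <- c(60, 100, 150, 200, 250, 390, 750)
wt    <- c(1, 1, 1, 1, 0, 0, 1)
res <- check_strata(depth, wt)
res
```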
I updated the depth strata (400 m for the WCGBTS and 350 m for the Triennial, based on these being more or less the lower depth bounds for catching lingcod). I tested the differences from what I had previously for the WCGBTS in the north only (bin values can be seen above), and the differences are quite small. I standardized the values for each sex to get actual comps and took the difference between the two depth-strata scenarios. Differences between any individual comp were no more than 0.01, and when divided by the actual comp, the largest effects are about a doubling. However, these were on the edges of the distributions, so they were a doubling of very small values (e.g., 0.001 to 0.002).
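The comparison described here can be sketched as follows: standardize each scenario's comp within a sex, then take absolute differences and ratios. A minimal sketch with a hypothetical helper `compare_scenarios` and invented numbers chosen to mimic the 0.001-to-0.002 doubling mentioned above.

```r
# Compare one sex's comp under two stratification scenarios.
# compare_scenarios() is hypothetical; ratios are undefined where a == 0.
compare_scenarios <- function(comp_a, comp_b) {
  a <- comp_a / sum(comp_a)             # standardize scenario A to sum to 1
  b <- comp_b / sum(comp_b)             # standardize scenario B to sum to 1
  ratio <- ifelse(a > 0, b / a, NA)     # relative change in each bin
  list(max_abs_diff = max(abs(b - a)),
       max_ratio = max(ratio, na.rm = TRUE))
}

females_a <- c(1, 500, 499)   # invented counts, scenario A
females_b <- c(2, 499, 499)   # invented counts, scenario B
res <- compare_scenarios(females_a, females_b)
res
```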
@chantelwetzel-noaa are there recommendations for how to partition the coast into strata if you believe it is a single stock? For example, in our northern model should we keep the delineation for the WA-OR border that was previously used or would it be fine to just stratify the northern model using depth?
I was taught that stratification should be selected a priori based on expected or known changes in abundance by latitude or depth. If the expected density of a species in Oregon between certain depths is higher than what you would expect in Washington at those same depths, you would want to apply a stratification (assuming there are enough observations in each stratum). Way back in the day, I remember using a regression tree as a statistical approach to determine the best places to split the data for stratification. However, keep in mind this is slightly "naughty", since we technically should not select the stratification by looking at the data, but in reality I think we all look at the data to make these decisions.
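As a rough illustration of the regression-tree idea, a single-split "stump" tries each candidate depth break and keeps the one that minimizes the within-group sum of squared deviations in CPUE. This is a simplified stand-in for a full regression tree (for example, the rpart package); `best_split` and the data are hypothetical, and the function assumes at least two distinct depths.

```r
# Single-split regression stump on depth. Returns the largest depth in the
# shallow group; the break lies between it and the next observed depth.
best_split <- function(depth, cpue) {
  cand <- sort(unique(depth))
  cand <- cand[-length(cand)]           # leave at least one tow on each side
  sse <- sapply(cand, function(s) {
    left  <- cpue[depth <= s]
    right <- cpue[depth > s]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  cand[which.min(sse)]
}

depth <- c(60, 90, 120, 170, 190, 250, 300, 350)
cpue  <- c(10, 12, 11, 9, 3, 2, 4, 1)
best_split(depth, cpue)   # break between 170 m and 190 m
```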
The stratification of the survey data only matters for the length and age composition expansions (and for the design-based index, if used). Selecting a large stratum area (by either latitude or depth) could wash out area-based differences in length or age observations, and conversely, selecting an area that is too small can over-expand limited observations. Selecting strata is about finding the right, 'goldilocks', size.
Keep in mind that strata matter more or less depending on the species of interest. If you are expanding data for a well-sampled, ubiquitous species (e.g., Dover sole), the stratification would likely matter less than when selecting strata for a species with fewer or very area-specific observations.
Thanks @chantelwetzel-noaa. I have no a priori reason to keep the WA-OR strata and I was only doing it to preserve backwards compatibility. But we have basically broken backwards compatibility in every other way, so we might as well break it here too. I do agree with keeping the strata south of Point Conception. Good work @brianlangseth-NOAA!
@chantelwetzel-noaa and I discussed that latitudinal strata can affect the output. I'm impartial to one or the other, but it will be important to keep assumptions for indices and comps consistent. Consequently, I'm fine with keeping everything north of 40°10' N as one latitudinal stratum.
I just updated the comps for the Combo and Triennial surveys to now be only for sexed (sex = 3) fish, combining unsexed fish into the sex comps using a sex ratio of 50:50 below age 1 (south) and age 2 (north) for age comps and below 40 cm for length comps, and data-informed sex ratios for ages and lengths greater than those values. I also updated the Triennial depth strata to 55-183 m and 183-350 m. A break at 183 m was used in the last assessment and is loosely based on looking at CPUE by depth for the Triennial survey, which has a bit of a break at 183 m and appears to be more variable beyond it.
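The splitting of unsexed fish described here can be sketched as below. This is a minimal sketch under stated assumptions, not the code used for the comps: `split_unsexed` is hypothetical; below the threshold, unsexed fish are split 50:50, and at or above it they are split using the observed female fraction among sexed fish in the same bin (falling back to 50:50 where no fish were sexed). The counts are invented; the 40 cm threshold matches the length-comp value above.

```r
# Allocate unsexed fish to females and males by bin.
# split_unsexed() is hypothetical; the per-bin sex ratio is an assumption.
split_unsexed <- function(bins, females, males, unsexed, threshold) {
  sexed <- females + males
  p_f <- ifelse(sexed > 0, females / sexed, 0.5)  # observed fraction female
  p_f[bins < threshold] <- 0.5                    # 50:50 below the threshold
  data.frame(bin = bins,
             female = females + unsexed * p_f,
             male   = males + unsexed * (1 - p_f))
}

bins <- c(20, 30, 40, 50)   # length bins (cm)
res <- split_unsexed(bins,
                     females = c(1, 2, 6, 3),
                     males   = c(1, 4, 2, 1),
                     unsexed = c(4, 2, 4, 0),
                     threshold = 40)
res
```

Each bin's total (female + male) equals its sexed plus unsexed count, so no fish are lost in the reallocation.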
Thank you @brianlangseth-NOAA!
Here's a related question. Stock Synthesis allows us to specify a setting for length and age composition data: "combM+F: males and females treated as combined gender below this bin number".
This is intended to smooth out noise in the sex ratios associated with small fish, for which sex determination is more difficult. This is a different purpose than splitting up unsexed fish, but the choice of bin, if the option is used (I think it's not often used), can also be informed by the plots of observed sex ratio by age or length. While those plots (if you indeed used them) are fresh in your mind, would you suggest exploring the use of this option for the smallest individuals? That is, was there a lot of noise in the sex ratios for the small fish?
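One way to gauge that noise is to tabulate the observed fraction female by bin with a binomial standard error; bins with few sexed fish show large standard errors. A minimal sketch with a hypothetical helper `sex_ratio_table` and invented counts. It does not choose the combM+F bin; it only summarizes the noise that could inform that choice.

```r
# Observed fraction female by bin, with a binomial standard error.
# sex_ratio_table() is hypothetical; counts below are invented.
sex_ratio_table <- function(bins, females, males) {
  n <- females + males                    # number of sexed fish per bin
  p <- females / n                        # observed fraction female
  data.frame(bin = bins, n = n, p_female = p,
             se = sqrt(p * (1 - p) / n))  # binomial standard error
}

tab <- sex_ratio_table(bins = c(10, 20, 30),
                       females = c(1, 8, 40),
                       males   = c(3, 12, 60))
tab
```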
This is a low priority issue that can be explored very late in the modeling process.
@iantaylor-NOAA "WDFW H&L survey (Ian to contact Theresa Tsou)" from above. What is the status of this?
I dropped the ball on that one. The 2017 assessment says "A WDFW hook and line survey includes 5-7 years of sampling but methods changed over time as this was a pilot study so these data are not used." I will email now.
@iantaylor-NOAA These figures are currently in the data > lenComps > WCGBTS > plots and the data > lenComps > Triennial > plots folders. It is very likely these are going to be moved into the figures folder in the near future.
As you can see, the sex ratios are highly variable, particularly for the WCGBTS - thus I would recommend the exploration as it seems very appropriate for our purpose. I do not know how much this will matter however, not having much experience with doing it myself. I will add as a sensitivity in #43
In the write-up I noticed that the Hook and Line survey samples in the Cowcod Conservation Areas. I did not check this, and unless the nwfscDiag package somehow excludes these samples automatically, I did not specifically separate them. At this point, I don't think we can avoid a change in comps, but if the fits to the comps are odd, these samples may be an explanation.
I'm fine with samples from the CCA. Those are part of the stock that's getting assessed. However, we could consider a block on selectivity associated with the year of entry to the CCA as a sensitivity. #43.
@iantaylor-NOAA agreed. It's the same as CCFRP sampling in all of the MPAs.
On Sat, Jun 19, 2021 at 1:38 PM Ian Taylor @.***> wrote:
-- Melissa Monk, Ph.D. (she/her) Fisheries Ecology Division National Marine Fisheries Service National Oceanographic and Atmospheric Administration 110 McAllister Way Santa Cruz, CA 95060
Issues listed in the data spreadsheet for fishery independent surveys
Surveys on the table include:
Maybe:
Not on the table: