pfmc-assessments / PacFIN.Utilities

R code to manipulate data from the PacFIN database for assessments
http://pfmc-assessments.github.io/PacFIN.Utilities

Clarify Oregon PacFIN BDS samples with SAMPLE_TYPE == "S" #112

Open iantaylor-NOAA opened 1 year ago

iantaylor-NOAA commented 1 year ago

@aliwhitman, here's another question for you. Sorry if this information is already spelled out somewhere and I missed it.

Could you clarify why there are lots of Oregon PacFIN BDS samples for Petrale with SAMPLE_TYPE == "S"?

@gertsevv and I noticed that there are years with no length data after processing through the PacFIN.Utilities::cleanPacFIN() function. I now see that this is due to the default filter, which retains only samples of type market (M) and excludes all samples of type research (R), special request (S), and commercial on-board (C), as documented here: https://github.com/pfmc-assessments/PacFIN.Utilities/blob/4683a3f12fc769f8b9ccb028c8dff594ddcf3cea/R/cleanPacFIN.R#L33-L40.
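To illustrate how a sample-type filter like this can leave whole years without length data, here is a minimal sketch on made-up data. The data frame `bds_toy`, the helper `filter_sample_type()`, and its `keep_sample_type` argument are hypothetical stand-ins written for this example; this is not the actual code inside cleanPacFIN().

```r
library(dplyr)

# Toy stand-in for a PacFIN BDS extraction (all values are made up).
bds_toy <- data.frame(
  SAMPLE_YEAR = c(1970, 1970, 1999, 1999, 2010, 2010),
  SAMPLE_TYPE = c("S", "S", "M", "S", "M", "R"),
  FISH_LENGTH = c(310, 295, 402, 388, 350, 365)
)

# Hypothetical helper that keeps only the requested sample types.
# The default of "M" mimics a market-only filter.
filter_sample_type <- function(data, keep_sample_type = "M") {
  dplyr::filter(data, SAMPLE_TYPE %in% keep_sample_type)
}

default_kept <- filter_sample_type(bds_toy)
with_special <- filter_sample_type(bds_toy, keep_sample_type = c("M", "S"))

# Years that still have length data under each choice.
years_default <- sort(unique(default_kept$SAMPLE_YEAR))
years_with_special <- sort(unique(with_special$SAMPLE_YEAR))
```

In this toy case 1970 only has type "S" records, so it drops out entirely under the market-only default and reappears once "S" is kept.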

I get the idea that special request samples might be non-random or not representative of the population. However, all of these samples are associated with SAMPLE_METHOD == "R" (random) and they represent 44% of the petrale samples from Oregon, including 100% of the 37,348 samples from 1966-1986, another 4,468 samples from 1998-2007 (~30% of the total for that period), and another 43 samples scattered from other time periods. Two decades of sampling doesn't sound like a "special request" to me and it would be great to include these samples in the model, especially the ones from the early period, unless there's truly a good reason to exclude them.
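For anyone wanting to reproduce this kind of summary, a minimal sketch of the calculation is below. It uses a made-up data frame (`bds_toy`) with the same column names as the BDS extraction, and it counts rows, which is an assumption about what "samples" means here; the numbers are illustrative only and are not the petrale values quoted above.

```r
library(dplyr)

# Toy data with BDS-style column names (all values are made up).
bds_toy <- data.frame(
  AGENCY_CODE   = c("O", "O", "O", "O", "W", "W", "C", "C"),
  SAMPLE_YEAR   = c(1970, 1980, 2000, 2010, 1975, 2005, 1985, 2012),
  SAMPLE_TYPE   = c("S", "S", "S", "M", "M", "M", "M", "M"),
  SAMPLE_METHOD = c("R", "R", "R", "R", "R", "R", "R", "R")
)

# Share of records per agency that are special request (S),
# and how many of those were also collected with the random (R) method.
share_special <- bds_toy %>%
  dplyr::group_by(AGENCY_CODE) %>%
  dplyr::summarise(
    n_total = dplyr::n(),
    n_special = sum(SAMPLE_TYPE == "S"),
    n_special_random = sum(SAMPLE_TYPE == "S" & SAMPLE_METHOD == "R"),
    prop_special = n_special / n_total,
    .groups = "drop"
  )
```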

Less than 4% of the Washington petrale samples and none of the California samples have SAMPLE_TYPE == "S".

chantelwetzel-noaa commented 1 year ago

I am interested to know the current status of these samples from @aliwhitman. These samples were identified in the 2019 update assessment. My memory is always hazy, but I believe the reason they were excluded is because the samples did not have an associated sample weight, preventing expansion of these data via our typical methods.

iantaylor-NOAA commented 1 year ago

Thanks for chiming in @chantelwetzel-noaa.

More information is below on the presence/absence of sample weights and fish weights. Yes, they are missing in many years for the samples with SAMPLE_TYPE == "S", but definitely not all. I may not be selecting the right variables, however. Calculations below are from the raw PacFIN extraction before cleaning (available to the NWFSC folks in \nwcfile\FRAM\Assessments\Assessment Data\2023 Assessment Cycle\petrale sole\PacFIN.PTRL.bds.08.May.2023.RData).

Even if all the sample weights were missing, I think there would be value in considering unexpanded length comps for those years.
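As a rough sketch of what "unexpanded" means here: the comps would be plain proportions of fish per length bin within each year, with no weighting by sample or landing weight. The data frame `lengths_toy`, the column `FISH_LENGTH_CM`, and the 5 cm bin width are assumptions made for this example only.

```r
library(dplyr)

# Toy length records (all values are made up; FISH_LENGTH_CM is a hypothetical column).
lengths_toy <- data.frame(
  SAMPLE_YEAR    = c(1970, 1970, 1970, 1970, 1971, 1971),
  FISH_LENGTH_CM = c(31, 33, 36, 38, 32, 37)
)

bin_width <- 5  # assumed bin width in cm

# Unexpanded comps: raw fish counts per bin, converted to within-year proportions.
unexpanded_comps <- lengths_toy %>%
  dplyr::mutate(length_bin = floor(FISH_LENGTH_CM / bin_width) * bin_width) %>%
  dplyr::count(SAMPLE_YEAR, length_bin, name = "n_fish") %>%
  dplyr::group_by(SAMPLE_YEAR) %>%
  dplyr::mutate(proportion = n_fish / sum(n_fish)) %>%
  dplyr::ungroup()
```

Because nothing here depends on a sample weight, years with missing EXPANDED_SAMPLE_WEIGHT could still contribute comps of this kind.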

r$> samples <- bds.pacfin %>%
  dplyr::filter(AGENCY_CODE == "O" & SAMPLE_TYPE == "S") %>%
  dplyr::select(SAMPLE_YEAR, EXPANDED_SAMPLE_WEIGHT, FISH_WEIGHT)

r$> table(is.na(samples$EXPANDED_SAMPLE_WEIGHT), samples$SAMPLE_YEAR)

        1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1977 1978 1979 1980
  FALSE    0    0    0    0    0    0    0    0    0    0 1882 1987 2438 2659
  TRUE  1744 2405 2635 2859 2977 1653 1522 1347 1120 1000  100    0    0    0

        1981 1982 1983 1984 1985 1986 1997 1998 1999 2000 2001 2002 2003 2004
  FALSE 4200 2208  413  201  600 1398   28  505  491  313  319  279  393  310
  TRUE     0    0    0    0    0    0    0    0    0  102    0    0    0    0

        2005 2006 2007 2015 2016 2021
  FALSE  808  723  225    6    3    6
  TRUE     0    0    0    0    0    0

r$> table(is.na(samples$FISH_WEIGHT), samples$SAMPLE_YEAR)

        1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1977 1978 1979 1980
  FALSE    0    0    0    0    0 1501 1422 1347 1120 1000    0  537    0    0
  TRUE  1744 2405 2635 2859 2977  152  100    0    0    0 1982 1450 2438 2659

        1981 1982 1983 1984 1985 1986 1997 1998 1999 2000 2001 2002 2003 2004
  FALSE    0    0    0    0    0  600    0    0    0    0    0    0    0    0
  TRUE  4200 2208  413  201  600  798   28  505  491  415  319  279  393  310

        2005 2006 2007 2015 2016 2021
  FALSE    0    0    0    6    3    6
  TRUE   808  723  225    0    0    0

aliwhitman commented 1 year ago

The vast majority of these samples are pre-1987, which have ALL been (after the fact) designated as SP samples (across the board, all species) because of a lack of documentation on how these samples were taken and processed. And yes, some are lacking a sample weight (good memory Chantel! I had to go back to old emails to confirm that).

My recommendation would be for you to consider using the SP samples, particularly those prior to 1987, as this was just a blanket approach taken a number of years ago by our data shop. Using the sample method (Random), you can separate the ones that were part of our standard protocol (even if it wasn't well documented) from the ones that were truly "special request". I think you can also consider including an unexpanded length comp version, as Ian suggested, but again, I would still probably recommend removing those without an R sampling method.
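One way this recommendation could be sketched in code is below, on made-up data. The 1987 cutoff follows the discussion above, but whether to apply a year cutoff at all is a judgment call. The value "X" is a placeholder for any non-random sample method, not a real PacFIN code, and the `can_expand` flag is a hypothetical helper column.

```r
library(dplyr)

# Toy data with BDS-style column names (all values are made up).
bds_toy <- data.frame(
  SAMPLE_YEAR   = c(1970, 1975, 1980, 2000, 2003, 2010),
  SAMPLE_TYPE   = c("S", "S", "S", "S", "S", "M"),
  SAMPLE_METHOD = c("R", "R", "X", "R", "X", "R"),  # "X" = placeholder for non-random
  EXPANDED_SAMPLE_WEIGHT = c(NA, NA, 120, 95, 80, 110)
)

# Keep market samples, plus pre-1987 special request samples taken with the
# random method. Flag which retained records have a weight usable for expansion.
kept <- bds_toy %>%
  dplyr::filter(
    SAMPLE_TYPE == "M" |
      (SAMPLE_TYPE == "S" & SAMPLE_METHOD == "R" & SAMPLE_YEAR < 1987)
  ) %>%
  dplyr::mutate(can_expand = !is.na(EXPANDED_SAMPLE_WEIGHT))
```

Records with `can_expand == FALSE` would only be usable in an unexpanded length comp.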

iantaylor-NOAA commented 1 year ago

Thanks @aliwhitman, this is very helpful. We will explore adding back the random samples from 1966-1986 and see what impact that has.

kellijohnson-NOAA commented 1 year ago

Thanks @chantelwetzel-noaa for your memory, @aliwhitman for the digging, and @iantaylor-NOAA for the summaries. I also want to note that some of these samples do not have entries in the FTID column for fish ticket ID. See the note in the code here https://github.com/pfmc-assessments/PacFIN.Utilities/blob/ad3c0c029360591e051b4eed61c7f5ec07038240/R/cleanPacFIN.R#L232-L236, though I do not see where not having an FTID entry matters in the code downstream.

brianlangseth-NOAA commented 1 year ago

We have included special project samples prior to 1987 for canary - see this issue. I didn't check whether sample weight is there or not for the expansion even though we put them all through the expansion processing scripts.

kellijohnson-NOAA commented 1 year ago

@brianlangseth-NOAA did you really mean to close this issue? I think that maybe @iantaylor-NOAA should be the one to close it given that he opened it.