pfmc-assessments / PacFIN.Utilities

R code to manipulate data from the PacFIN database for assessments
http://pfmc-assessments.github.io/PacFIN.Utilities

cleanPacFIN SAMPLE_TYPE = "C" #42

Closed chantelwetzel-noaa closed 3 years ago

chantelwetzel-noaa commented 3 years ago

I am looking at an older PacFIN data pull and I am seeing records from Washington state where the SAMPLE_TYPE is listed as "C" commercial on-board samples. The cleanPacFIN function would remove all of these records if CLEAN = TRUE. I am not entirely clear how commercial on-board samples differ from market (M) samples but I think we should double check whether we want to remove these samples or not.

mhaltuch commented 3 years ago

I think that these are special projects that may not be random samples. Best to double check with Theresa on this front.

chantelwetzel-noaa commented 3 years ago

Agreed. I have sent an email to Theresa to understand this type of sample better. Thanks.

kellijohnson-NOAA commented 3 years ago

There are recent samples with SAMPLE_TYPE = 'C'. For example, there are longline and other pot samples for sablefish from 2017, and they are listed with SAMPLE_METHOD_CODE = 'R' for random. There are samples with SAMPLE_TYPE = 'C' for arrowtooth flounder (1982), big skate (2007), black hagfish (2012), black rockfish (1990), brown rockfish (80s and 90s), canary rockfish (70s and 2000s), copper rockfish (1982, 1983, 1988, 1989, 1991), Dover sole (1970, 1982, 1983), English sole (lots of years), flathead sole (1969, 1970), lingcod (70s and 80s), longnose skate (2004), Pacific cod (lots of years), Pacific hagfish (2012), POP (1978), Pacific sanddab, Pacific tomcod, whiting, petrale sole (1969, 1979), quillback rockfish (1979, 1982, 1983, 1989, 1991), rex sole, rougheye rockfish (2012), sablefish (lots of years), sand sole, shortspine thornyhead (1982), spiny dogfish, starry flounder, walleye pollock, widow (2004), yelloweye, and yellowtail. I got tired of typing all of the years for each species.

chantelwetzel-noaa commented 3 years ago

@kellijohnson-NOAA Thanks for investigating the prevalence of this sample type. Given that it occurs across multiple years and species, we should confirm how these samples should be treated. Can you double check which states have this sample type? I have emailed Theresa Tsou at WDFW for additional information, but if the sample type also occurs in Oregon and California we should check with those states as well.

kellijohnson-NOAA commented 3 years ago

Good call. I checked and they are Washington specific. Washington has C, M, S, and NULL SAMPLE_TYPE:

C = commercial on-board
M = market sample
S = special request
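A check like the one described above can be sketched as a cross-tabulation of agency by sample type. This is only an illustration on a toy data frame: the `AGENCY_CODE` column name and all values are assumptions made for the example, not taken from an actual PacFIN pull.

```r
# Toy stand-in for a PacFIN BDS pull; every value here is invented.
bds <- data.frame(
  AGENCY_CODE = c("W", "W", "W", "W", "O", "O", "C"), # assumed column name
  SAMPLE_TYPE = c("C", "M", "S", NA, "M", "M", "M"),
  stringsAsFactors = FALSE
)

# Cross-tabulate agency by sample type, keeping missing values visible.
type_by_agency <- table(
  agency = bds$AGENCY_CODE,
  sample_type = bds$SAMPLE_TYPE,
  useNA = "ifany"
)
print(type_by_agency)

# Agencies with at least one "C" record; %in% treats NA as "not C".
agencies_with_c <- unique(bds$AGENCY_CODE[bds$SAMPLE_TYPE %in% "C"])
print(agencies_with_c)
```

Using `useNA = "ifany"` keeps the NULL sample types in view rather than silently dropping them from the table.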

chantelwetzel-noaa commented 3 years ago

Perfect. I will update the issue when I hear back from Theresa.

chantelwetzel-noaa commented 3 years ago

I heard back from Theresa. Her response was:

"Commercial-on-board samples were from the WCGOP or whatever observer program before WCGOP. They should be random samples."

I was definitely confused by this answer for a few reasons. Why would WCGOP samples be uploaded to PacFIN by Washington state? There are a fair number of records prior to 2002, so what programs are these records from? WCGOP typically measures discarded fish, so are these discard lengths? I sent a follow-up to Theresa asking these questions. She was uncertain about the answers and said we should follow up with the WCGOP team. I spoke with Kaleigh Somers from the observer team and she was surprised that PacFIN would have WCGOP lengths. We should continue to investigate these data. Until we have a clear understanding of these records, I would propose not using them in assessments. I will update this issue if and when I learn more.

kellijohnson-NOAA commented 3 years ago

Would it be helpful if I downloaded some of the more recent observations so that Kaleigh could check whether they are the same as the WCGOP data? That way we would know if they are being "double counted".

chantelwetzel-noaa commented 3 years ago

I am not sure. I also checked with Jason Jannot and Jon McVeigh and they were not entirely sure what these records were either. There is a protocol to send some sablefish samples taken at sea to both WA and OR. However, since these samples occur across species, that does not explain these data. Neither of them knew where the data prior to 2002 are coming from, since WCGOP was not running before then. Additionally, WCGOP typically only collects discarded fish lengths, so we would not want to use any WCGOP data in PacFIN: the data in PacFIN should consist of retained fish, which we use to estimate selectivity. I suspect the source of the sample type "C" records is far more complicated than being from WCGOP.

Digging into the single-species bds pull I have with these records, I am seeing INPFC_AREA codes from areas that we typically do not retain (CS, HC, PS, SJ, SS). I wonder whether, if we checked the records with this sample type, we would find that they occur in non-US INPFC areas (or the Puget Sound), which could explain why they are showing up for Washington only.
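One way to test that idea is to tabulate sample type "C" records against the area codes that are normally dropped. The sketch below uses a toy data frame with invented values; only the `SAMPLE_TYPE` and `INPFC_AREA` column names and the five excluded codes come from the discussion above.

```r
# Toy stand-in for a single-species BDS pull; every value here is invented.
bds <- data.frame(
  SAMPLE_TYPE = c("C", "C", "C", "M", "M", NA),
  INPFC_AREA  = c("PS", "SJ", "VN", "VN", "CL", "PS"),
  stringsAsFactors = FALSE
)

# Area codes mentioned above as typically not retained.
excluded_areas <- c("CS", "HC", "PS", "SJ", "SS")

is_c        <- bds$SAMPLE_TYPE %in% "C"            # NA counts as "not C"
in_excluded <- bds$INPFC_AREA %in% excluded_areas

# Two-way table: are the "C" records concentrated in the excluded areas?
area_check <- table(type_c = is_c, excluded_area = in_excluded)
print(area_check)

# "C" records that fall in areas we would otherwise keep.
c_in_retained <- bds[is_c & !in_excluded, , drop = FALSE]
print(c_in_retained)
```

If `c_in_retained` is empty for a real pull, the area filter alone would already remove these records; if not, the sample type itself still needs a decision.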

andi-stephens-NOAA commented 3 years ago

I was told those were for research, but that was a long time ago. Either Melissa or Allan told me to skip those.

kellijohnson-NOAA commented 3 years ago

I just checked PacFIN and there are both new and old samples with sample_type = "C". Only from Washington. They are all listed as random samples. They come from areas that we are keeping, so we need to sort this out. How would we know if the samples are included in the discard data as well?

chantelwetzel-noaa commented 3 years ago

I reached out to both Jason Jannot and Jon McVeigh with the observer team and they did not know what these samples could be. The WCGOP samples DO NOT get entered into PacFIN. Additionally, there are years with SAMPLE_TYPE == "C" that are well outside the years of the WCGOP program (2002-2020). Jon suggested contacting Maggie Somers at ODFW to see if she may have historical knowledge regarding these samples. However, if they only occur in WA, she may not.

andi-stephens-NOAA commented 3 years ago

You could try Theresa Tsou?

chantelwetzel-noaa commented 3 years ago

I did, and her response is given a few comments above in this thread. Her comment was the one that sent us down the weird road of these being WCGOP samples, which they do not actually seem to be.

andi-stephens-NOAA commented 3 years ago

Ah, sorry, didn't see that.

chantelwetzel-noaa commented 3 years ago

@shcaba had a conversation with Theresa from WDFW. She confirmed that these Commercial Onboard samples are unique to Washington and should not be used, because the same lengths could also be entered as Market samples at the dock. Retaining them could result in double counting of fish, so the current protocol of removing them from the PacFIN data appears correct.
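For anyone wanting to reproduce the exclusion outside the package, a minimal sketch follows. It is not the cleanPacFIN implementation: `drop_sample_types()` is a hypothetical helper written for this example, and the data values are invented.

```r
# Hypothetical helper (not part of PacFIN.Utilities): drop rows whose
# SAMPLE_TYPE is in `drop_types` and report how many were removed.
drop_sample_types <- function(data, drop_types = "C") {
  # %in% returns FALSE for NA, so records with a missing type are kept.
  remove <- data$SAMPLE_TYPE %in% drop_types
  message(
    sum(remove), " record(s) removed with SAMPLE_TYPE in: ",
    paste(drop_types, collapse = ", ")
  )
  data[!remove, , drop = FALSE]
}

# Toy data; every value here is invented.
bds <- data.frame(
  SAMPLE_TYPE = c("C", "M", "M", NA, "C"),
  FISH_LENGTH = c(410, 395, 450, 388, 402), # assumed column name
  stringsAsFactors = FALSE
)

cleaned <- drop_sample_types(bds)
print(cleaned)
```

Note that this sketch keeps records with a missing SAMPLE_TYPE; whether those should also be dropped is a separate decision that the thread does not settle.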

kellijohnson-NOAA commented 3 years ago

Fantastic! Thank you @chantelwetzel-noaa, @shcaba, and Vlada (who also talked to Theresa today) about it. I will keep them excluded as the default!