pfmc-assessments / PacFIN.Utilities

R code to manipulate data from the PacFIN database for assessments
http://pfmc-assessments.github.io/PacFIN.Utilities

INPFC area is no longer available in the bds data #45

Open kellijohnson-NOAA opened 3 years ago

kellijohnson-NOAA commented 3 years ago

Find a different way to determine whether samples are from Canada, because INPFC area codes are no longer available in the bds data.

chantelwetzel-noaa commented 3 years ago

To make matters even more confusing and annoying, some records in the new bds data do have an INPFC_AREA for each state, but a large number are NA. In my new bds data, 103632 records are NA and 188746 have an INPFC_AREA. Looking across years, all records prior to 1987 are NA, and records from 1987-2020 are a mix of designated INPFC_AREA values and NA. It is unclear to me when the INPFC_AREA is not being migrated into the new tables.
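A minimal sketch of how the NA pattern described above could be tabulated by year. The data frame is toy data, the column name `SAMPLE_YEAR` is an assumption (only `INPFC_AREA` appears in this thread), and the area values are illustrative only.

```r
# Toy stand-in for the bds data; SAMPLE_YEAR is an assumed column name
# and the INPFC_AREA values are illustrative only.
bds <- data.frame(
  SAMPLE_YEAR = c(1985, 1986, 1987, 1987, 2005, 2005, 2020),
  INPFC_AREA  = c(NA, NA, "CL", NA, "VN", "CL", "CL"),
  stringsAsFactors = FALSE
)

# Overall counts of present (FALSE) and missing (TRUE) INPFC_AREA values.
na_counts <- table(is.na(bds$INPFC_AREA))

# Per-year counts of present versus missing values.
by_year <- table(
  year  = bds$SAMPLE_YEAR,
  is_na = is.na(bds$INPFC_AREA)
)
print(by_year)

# Years in which every record lacks an INPFC_AREA.
all_na_years <- as.numeric(rownames(by_year)[by_year[, "FALSE"] == 0])
```

Run against the real table, `by_year` would show directly which years are entirely NA and which are mixed.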

kellijohnson-NOAA commented 3 years ago

Sorry @chantelwetzel-noaa, I was using hacky SQL code to try to get INPFC_AREA from other tables, so that was just an experiment on my part. Sorry that I didn't label it as such. I will remove it.

When looking at Puget Sound samples that we want to remove from the full data set for all species, does this typically include removing samples from the entire Strait of Juan de Fuca? WDFW labels everything caught from Cape Flattery eastward as PSMFC area 4A. We could use this to flag samples.

chantelwetzel-noaa commented 3 years ago

The approach I am currently applying, which is definitely not perfect but works reasonably well for Dover sole (this may not be true for other species), is to filter out records whose PSMFC area is not included in this vector:

keep_PSMFC = c("1A", "1B", "1C", "2A", "2B", "2C", "2E", "2F", "3A", "3B", "3C")

Comparing this vector to the areas with Dover sole records, I can see that the following areas are removed: 3D, 3S, 4A, 5A, and 5B. A complication with this approach is that the new bds tables contain records from both California and Washington that have ONLY the state-specific area code (e.g., 1038, 59A1, 60A1) and not a PSMFC code. Filtering on the PSMFC code removes these records, even though presumably all of the California records and a subset of the Washington records are within federal waters. I found this map from WDFW on page 27 that shows their state-specific codes. Based on that map, I am removing some Washington records that should be included; however, the number of such records for Dover sole is relatively small. I am unsure what type of approach would work well across all species given the lack of INPFC areas and the inconsistency in the PSMFC area codes.
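To illustrate the complication, here is a hedged sketch of the keep-list filter applied to toy data. The column names `STATE`, `PSMFC_AREA`, and `STATE_AREA` are assumptions made for illustration and may not match the actual bds columns; only the `keep_PSMFC` vector and the example area codes come from the comment above.

```r
keep_PSMFC <- c("1A", "1B", "1C", "2A", "2B", "2C", "2E", "2F", "3A", "3B", "3C")

# Toy records; STATE, PSMFC_AREA, and STATE_AREA are assumed column names.
bds <- data.frame(
  STATE      = c("CA", "OR", "WA", "WA", "WA", "CA"),
  PSMFC_AREA = c("1B", "2B", "3B", "4A", NA, NA),
  STATE_AREA = c(NA, NA, NA, NA, "59A1", "1038"),
  stringsAsFactors = FALSE
)

# %in% returns FALSE for NA, so records without a PSMFC code are dropped
# along with the areas that are deliberately excluded.
keep    <- bds$PSMFC_AREA %in% keep_PSMFC
kept    <- bds[keep, ]
dropped <- bds[!keep, ]

# Dropped records that carry only a state-specific code; these are the
# ones that may deserve a second look.
state_only <- dropped[is.na(dropped$PSMFC_AREA) & !is.na(dropped$STATE_AREA), ]
print(state_only)
```

Inspecting `state_only` separates records lost only because the PSMFC code is missing from records excluded on purpose, such as 4A.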

kellijohnson-NOAA commented 3 years ago

I am thinking the approach of removing known bads rather than including only known goods might be best?
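A rough sketch of what excluding known bad areas could look like, again on toy data. Using 3D, 3S, 4A, 5A, and 5B as the exclusion list is an assumption based on the areas reported as removed for Dover sole earlier in this thread, and `PSMFC_AREA` and `STATE_AREA` are assumed column names.

```r
# Areas reported as removed for Dover sole earlier in the thread; treating
# them as the "known bad" list is an assumption for illustration.
drop_PSMFC <- c("3D", "3S", "4A", "5A", "5B")

# Toy records; PSMFC_AREA and STATE_AREA are assumed column names.
bds <- data.frame(
  PSMFC_AREA = c("1B", "3B", "4A", "5B", NA),
  STATE_AREA = c(NA, NA, NA, NA, "1038"),
  stringsAsFactors = FALSE
)

# Keep everything that is not explicitly listed as bad. Because %in%
# returns FALSE for NA, records without a PSMFC code are retained.
is_bad  <- bds$PSMFC_AREA %in% drop_PSMFC
cleaned <- bds[!is_bad, ]
print(cleaned)
```

The trade-off is that any bad area missing from the exclusion list passes through silently, so the list would need review for each species.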

kellijohnson-NOAA commented 3 years ago

Matching on FTID did NOT work. The column FTID in the fish ticket database does not align with FTID in bds data. One time it matched me to salmon catch. I emailed Brad.

kellijohnson-NOAA commented 3 years ago

Note that MO is Monterey according to INPFC_ARID in the bds_sample table. So ignoring MO from now on as a bad area.

John-R-Wallace-NOAA commented 3 years ago

Referring back to Issue #32, I can work on the large re-code table I mentioned. Any thoughts/comments on that approach?