pfmc-assessments / nwfscSurvey

Tool to pull and process NWFSC West Coast groundfish survey data for use in PFMC groundfish stock assessments
http://pfmc-assessments.github.io/nwfscSurvey/

Species with multiple names in the data warehouse #37

Closed chantelwetzel-noaa closed 1 month ago

chantelwetzel-noaa commented 3 years ago

https://github.com/nwfsc-assess/nwfscSurvey/blob/39d7f4d6f1f926aba9fe3bd96c52f6d82bc5cea8/R/PullSpp.fn.R#L28

The process for finding species names in the package does not appear to work for species that have had multiple names in the data warehouse. I was looking for vermilion rockfish, which was the name used from 2003-2009 for the WCGBT and was then changed to vermilion and sunset rockfish starting in 2010, and only the vermilion and sunset rockfish name was returned. I am unsure where this issue arises. The best fix would be to change the data warehouse to standardize names for all years; in the meantime, however, we need to find a way to improve the species name query so that all names are returned for species with multiple naming structures. This issue may also apply to other species we assess (e.g., blue/deacon, gopher/black-and-yellow, rougheye/blackspotted).

ericward-noaa commented 3 years ago

This is a parsing issue -- fixed here, https://github.com/nwfsc-assess/nwfscSurvey/commit/be1f09fba8040c72747887f14633e1c7b179a0dc

I also added some examples to the PullBio.fn and PullCatch.fn to illustrate how to do this with multiple species. Perhaps one more fix could be to catch errors if people don't paste names in right -- e.g. 'vermillion rockfish' instead of 'vermilion rockfish'
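As a rough illustration of the name check suggested above, here is a minimal sketch in base R. It assumes a vector of valid common names is available; `known_names` and `check_species_name()` are hypothetical and are not part of nwfscSurvey. The idea is to stop with a "did you mean" message when a supplied name is within a small edit distance of a known name.

```r
# Hypothetical list of valid common names. In practice this might come from
# the data warehouse or a lookup table shipped with the package.
known_names <- c(
  "vermilion rockfish",
  "vermilion and sunset rockfish",
  "sunset rockfish",
  "blue rockfish",
  "deacon rockfish"
)

# check_species_name() is a hypothetical helper, not an nwfscSurvey function.
check_species_name <- function(name, known = known_names, max_dist = 2) {
  name <- tolower(trimws(name))
  if (name %in% tolower(known)) {
    return(name)
  }
  # Edit distance between the supplied name and every known name
  d <- utils::adist(name, tolower(known))[1, ]
  close <- known[d <= max_dist]
  if (length(close) > 0) {
    stop(
      "'", name, "' not found. Did you mean: ",
      paste0("'", close, "'", collapse = ", "), "?",
      call. = FALSE
    )
  }
  stop("'", name, "' not found in the list of known names.", call. = FALSE)
}
```

With this sketch, `check_species_name("vermillion rockfish")` would stop and suggest 'vermilion rockfish' rather than silently returning no data. The `max_dist` threshold is an arbitrary choice and would need tuning against the real name list.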

chantelwetzel-noaa commented 3 years ago

@ericward-noaa Thank you! I completely agree that creating a smarter check for species naming would be the next step here. In fact, my work evaluating what something like this would look like is what led me to find the issue that you addressed here. I don't anticipate having much time to work on this for a couple of months, but would be happy to pick it up then.

kellijohnson-NOAA commented 3 years ago

Thanks @ericward-noaa for working on this. I found a discrepancy that I wasn't expecting for vermilion ... if I download all species and subset for vermilion I get one more record (in 2009) than if I just download Name = c("vermilion rockfish", "vermilion and sunset rockfish"). The trawl_id is 200903008123 and it has two entries for vermilion, one with 18 fish and one with one fish. @chantelwetzel-noaa or @Curt-Whitmire-NOAA do you know why this is happening?

chantelwetzel-noaa commented 3 years ago

@kellijohnson-NOAA Thank you for reporting this issue. I have been unable to replicate this issue. In both an older data pull and a new one using the grouped species option, I am only getting 18 records for that specific Trawl_id. Can you please send me your data file and the code used to pull directly?

ericward-noaa commented 3 years ago

Hmm I can look into this @kellijohnson-NOAA. I'd guess the issue is a mis-spelling or mis-labelling in the database somewhere (e.g. in either the Bio or Catch datasets, one of the names is off in 1 location). I think the table @curt-whitmire-noaa is putting together would be a good fix -- that could be used as data in the package. In making the table, we might also be able to figure out whether the issue is obvious -- are there common names with multiple species names, or vice versa?
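A minimal sketch of how such a lookup table could be checked for one-to-many mappings, using base R only. The table below is invented for illustration (including a deliberately mislabelled last row); the real table and its column names may differ.

```r
# Invented lookup rows for illustration only. The last row is a made-up
# mislabel so that the checks below have something to find.
lookup <- data.frame(
  common_name = c(
    "vermilion rockfish",
    "vermilion and sunset rockfish",
    "sunset rockfish",
    "vermilion rockfish"
  ),
  scientific_name = c(
    "Sebastes miniatus",
    "Sebastes sp. (miniatus / crocotulus)",
    "Sebastes sp. (crocotulus)",
    "Sebastes sp. (crocotulus)"
  ),
  stringsAsFactors = FALSE
)

# Number of distinct scientific names per common name
n_sci <- tapply(
  lookup$scientific_name, lookup$common_name,
  function(x) length(unique(x))
)
# Number of distinct common names per scientific name
n_common <- tapply(
  lookup$common_name, lookup$scientific_name,
  function(x) length(unique(x))
)

# Names involved in a one-to-many mapping in either direction
multi_sci <- names(n_sci)[n_sci > 1]
multi_common <- names(n_common)[n_common > 1]
```

Here `multi_sci` holds common names that map to more than one scientific name, and `multi_common` holds scientific names that map to more than one common name.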

ericward-noaa commented 3 years ago

@kellijohnson-NOAA I also can't replicate this -- but the issue may be mis-labeling. If I pull just those 2 species, I get 564 'vermilion rockfish' and 1985 'vermilion and sunset rockfish'

bio_dat <- PullBio.fn(Name = c("vermilion rockfish","vermilion and sunset rockfish"), SurveyName = "NWFSC.Combo")

And then when I pull all the data

bio_dat_all <- PullBio.fn(SurveyName = "NWFSC.Combo")

I get 564 'Sebastes miniatus' and 1985 'Sebastes sp. (miniatus / crocotulus)'. In this latter pull, there's also a single record that is labeled as 'Sebastes sp. (crocotulus)'. And I'm guessing this is the issue.

The specific record is:

Project      Trawl_id      Scientific_name            Year  Vessel     Pass  Tow  Date
NWFSC.Combo  200903008123  Sebastes sp. (crocotulus)  2009  Excalibur  2     123  2009-Oct-07

If you want to get them to match up, I think the fix is to add "Sunset rockfish" to the PullBio.fn call,

bio_dat <- PullBio.fn(Name = c("Sunset rockfish", "vermilion rockfish", "vermilion and sunset rockfish"), SurveyName = "NWFSC.Combo")
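The comparison described above can be sketched without hitting the data warehouse. This is an illustrative example on toy data frames, not the actual pulls: the `Scientific_name` column follows the record quoted above, while the rows, the `Trawl_id` values, and the `miniatus|crocotulus` pattern are assumptions made for the example.

```r
# Toy stand-in for a pull of all species (rows invented for illustration)
bio_all <- data.frame(
  Trawl_id = c("A1", "A2", "A3", "A4"),
  Scientific_name = c(
    "Sebastes miniatus",
    "Sebastes sp. (miniatus / crocotulus)",
    "Sebastes sp. (crocotulus)",
    "Sebastes mystinus"
  ),
  stringsAsFactors = FALSE
)

# Toy stand-in for a pull by the two common names only
bio_subset <- bio_all[1:2, ]

# Rows in the full pull that belong to the vermilion/sunset complex
complex_rows <- bio_all[
  grepl("miniatus|crocotulus", bio_all$Scientific_name),
]

# Scientific names present in the full pull but absent from the name-based pull
missing_names <- setdiff(
  complex_rows$Scientific_name,
  bio_subset$Scientific_name
)
missing_names
```

Any name that shows up in `missing_names` points to a label that the name-based query did not request, which is the kind of gap the extra "Sunset rockfish" entry is meant to close.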

kellijohnson-NOAA commented 3 years ago

@ericward-noaa Sorry that I wasn't very specific, but the results that I was referring to were from PullCatch.fn() not PullBio.fn().

ericward-noaa commented 3 years ago

Gotcha @kellijohnson-NOAA . I took a look, and think I fixed it. I think you still need to add "Sunset rockfish" to the list to be complete, as in the PullBio.fn, but I also commented out these lines: https://github.com/nwfsc-assess/nwfscSurvey/blob/96d3028c8128a90b7cffbb0b5c2697608fdf1a22/R/PullCatch.fn.R#L251

I think this is just a legacy thing that doesn't actually affect the results, but would be good to have you double check. I updated the sunset/vermilion example in this function too