pfmc-assessments / nwfscSurvey

Tool to pull and process NWFSC West Coast groundfish survey data for use in PFMC groundfish stock assessments
http://pfmc-assessments.github.io/nwfscSurvey/

increase breadth of available data #36

Open kellijohnson-NOAA opened 3 years ago

kellijohnson-NOAA commented 3 years ago

Requests are coming in for a broader range of data to be available for download within the package. Good job, team, for creating a usable package! For example, hook and line ages would be a great data source to prioritize for download. Currently, hook and line ages are not in a user-friendly state for easy queries, which has prevented their inclusion so far. Perhaps a collaboration with the data team here could make them easier to download. @chantelwetzel-noaa can you list the data types / sources that are "available" within the warehouse but not yet available from nwfscSurvey?

chantelwetzel-noaa commented 3 years ago

Absolutely! I have had multiple conversations with John Harms about getting the Hook & Line data into the data warehouse in a queryable form, and he is on board, but with data team staffing in flux, progress on this has stalled for the moment. There are two main issues with the Hook & Line data as it currently exists in the data warehouse:

1) The project name for this survey, for some reason, takes a completely different form from our other surveys: the hook & line survey has a unique project name for each year (Shelf Rockfish 2015, Shelf Rockfish 2016, Shelf Rockfish 2017, ...). If we were a bit more clever in how we query the data warehouse by project, we could find a way to retrieve all years of data. However, it would be preferable for the data team to change the project name so that it no longer includes the year and so that it makes clear which survey the data come from (e.g., hook and line).

2) The second issue is a bigger hurdle. Because of its unique sampling design, the hook and line survey has a number of fields (e.g., hook number, angler, swell height) that are not standard fields across the other surveys in the data warehouse. These fields are used in our current approach for creating indices. It is my understanding that these unique fields are not available in the data warehouse; please correct me if this is not entirely true. However, if we could find a way to pull the data using the year-specific survey names, we could at least pull the standard catch and biological data, which would let users explore the data.
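The "more clever" query described in point 1 could look roughly like the sketch below: find every project name matching the year-specific pattern, query each one, and stack the results with a year column. This is only an illustration under stated assumptions. It assumes the names follow exactly the "Shelf Rockfish YYYY" form quoted above, and `fetch_catch` is a hypothetical stand-in for whatever call retrieves rows for one project; it is not an nwfscSurvey or data warehouse function.

```python
import re

# Assumption: hook & line project names look exactly like "Shelf Rockfish 2015".
HL_PROJECT_PATTERN = re.compile(r"^Shelf Rockfish (\d{4})$")


def hl_projects_by_year(project_names):
    """Return {year: project_name}, sorted by year, for names matching the pattern."""
    matches = {}
    for name in project_names:
        m = HL_PROJECT_PATTERN.match(name)
        if m:
            matches[int(m.group(1))] = name
    return dict(sorted(matches.items()))


def pull_all_years(project_names, fetch_catch):
    """Query each year-specific project and combine the rows into one list.

    fetch_catch is a hypothetical callable: project name -> iterable of dict rows.
    Each returned row is tagged with its survey year and source project name.
    """
    rows = []
    for year, project in hl_projects_by_year(project_names).items():
        for row in fetch_catch(project):
            rows.append({**row, "year": year, "project": project})
    return rows


if __name__ == "__main__":
    # Made-up project list and rows, purely to exercise the functions.
    available = ["Example Trawl Survey", "Shelf Rockfish 2016", "Shelf Rockfish 2015"]
    fake_rows = {
        "Shelf Rockfish 2015": [{"species": "bocaccio", "count": 3}],
        "Shelf Rockfish 2016": [{"species": "vermilion", "count": 1}],
    }
    for r in pull_all_years(available, lambda p: fake_rows[p]):
        print(r)
```

A workaround like this only recovers the standard fields each project already exposes; it does nothing for the survey-specific fields raised in point 2.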

Last summer, I started work on a way to query all years of data but set it aside after a discussion with John Harms, who was concerned that some of the unique key quantities would not be available to users pulling from the data warehouse alone.

iantaylor-NOAA commented 3 years ago

Thanks for working on this @chantelwetzel-noaa.

I see no reason why the data warehouse can't be modified to include all the fields that are of interest for any survey. It's not like we would leave area swept out of the WCGBTS extraction because it doesn't occur in the H&L survey, so it doesn't make sense to me to leave things out of the H&L extractions either.

I would vote to avoid spending time on a work-around and instead wait for the data team to find the time to make the changes for both these issues at their end.