pfmc-assessments / nwfscSurvey

Tool to pull and process NWFSC West Coast groundfish survey data for use in PFMC groundfish stock assessments
http://pfmc-assessments.github.io/nwfscSurvey/

PullSpp.fn() takes too long #30

Open kellijohnson-NOAA opened 3 years ago

kellijohnson-NOAA commented 3 years ago

PullSpp.fn() extracts data for all species from a single year of the WCGBTS, which takes a very long time. The function supplies a lookup data frame that maps common names to scientific names for species caught in the surveys performed or used by the NWFSC. See its development in pull request #28. Via email request, the survey team is trying to make the original lookup table within the warehouse accessible to all users, i.e., downloadable from a web link. This issue is a reminder that the code in URLtext on line 19 will need to be changed for efficiency.

kellijohnson-NOAA commented 3 years ago

@Curt-Whitmire-NOAA do you know if there is a way to access a list of species with their common and scientific names using SQL or a URL? Currently, I download all of the data and then find the unique values, which is very costly in time. I am looking for a simple way to filter on download rather than after it, or a URL for an existing table. Thanks.

Curt-Whitmire-NOAA commented 3 years ago

@kellijohnson-NOAA, I can certainly provide a table that could be uploaded to GitHub. Do you only want unique fish names, or do you also want invertebrates? This would be a short-term fix and wouldn't update dynamically. For a dynamic solution, I can work with our developer to add the taxonomy table to the DW front end. I will likely have some follow-up questions to make this table as useful as possible.

kellijohnson-NOAA commented 3 years ago

I think that the current function will work until we can get a dynamic solution going forward.

In short, I am looking for a way to link common names to scientific names and the reverse where it accounts for historical names and species complexes.
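To make the two-way lookup concrete, here is a minimal sketch in base R. The table, its column names (`common_name`, `scientific_name`), and the helper `lookup_species()` are all hypothetical and only illustrate the idea; alternative or former names are represented simply as extra rows that point to the same scientific name.

```r
# Hypothetical lookup table; the column names are assumptions, not the
# warehouse schema. An alternative name is stored as an extra row that
# points to the same scientific name.
taxonomy <- data.frame(
  common_name = c(
    "sablefish", "blackcod", "Pacific hake", "Pacific whiting"
  ),
  scientific_name = c(
    "Anoplopoma fimbria", "Anoplopoma fimbria",
    "Merluccius productus", "Merluccius productus"
  ),
  stringsAsFactors = FALSE
)

# Return every row whose common or scientific name matches `name`,
# ignoring case and surrounding whitespace, so the lookup works in
# both directions.
lookup_species <- function(name, table = taxonomy) {
  key <- tolower(trimws(name))
  hit <- tolower(table$common_name) %in% key |
    tolower(table$scientific_name) %in% key
  table[hit, , drop = FALSE]
}

# Common name to scientific name.
lookup_species("Sablefish")$scientific_name
# Scientific name to all known common names.
lookup_species("Merluccius productus")$common_name
```

Species complexes are not handled here; they would likely need an additional grouping column, which is left out because its structure is not described in this thread.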

Curt-Whitmire-NOAA commented 3 years ago

@kellijohnson-NOAA, I will email you a CSV with the current list of "fish" in the Data Warehouse taxonomy dimensions table. Please review it and let me know if it suits your needs for now. We can then discuss a better dynamic solution.

kellijohnson-NOAA commented 1 year ago

@Curt-Whitmire-NOAA any progress on making the taxonomy dimensions table available as a pull, rather than what we have now, which is a saved CSV that is NEVER updated?

Curt-Whitmire-NOAA commented 1 year ago

@kellijohnson-NOAA I found that the taxonomy dimension table is already exposed via the API. Some fields still need to be fully populated (e.g., ITIS Serial #, WoRMS AphiaID), but as far as I can tell the table includes the full list of taxonomic names for all our FRAM programs.

Curt-Whitmire-NOAA commented 1 year ago

@kellijohnson-NOAA here's an example API pull for all "fish" species:

https://www.webapps.nwfsc.noaa.gov/data/api/v1/source/warehouse.taxonomy_dim/selection.json?filters=species_category=fish
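A pull like this could be reduced to a unique name lookup roughly as follows. This is a sketch only: `pull_species_names()` is a hypothetical helper, the column names `common_name` and `scientific_name` are assumptions about the response and should be checked against the real table, and the default reader assumes the jsonlite package is installed.

```r
# Sketch: read the taxonomy table and keep one row per unique pair of
# names. `reader` is any function that turns a URL into a data frame;
# the default uses jsonlite::fromJSON(), which accepts a URL string.
# The two column names below are assumed, not confirmed by the API.
pull_species_names <- function(url, reader = jsonlite::fromJSON) {
  taxonomy <- as.data.frame(reader(url), stringsAsFactors = FALSE)
  pairs <- taxonomy[, c("common_name", "scientific_name"), drop = FALSE]
  pairs <- unique(pairs)
  # Sort by common name so the lookup is easy to scan.
  pairs[order(pairs$common_name), , drop = FALSE]
}
```

Passing the reader as an argument keeps the download step separate from the clean-up step, so the clean-up can be checked against a small hand-made data frame without touching the network.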

Curt-Whitmire-NOAA commented 1 year ago

@kellijohnson-NOAA Potential filters (arguments) to consider including in PullSpp.fn() are:

  1. species_category
  2. species_subcategory
  3. grp_reg_depth_category

If there's anything else you need for this enhancement, please let me know.
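As a rough illustration of how those three filters might become function arguments, the sketch below builds the request URL in base R. `taxonomy_url()` is a hypothetical helper, not part of nwfscSurvey, and joining several filters with commas is an assumption about how the API combines them. Each argument is expected to be a single string that needs no URL escaping.

```r
# Base URL copied from the example pull earlier in this thread.
base_url <- paste0(
  "https://www.webapps.nwfsc.noaa.gov/data/api/v1/source/",
  "warehouse.taxonomy_dim/selection.json"
)

# Hypothetical helper: build a taxonomy URL from optional filter values.
# Arguments left as NULL are dropped; the rest are written as
# field=value and joined with commas (an assumed API convention).
taxonomy_url <- function(species_category = NULL,
                         species_subcategory = NULL,
                         grp_reg_depth_category = NULL) {
  filters <- c(
    species_category = species_category,
    species_subcategory = species_subcategory,
    grp_reg_depth_category = grp_reg_depth_category
  )
  if (length(filters) == 0) {
    return(base_url)
  }
  paste0(
    base_url, "?filters=",
    paste0(names(filters), "=", filters, collapse = ",")
  )
}

# Reproduces the "fish" example URL from this thread.
taxonomy_url(species_category = "fish")
```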

kellijohnson-NOAA commented 1 year ago

Thank you @Curt-Whitmire-NOAA, the link you gave me worked flawlessly using:

species <- get_json("https://www.webapps.nwfsc.noaa.gov/data/api/v1/source/warehouse.taxonomy_dim/selection.json?filters=species_category=fish")

Curt-Whitmire-NOAA commented 1 year ago

@kellijohnson-NOAA glad it meets your needs! Now we should both stop working for the night ;^)