GNU General Public License v3.0

Automated Ocean Acidification Data Pipeline

Automates retrieval and submission of ocean acidification data for the Center for Biological Diversity.

Currently contains scripts to retrieve data from supported monitoring stations and format it for submission to state environmental agencies.

Initial Setup

These steps must be taken before using the program. Most will only be required once.

Software Setup

This setup should only need to be done once per machine.

  1. Install Docker. The project is designed to run in a Docker container. Therefore, the only prerequisite is Docker: Get Docker

  2. Clone the repository. If you haven't already: git clone https://github.com/11th-Hour-Data-Science/cbd-ocean-acidification.git

  3. Change to the root project directory: cd cbd-ocean-acidification

  4. Build the Docker image: docker build --tag cbd .

Record Keeping Setup

Some steps in the 303(d) data submission process must be completed manually before first use. The metadata/stations.csv file contains metadata about the stations available for data retrieval. You can open it in Excel or any CSV editor of your choice; just be sure to save any changes in CSV format.

Please refer to "Stations Table Schema" below for information on how to fill out columns.
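As a sketch of that record keeping, the snippet below parses a stations.csv-style table with Python's csv module and flags rows missing required values. The column names used here (station_id, state, eim_study_id) are illustrative; the real required columns are listed in the "Stations Table Schema" section.

```python
import csv
import io

# Illustrative rows in the shape of metadata/stations.csv; the actual
# schema may differ -- see "Stations Table Schema".
SAMPLE = """station_id,state,eim_study_id
WA-001,Washington,STUDY123
CA-002,California,
"""

def missing_fields(text, required=("station_id", "state")):
    """Return (row_number, column) pairs for rows missing a required value.

    Row numbers count from the top of the file, so the header is row 1.
    """
    problems = []
    for row_num, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        for col in required:
            if not (row.get(col) or "").strip():
                problems.append((row_num, col))
    return problems

print(missing_fields(SAMPLE))
print(missing_fields(SAMPLE, required=("station_id", "eim_study_id")))
```

Running a check like this before a submission run catches incomplete rows early, rather than partway through a state upload.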

Washington

Washington uses EIM to handle environmental data. It has three distinct types of data.

To set up submitting data to Washington:

  1. Follow the instructions here, up through creating the relevant studies
  2. Update stations.csv to add the eim_study_id to the relevant stations
  3. Ensure all stations you wish to submit have the required information in stations.csv and station_parameter_metadata.csv
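Step 3 above can be spot-checked programmatically. This sketch lists Washington stations that still lack an eim_study_id; the column names are assumptions matching the examples in this README, not the repository's verified schema.

```python
import csv
import io

# Hypothetical stations.csv contents for illustration only.
SAMPLE = """station_id,state,eim_study_id
WA-001,Washington,WA-STUDY-01
WA-002,Washington,
CA-001,California,
"""

def missing_eim_study_ids(text):
    """List Washington stations that still need an eim_study_id before EIM submission."""
    return [
        row["station_id"]
        for row in csv.DictReader(io.StringIO(text))
        if row["state"] == "Washington" and not row["eim_study_id"].strip()
    ]

print(missing_eim_study_ids(SAMPLE))
```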

For more information, visit the EIM page

California

CEDEN is California's primary portal for environmental data upload, but it does not accept time series data as of this writing. All time-series data must be submitted to the Integrated Report Document Upload Portal. To submit to California:

  1. Create an IR Portal account: https://public2.waterboards.ca.gov/IRPORTAL/Account/Register
  2. Ensure all stations you wish to submit have the required information in stations.csv and station_parameter_metadata.csv

Hawaii

Stations

For new stations and locations:

  1. Contact the data source to receive approval, to ensure we are following their terms of service, and to confirm they are not already submitting the data themselves.
  2. If the station's data is available through one of the available collectors (King County, IPACOA, ERDDAP), find its ID in that service and use it as our station_id
  3. If it is not available through one of the available collectors, a new scraper will need to be created. If you are able, write one yourself (following the existing patterns) and open a pull request. Otherwise, open an issue describing the new station you would like and where its data can be retrieved.
  4. Add an entry to stations.csv with all relevant data. You may need to contact the source.
  5. Add entries to station_parameter_metadata.csv with all relevant data. You may need to contact the source.
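Steps 4 and 5 amount to appending well-formed rows to the CSV files. The sketch below builds a stations.csv row with csv.DictWriter and refuses to write it if any column is missing; the column list here is a hypothetical subset, so consult the "Stations Table Schema" section for the real one.

```python
import csv
import io

# Hypothetical subset of the stations.csv columns, for illustration.
FIELDS = ["station_id", "state", "source", "latitude", "longitude"]

def station_row(**fields):
    """Validate that every schema column is supplied before writing a row."""
    missing = [f for f in FIELDS if f not in fields]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return fields

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(station_row(
    station_id="NERRS-XYZ",   # the ID the collector (e.g. ERDDAP) uses for this station
    state="California",
    source="ERDDAP",
    latitude=36.8,
    longitude=-121.8,
))
print(buf.getvalue())
```

Writing through a validating helper like this, rather than editing the file freehand, is one way to keep rows consistent when several people maintain the metadata.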

Usage

NERRS

If you are submitting NERRS data:

  1. Get your public IPv4 address (visiting a site like https://whatismyipaddress.com/ will show it)

  2. Request a webservices account from NERRS: http://cdmo.baruch.sc.edu/web-services-request/

  3. Wait for your confirmation email. Because most IP addresses change over time, you may need to repeat this step each time you acquire NERRS data, or obtain a static IP.

  4. In the project root directory, run:

    bash run_tool.sh <STATE> <start_date> <end_date>

    where <STATE> is the state name (California, Hawaii, or Washington) and the dates are in YYYY/MM/DD format (omit the angle brackets). The script will prompt for your password, as it uses sudo privileges to change the owner of the output files from root (Docker creates them as root) to the current user.

  5. Results will be saved in results/STATE/YYYY-MM-DDTHH-MM with a README.txt file explaining further instructions.
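The results path follows a timestamped pattern. As a sketch of how such a path could be constructed (this is an illustration of the naming convention, not the repository's actual implementation):

```python
from datetime import datetime
from pathlib import Path

def results_dir(state, now=None):
    """Build a results path of the form results/<STATE>/YYYY-MM-DDTHH-MM."""
    now = now or datetime.now()
    return Path("results") / state / now.strftime("%Y-%m-%dT%H-%M")

print(results_dir("Washington", datetime(2023, 5, 1, 14, 30)).as_posix())
# results/Washington/2023-05-01T14-30
```

Colons are replaced with hyphens in the time component, which keeps the directory name valid on filesystems that disallow `:` in paths.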

Without Docker (discouraged)

  1. In the project root directory, run:
    python main.py <STATE> --start <start_date> --end <end_date>

    where <STATE> is the state name (California, Hawaii, or Washington) and the dates are in YYYY/MM/DD format (omit the angle brackets).

  2. Results will be saved in results/STATE/YYYY-MM-DDTHH-MM with a README.txt file explaining further instructions.
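The command-line interface above can be sketched with argparse. This is a hypothetical reconstruction of what main.py's argument handling might look like, based only on the usage shown in this README; the actual implementation may differ.

```python
import argparse
from datetime import datetime

def parse_date(text):
    """Dates are given on the command line as YYYY/MM/DD."""
    return datetime.strptime(text, "%Y/%m/%d").date()

def build_parser():
    parser = argparse.ArgumentParser(
        description="Retrieve and format ocean acidification data for one state."
    )
    parser.add_argument("state", choices=["California", "Hawaii", "Washington"])
    parser.add_argument("--start", required=True, type=parse_date)
    parser.add_argument("--end", required=True, type=parse_date)
    return parser

args = build_parser().parse_args(
    ["Washington", "--start", "2023/01/01", "--end", "2023/01/31"]
)
print(args.state, args.start, args.end)
```

Using `choices` makes argparse reject unsupported state names with a usage message, and `type=parse_date` fails fast on malformed dates before any retrieval work starts.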

Directory Structure

pipeline/metadata/

Stations Table Schema

Contact

You can contact us by opening an issue or emailing tspread at uchicago edu

Project Link: https://github.com/chicago-cdac/cbd-ocean-acidification