ianbrunjes / HABsDataPublish

Workflow for ingesting, transforming, and publishing Scripps' Harmful Algal Bloom (HAB) data from ERDDAP to remote repositories.

HABs data remote repository publishing

This repository pulls HABs data from ERDDAP and transforms it into Darwin Core Archive format so it can be published to remote repositories such as EDI, GBIF, and OBIS. A scheduled GitHub Actions task pulls the updated datasets and publishes them, currently to EDI, with plans to integrate GBIF into the process in the future.
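As a rough sketch of the ingest and transform steps, the example below pulls one site's dataset from an ERDDAP tabledap endpoint as CSV and maps it onto a minimal Darwin Core occurrence table (one component of a Darwin Core Archive). The ERDDAP server URL, dataset ID, and source column names are placeholders, not the ones this repository actually uses.

```python
# Hypothetical sketch of the ingest/transform step; the ERDDAP server,
# dataset ID, and column names below are placeholders.
import pandas as pd

ERDDAP_BASE = "https://erddap.example.org/erddap/tabledap"  # placeholder server
DATASET_ID = "HABs-ScrippsPier"                             # placeholder dataset ID


def fetch_site_data(dataset_id: str) -> pd.DataFrame:
    """Download a tabledap dataset as CSV into a DataFrame."""
    url = f"{ERDDAP_BASE}/{dataset_id}.csv"
    # ERDDAP's .csv response includes a units row after the header; skip it.
    return pd.read_csv(url, skiprows=[1])


def to_darwin_core(df: pd.DataFrame) -> pd.DataFrame:
    """Map source columns onto a minimal Darwin Core occurrence table."""
    return pd.DataFrame({
        "occurrenceID": df.index.map(lambda i: f"{DATASET_ID}:{i}"),
        "eventDate": df["time"],
        "decimalLatitude": df["latitude"],
        "decimalLongitude": df["longitude"],
        "scientificName": df["species"],        # placeholder column name
        "organismQuantity": df["cell_count"],   # placeholder column name
        "organismQuantityType": "cells per litre",
        "basisOfRecord": "HumanObservation",
    })


if __name__ == "__main__":
    occurrences = to_darwin_core(fetch_site_data(DATASET_ID))
    occurrences.to_csv("occurrence.csv", index=False)
```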

Currently, this script supports data from the following sites: Cal Poly Pier, Monterey Wharf, Newport Beach Pier, Santa Cruz Wharf, Santa Monica Pier, Scripps Pier, and Stearns Wharf.

Overview

A scheduled job controlled by GitHub Actions will run the following process on the 1st of each month at midnight.

Generating EML

EML is built by reading a base template, HABS_base_EML.xml, which describes the more static properties such as the dataTables and their attributes, and then reading in a series of .csv files for properties that are more configurable or subject to change.
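As an illustration only, a minimal sketch of that merge step is shown below, assuming an override CSV with element_path and value columns; the CSV name and its layout are hypothetical, not the actual configuration files this repository reads.

```python
# Hypothetical sketch of the EML build step: start from the static base
# template and overwrite the configurable fields from a CSV. The CSV name and
# its columns ("element_path", "value") are placeholders.
import csv
import xml.etree.ElementTree as ET


def build_eml(template_path: str, overrides_csv: str, out_path: str) -> None:
    tree = ET.parse(template_path)
    root = tree.getroot()

    with open(overrides_csv, newline="") as f:
        for row in csv.DictReader(f):
            # Each row names an element (relative path) and its new text value.
            element = root.find(row["element_path"])
            if element is not None:
                element.text = row["value"]

    tree.write(out_path, xml_declaration=True, encoding="UTF-8")


if __name__ == "__main__":
    # Output filename and override CSV are placeholders for illustration.
    build_eml("HABS_base_EML.xml", "eml_overrides.csv", "eml.xml")
```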

The following files can be updated to change properties of the EML metadata:

Publishing to EDI

The publishing step to EDI requires a GitHub environment named habspublish to be set up.

The environment should contain the following secrets:

The file package_identifiers.csv needs to contain the static package identifier granted by EDI (or another target repository) for the targeted environment.
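For illustration, the sketch below shows how the publish step might read those environment secrets and the package identifier, then push a new revision to EDI's PASTA API. The secret names, CSV column names, and endpoint details are assumptions and should be checked against EDI's API documentation and the actual workflow.

```python
# Hypothetical sketch of the publish step: read the EDI credentials from the
# habspublish environment secrets (exposed to the job as environment
# variables), look up the package identifier, and upload the EML to EDI.
# Secret names, CSV columns, and the PASTA endpoint are assumptions.
import csv
import os

import requests

PASTA_URL = "https://pasta-s.lternet.edu"  # assumed staging host; production is pasta.lternet.edu


def read_package_identifier(path: str = "package_identifiers.csv") -> tuple[str, str]:
    """Return the (scope, identifier) granted by EDI for the target environment."""
    with open(path, newline="") as f:
        row = next(csv.DictReader(f))
    return row["scope"], row["identifier"]  # assumed column names


def publish_to_edi(eml_path: str) -> None:
    user = os.environ["EDI_USERNAME"]      # assumed secret name (EDI user credential)
    password = os.environ["EDI_PASSWORD"]  # assumed secret name
    scope, identifier = read_package_identifier()

    with open(eml_path, "rb") as f:
        # Assumed "update data package" call; EDI processes the upload
        # asynchronously and assigns the new revision.
        response = requests.put(
            f"{PASTA_URL}/package/eml/{scope}/{identifier}",
            data=f.read(),
            headers={"Content-Type": "application/xml"},
            auth=(user, password),
        )
    response.raise_for_status()
```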

About

Organization: Southern California Coastal Ocean Observing System

Author: Ian Brunjes, ianbrunjes@ucsb.edu