ioos / ckanext-ioos-theme

IOOS Catalog as a CKAN extension
GNU Affero General Public License v3.0

ERDDAP dataset attributes to read for CKAN filtering, API, and Schema.org output #208

Closed mwengren closed 3 years ago

mwengren commented 4 years ago

@benjwadams Per our meeting this morning, here's a list of important global dataset attributes from the IOOS Metadata Profile 1.2 to parse for use in CKAN (filtering, API access, and encoding in Schema.org JSON-LD).

Some of these may already be added to the ISO XML by ERDDAP (and ncISO/THREDDS). I didn't cross-reference against those when making this list, so some information may be repeated, but we can ignore that for now.

For Sensor Map/NDBC ingest (only the global attributes, not variable-level equivalents):

Other attributes (all global)

Each of these global attributes can just be stored as an individual 'extra' value in the CKAN database, and should therefore be exposed via the CKAN API package_search or other endpoints. For example, these PacIOOS results:

https://data.ioos.us/api/3/action/package_search?q=organization:pacioos%20and%20res_format:ERDDAP-TableDAP%20and%20cf_standard_names:sea_water_turbidity%20and%20gcmd_keywords:%22EARTH%20SCIENCE%20%3E%20OCEANS%20%3E%20OCEAN%20CHEMISTRY%20%3E%20OXYGEN%22&start=0
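To make the encoded query above easier to read, here is a minimal sketch that decodes it with Python's standard library. Only the URL comes from this thread; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# The example package_search URL from this comment, split for readability.
URL = (
    "https://data.ioos.us/api/3/action/package_search"
    "?q=organization:pacioos%20and%20res_format:ERDDAP-TableDAP"
    "%20and%20cf_standard_names:sea_water_turbidity"
    "%20and%20gcmd_keywords:%22EARTH%20SCIENCE%20%3E%20OCEANS"
    "%20%3E%20OCEAN%20CHEMISTRY%20%3E%20OXYGEN%22&start=0"
)

# parse_qs percent-decodes each query parameter and returns lists of values.
params = parse_qs(urlsplit(URL).query)
query = params["q"][0]

# Split the decoded query into its individual field:value terms.
terms = query.split(" and ")
for term in terms:
    print(term)
```

Each printed term is one `field:value` pair, which shows that the custom attributes are queried the same way as built-in fields such as `organization`.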

Bobfrat commented 4 years ago

@mwengren does that mean IOOS Metadata Profile v1.2 is official?

mwengren commented 4 years ago

@Bobfrat we're still working out the details of the QARTOD and GTS ingest attributes, but otherwise, yes, everything else is finalized. 'infoUrl' is something we changed recently to match ERDDAP's existing infoUrl attribute.

If you want to see the current working version of the 1.2 profile, it is in my own fork here: https://mwengren.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html.

These are the still pending sections:

Hopefully, we'll end up using the attributes in those first two tables (building off of existing CF ones for quality control via 'status_flag'), but we're going to reach out to the CF group first to make sure it's in line. Essentially we're adding 'flag_method' to go alongside 'flag_values' and 'flag_meanings'. 'flag_method' will have a vocabulary of QARTOD test names.

Unfortunately not quite ready to add to CC, but close. Please compare against Glider DAC rules when you can.

cc @jessicaaustin @kwilcox

mwengren commented 4 years ago

@benjwadams This one has been idling for a while, but we should pick it back up again.

As a reminder, the purpose here is to query an ERDDAP dataset for particular attributes (global only, I believe) as part of the harvest process with the purpose of adding the values as 'extras' inside CKAN's package.extras table (thereby exposing them as part of the API for clients).
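A rough sketch of that step, not the extension's actual harvester code: ERDDAP serves dataset metadata as a JSON table at `/info/<datasetID>/index.json`, with global attributes listed under the variable name `NC_GLOBAL`. The function name, the `WANTED_ATTRIBUTES` subset, and the sample values below are assumptions for illustration, and the sample stands in for a real HTTP response.

```python
from typing import Any

# Hypothetical subset of attributes to harvest; the real list is the one
# maintained at the top of this issue.
WANTED_ATTRIBUTES = {"gts_ingest", "ioos_ingest", "wmo_platform_code", "infoUrl"}


def global_attributes_to_extras(
    info: dict[str, Any], wanted: set[str] = WANTED_ATTRIBUTES
) -> list[dict[str, str]]:
    """Convert an ERDDAP info table into CKAN-style extras (key/value dicts)."""
    table = info["table"]
    columns = table["columnNames"]
    i_type = columns.index("Row Type")
    i_var = columns.index("Variable Name")
    i_name = columns.index("Attribute Name")
    i_value = columns.index("Value")

    extras = []
    for row in table["rows"]:
        # Keep only global attributes, skipping variable-level rows.
        is_global = row[i_type] == "attribute" and row[i_var] == "NC_GLOBAL"
        if is_global and row[i_name] in wanted:
            extras.append({"key": row[i_name], "value": str(row[i_value])})
    return extras


# Made-up sample shaped like an ERDDAP info response (values are illustrative).
SAMPLE_INFO = {
    "table": {
        "columnNames": ["Row Type", "Variable Name", "Attribute Name", "Data Type", "Value"],
        "rows": [
            ["attribute", "NC_GLOBAL", "gts_ingest", "String", "true"],
            ["attribute", "NC_GLOBAL", "title", "String", "Example buoy"],
            ["attribute", "NC_GLOBAL", "wmo_platform_code", "String", "12345"],
            ["variable", "sea_water_turbidity", "", "float", ""],
            ["attribute", "sea_water_turbidity", "gts_ingest", "String", "true"],
        ],
    }
}

extras = global_attributes_to_extras(SAMPLE_INFO)
print(extras)
```

The variable-level `gts_ingest` row is deliberately skipped, matching the "global only" scope described above.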

The main use case to keep in mind is:

The only relevant change to the IOOS Metadata Profile since we first created this is the addition of the ioos_ingest attribute, which I added to the list at the top of the issue.

mwengren commented 4 years ago

@benjwadams

Let's discuss picking this idea back up again at our next meeting. I updated the first entry in this issue with the appropriate list of IOOS attributes to read.

We're going to try to integrate Catalog with the harvesting workflow envisioned for Sensor Map and ERDDAP, and reading these attributes directly from ERDDAP as part of the harvest workflow will be necessary.

Related to issue #227 in that we'll need to properly identify datasets with ERDDAP endpoints.

mwengren commented 3 years ago

I've tested this using both the Catalog UI and CKAN API and filtering on custom IOOS attributes works great!

For example, to filter by both gts_ingest=true and ioos_ingest=true:

Catalog UI: https://data.ioos.us/dataset?q=gts_ingest%3Atrue+ioos_ingest%3Atrue

CKAN API: https://data.ioos.us/api/3/action/package_search?fq=gts_ingest:true%20ioos_ingest:true&start=0

The actual attribute values for each dataset are all available in the CKAN API JSON results under the extras field.
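For clients, reading those values could look something like the sketch below. It assumes the usual CKAN action API response shape (`result.results[]`, each with an `extras` list of `key`/`value` pairs); the helper name and the sample response are made up for illustration.

```python
from typing import Any, Iterable


def extras_by_dataset(
    response: dict[str, Any], keys: Iterable[str]
) -> dict[str, dict[str, str]]:
    """Map each dataset name to the requested extras found on it."""
    wanted = list(keys)
    found = {}
    for package in response["result"]["results"]:
        # CKAN returns extras as a list of {"key": ..., "value": ...} dicts.
        extras = {e["key"]: e["value"] for e in package.get("extras", [])}
        found[package["name"]] = {k: extras[k] for k in wanted if k in extras}
    return found


# Illustrative response fragment; dataset names and values are invented.
SAMPLE_RESPONSE = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {
                "name": "example-buoy-a",
                "extras": [
                    {"key": "gts_ingest", "value": "true"},
                    {"key": "ioos_ingest", "value": "true"},
                ],
            },
            {"name": "example-buoy-b", "extras": [{"key": "gts_ingest", "value": "true"}]},
        ],
    },
}

summary = extras_by_dataset(SAMPLE_RESPONSE, ["gts_ingest", "ioos_ingest"])
print(summary)
```

Datasets that lack one of the requested attributes simply omit that key, so a client can tell "absent" apart from "false".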

Also, the IOOS Metadata Profile 1.2 custom attributes tab at the bottom of the dataset detail page looks quite nice as well. Thanks @benjwadams!

Pinging @jessicaaustin as we should discuss having Axiom test using the CKAN API to filter for ERDDAP datasets to ingest into Sensor Map. We may need to revisit the guidance for RAs on the ioos_ingest default value as part of that, however.

Closing the issue as functionality is implemented, but we can continue discussion here if necessary.

mwengren commented 3 years ago

Also, note that you can issue 'not equal to' queries as well, so the example above can be adapted to find cases where gts_ingest = true and ioos_ingest != false, which matches our Profile 1.2 guidance for any datasets RAs would like to have included in both the GTS and IOOS products:

UI: https://data.ioos.us/dataset?q=gts_ingest%3Atrue+-ioos_ingest%3Afalse

CKAN API: https://data.ioos.us/api/3/action/package_search?fq=gts_ingest:true%20-ioos_ingest:false&start=0

937 datasets at present.

mwengren commented 3 years ago

Noting another query possibility: datasets that have gts_ingest=true but lack a global wmo_platform_code value, contrary to the IOOS Metadata Profile 1.2 guidance:

UI: https://data.ioos.us/dataset?q=gts_ingest%3Atrue+-wmo_platform_code%3A[*+TO+*]

CKAN API: https://data.ioos.us/api/3/action/package_search?fq=gts_ingest%3Atrue+-wmo_platform_code%3A[*+TO+*]
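The filter patterns used in these last few comments (equality, negation with a leading `-`, and `-field:[* TO *]` for "has no value") can be assembled programmatically. This is a minimal sketch with hypothetical helper names; only the endpoint and the field names come from this thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://data.ioos.us/api/3/action/package_search"


def equals(field: str, value: str) -> str:
    """Clause matching datasets where field has the given value."""
    return f"{field}:{value}"


def not_equals(field: str, value: str) -> str:
    """Clause excluding datasets where field has the given value."""
    return f"-{field}:{value}"


def missing(field: str) -> str:
    """Clause excluding datasets where field has any value at all."""
    return f"-{field}:[* TO *]"


def package_search_url(*clauses: str, start: int = 0) -> str:
    """Join clauses with spaces into one fq parameter and URL-encode it."""
    return BASE_URL + "?" + urlencode({"fq": " ".join(clauses), "start": start})


both_products = package_search_url(
    equals("gts_ingest", "true"), not_equals("ioos_ingest", "false")
)
no_wmo_code = package_search_url(
    equals("gts_ingest", "true"), missing("wmo_platform_code")
)
print(both_products)
print(no_wmo_code)
```

`urlencode` escapes more characters than the hand-written URLs above (for example the brackets and `*`), but the server decodes both forms to the same `fq` string.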