opengeospatial / OGCAPI-EDR-Sprint2

This GitHub repository is for the second OGC API - EDR code sprint focusing on the OGC API - Environmental Data Retrieval candidate standard.

Expose IOOS Data through the EDR-API (via proxy to thredds server) #6

Closed: ShaneMill1 closed this issue 3 years ago.

ShaneMill1 commented 4 years ago

An interesting use case was brought up by @glennlaughlin: exposing the data available at

https://ioos.noaa.gov/data/access-ioos-data/

as EDR-API endpoints.

@glennlaughlin, feel free to add additional details as you see fit. As discussed in the GoToMeeting call, IOOS provides THREDDS server endpoints.

@m-burgoyne suggested that, as a starting point, we identify the desired data on the THREDDS server and write a proxy that exposes it as EDR-API endpoints.

It was determined that a good sampling geometry type to start with would be "polygon".
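Because each buoy sits at a single location, a polygon (EDR "area") query in such a proxy largely reduces to selecting the buoys whose positions fall inside the polygon, then fetching their time series. The sketch below assumes the polygon arrives as a simple WKT `POLYGON((lon lat, ...))` string with no holes and uses a planar even-odd (ray casting) test with no antimeridian handling; the buoy IDs, coordinates, and function names are hypothetical.

```python
import re

# Hypothetical buoy IDs and (lon, lat) positions, for illustration only.
BUOYS = {
    "buoy_A": (-70.5, 43.0),
    "buoy_B": (-68.0, 44.5),
    "buoy_C": (-72.0, 41.0),
}


def parse_wkt_polygon(wkt):
    """Parse the outer ring of a simple WKT 'POLYGON((x y, ...))' string (no holes)."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.+?)\s*\)\)\s*", wkt, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        ring.append((float(x), float(y)))
    return ring


def point_in_polygon(lon, lat, ring):
    """Even-odd test: count edge crossings of a ray cast in the +lon direction."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's latitude can be crossed.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def buoys_in_area(wkt, buoys):
    """Return the sorted IDs of buoys located inside the WKT polygon."""
    ring = parse_wkt_polygon(wkt)
    return sorted(
        buoy_id for buoy_id, (lon, lat) in buoys.items() if point_in_polygon(lon, lat, ring)
    )
```

For example, `buoys_in_area("POLYGON((-71 42, -69 42, -69 44, -71 44, -71 42))", BUOYS)` selects only `buoy_A`; the proxy would then request that buoy's data from THREDDS.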

Once the data are available through EDR-API endpoints, we could create a client that queries them and calculates and displays "thresholds" as added value.
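What "thresholds" means was not pinned down here; one plausible reading is flagging periods where an observed value exceeds a limit. The sketch below assumes the client already has parallel lists of times and values for one buoy (with `None` for missing samples) and reports each run of consecutive samples strictly above a hypothetical threshold.

```python
def exceedance_runs(times, values, threshold):
    """Return (start_time, end_time) pairs for runs of samples above `threshold`.

    A missing value (None) or a value at or below the threshold ends a run.
    """
    runs = []
    start = last = None
    for t, v in zip(times, values):
        if v is not None and v > threshold:
            if start is None:
                start = t  # a new run begins
            last = t
        elif start is not None:
            runs.append((start, last))  # close the current run
            start = None
    if start is not None:
        runs.append((start, last))  # run still open at the end of the series
    return runs
```

The times can be any ordered values (for example `datetime` objects parsed from an EDR CoverageJSON response); plain integers keep the example small.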

glennlaughlin commented 3 years ago

Agreed. We should be able to use the IOOS data portal to validate the results when ready. I'm not sure how the IOOS backend is organized, so do we need to decide which regional center(s) to pull from for experimentation, e.g. NERACOOS data only? I think the use case(s) we decided on was

chris-little commented 3 years ago

@ethanrd can you help with this? @m-burgoyne is facing a learning curve in understanding and getting access to the particular IOOS buoy data in THREDDS. @solson-nws @ShaneMill1

m-burgoyne commented 3 years ago

URL for the server: http://www.neracoos.org/thredds/UMO_historical_realtime_agg.html

ethanrd commented 3 years ago

@chris-little - Sure, I can try to help.

I'm not familiar with the data, but it looks like each buoy has a set of datasets (met, waves, currents, CTD 1m, CTD 20m, etc.). Some buoys have more datasets than others. Most of the datasets look like time series data, though the "Current Profiles" dataset looks like a time series of depth profiles. The datasets are structured as gridded data with single-valued latitude and longitude coordinates (the location of the buoy, I guess) rather than using CF Discrete Sampling Geometries structures. It also looks like the data are only available through OPeNDAP DAP2.
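Since DAP2 is the only access method, a proxy would subset each variable with a DAP2 constraint expression, where each dimension takes a hyperslab `[start:stride:stop]` with an inclusive `stop`. The sketch assumes the `[time][depth][lat][lon]` layout with singleton depth/lat/lon dimensions that @ethanrd describes for `air_temperature` further down this thread, and requests the `.ascii` response form. `DATASET_URL` is a hypothetical placeholder, since the actual `dodsC` paths behind the NERACOOS catalog are not listed here; brackets and colons are percent-encoded because some servers reject them unencoded in query strings.

```python
from urllib.parse import quote

# Hypothetical OPeNDAP endpoint; the real NERACOOS dodsC paths are not given here.
DATASET_URL = "https://example.org/thredds/dodsC/hypothetical/buoy_met.nc"


def hyperslab(start, stop, stride=1):
    """DAP2 hyperslab syntax [start:stride:stop]; stop is inclusive."""
    return f"[{start}:{stride}:{stop}]"


def point_series_constraint(variable, time_start, time_stop, stride=1):
    """Constraint for a variable shaped [time][depth][lat][lon], singleton depth/lat/lon."""
    return variable + hyperslab(time_start, time_stop, stride) + hyperslab(0, 0) * 3


def dap2_ascii_url(dataset_url, constraint):
    """Append the .ascii response suffix and the percent-encoded constraint."""
    # Keep commas literal so several variables can be requested at once.
    return f"{dataset_url}.ascii?{quote(constraint, safe=',')}"
```

For instance, `point_series_constraint("air_temperature", 0, 9)` yields `air_temperature[0:1:9][0:1:0][0:1:0][0:1:0]`, the first ten time steps. Requesting `DATASET_URL + ".dds"` first returns the dataset structure, which is a way to confirm the dimension order before building constraints.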

@m-burgoyne - I'm not sure where to go from here. Let me know what other information would be helpful. Maybe a quick chat when the sprint starts back up?

m-burgoyne commented 3 years ago

@ethanrd Thanks for taking a look; you have confirmed my interpretation of the data and its available access methods.

ShaneMill1 commented 3 years ago

@m-burgoyne @ethanrd I'm wondering if we can use OPeNDAP with xarray: http://xarray.pydata.org/en/stable/io.html#opendap

Maybe we need an ingest process that reads the data via OPeNDAP and converts it to Zarr for an EDR-API implementation to read from?

I suspect that reading through xarray's OPeNDAP support at query time could be slow, but I do not know that for sure.

m-burgoyne commented 3 years ago

@ShaneMill1 Whilst converting and restructuring the data would make the queries more performant, this is not gridded data; it is a set of spot observations that are appended to on a regular basis (with an archive going back years).
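Because the records are only appended, an ingest process would not need to re-read years of archive on each run; it could remember how many time steps it has already stored and request only the new index range. The sketch assumes the current length of the time dimension is read from the dataset's metadata (for example its DDS) and that existing records are never rewritten; `new_time_slab` and `max_batch` are hypothetical names.

```python
def new_time_slab(ingested_count, current_length, max_batch=None):
    """Return the inclusive (start, stop) index range of not-yet-ingested records.

    Returns None when there is nothing new. `max_batch` optionally caps how many
    records are requested in one go, so a large backlog is fetched in chunks.
    """
    if current_length < ingested_count:
        # An append-only series should never shrink; treat this as a rewrite.
        raise ValueError("time dimension shrank; the archive may have been rewritten")
    if current_length == ingested_count:
        return None
    stop = current_length - 1
    if max_batch is not None:
        stop = min(stop, ingested_count + max_batch - 1)
    return (ingested_count, stop)
```

The returned pair maps directly onto a DAP2 time hyperslab `[start:1:stop]`; after a successful fetch, the stored count advances to `stop + 1`.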

ShaneMill1 commented 3 years ago


@m-burgoyne

From @ethanrd's comment, it looks like the actual datasets are gridded, with data only at specific lat/lon points. So I am assuming that the remaining grid points hold missing values where no buoy is present. I could be reading @ethanrd's comment wrong, though.

ethanrd commented 3 years ago

@ShaneMill1 - Each grid contains just a single lat/lon point, so the shape of each grid looks something like this: air_temperature[time=720000][depth=1][lat=1][lon=1]. Also, the time dimensions differ across buoys and between datasets for the same buoy, which means they can't easily be combined into a single array.
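To make the point concrete: each grid can be squeezed to a 1-D time series, but putting several buoys into one array requires a shared time axis, and taking the union of differing time axes fills in missing values wherever a buoy has no sample. The sketch uses nested Python lists and dicts as stand-ins for the DAP2 arrays; the buoy IDs, timestamps, and values are hypothetical. Keeping one series per buoy, in the spirit of CF Discrete Sampling Geometries time series, avoids this padding.

```python
def squeeze_point_grid(grid):
    """Flatten a nested [time][1][1][1] grid into a 1-D list of values."""
    series = []
    for t_slice in grid:
        # Expect singleton depth, lat, and lon dimensions at every time step.
        if len(t_slice) != 1 or len(t_slice[0]) != 1 or len(t_slice[0][0]) != 1:
            raise ValueError("expected singleton depth/lat/lon dimensions")
        series.append(t_slice[0][0][0])
    return series


def outer_join(series_by_buoy):
    """Align {buoy_id: {timestamp: value}} on the union of all timestamps.

    Returns (sorted timestamps, {buoy_id: [value or None per timestamp]}).
    """
    times = sorted(set().union(*(s.keys() for s in series_by_buoy.values())))
    table = {buoy: [s.get(t) for t in times] for buoy, s in series_by_buoy.items()}
    return times, table
```

With two hypothetical buoys sampled at times `{0, 10}` and `{5, 10}`, the joined axis is `[0, 5, 10]` and each buoy gains a `None` at the time it was not sampled; with real buoys reporting on unrelated schedules, most of such a combined array would be padding.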

ShaneMill1 commented 3 years ago

@ethanrd ahh I see, thank you for the clarification!