ioos / marine_life_data_network

Planning efforts relevant to IOOS Marine Life and IOOS DMAC.

New Task: Connect ATN and MBON into IOOS DMAC #52

Open MathewBiddle opened 2 months ago

MathewBiddle commented 2 months ago

Who is requesting this?

@ioos/marine-life

What is being requested?

Connect ATN and MBON into IOOS DMAC. Coordinate with IOOS Catalog developers (POC: @mwengren) on how ATN and/or MBON portals could be harvested for data.ioos.us. Guidance for the process to add records is documented at https://ioos.github.io/catalog/

What is the requested deadline and why?

No response

What is the current status quo (i.e., what happens if this does not get done)?

ATN and MBON datasets won't show up in data.ioos.us. Marine Life will not be meeting the IOOS DMAC requirement that its data be discoverable in data.ioos.us.

What indicates this is done (i.e., how do we know this is complete)?

Provide a description or any other important information.

xref:

mwengren commented 1 month ago

Copying my comments from https://github.com/ioos/ckanext-ioos-theme/issues/237#issuecomment-2150427124 below:

AFAIK IOOS is required to furnish ISO XML metadata (or perhaps DCAT JSON, not 100% sure on that alternative) to NOAA for inclusion in NOAA's enterprise data inventories for all of our publicly-available data/services.

For all of IOOS' non-bio data, it's been fairly straightforward to do this, as most of the software we use has been developed to be able to output an ISO XML metadata representation of the datasets it serves. Since that isn't the case for OBIS, MBON, or ATN (I believe), that's something we'll need to address, both for including those data in IOOS Catalog given its current capabilities and for sending them up the chain to NOAA to meet requirements.

It may be that leveraging IOOS Catalog and converting the various bio data formats to ISO XML format isn't the best approach to meeting NOAA data inventory requirements. If there are better, simpler ways to furnish these metadata to NOAA that I'm not aware of, we should consider those options. Catalog has been our solution to date, but primarily because of the pre-existing metadata format support and compatibility.

Ideally, we can have a comprehensive inventory of 100% of IOOS' data in Catalog, and I think we should still aim for that goal, but we need to understand better what the challenges for that might be wrt ATN, MBON, or other bio/Marine Life data.

MathewBiddle commented 3 weeks ago

Thanks @mwengren.

For ATN, at some point, we hope to add non-embargoed data to an ATN ERDDAP which could be an easy pathway for that observing method. See https://github.com/ioos/marine_life_data_network/issues/44
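As a rough illustration of why an ERDDAP could be an easy pathway: ERDDAP can return an ISO 19115 XML representation of a dataset through the `.iso19115` file type, which is the kind of ISO XML metadata discussed above. The sketch below only builds that URL. The server address and dataset ID are hypothetical placeholders (no ATN ERDDAP endpoint is named in this thread), and the helper name `erddap_iso19115_url` is made up for this example.

```python
from urllib.parse import quote


def erddap_iso19115_url(server: str, dataset_id: str, protocol: str = "tabledap") -> str:
    """Build the URL of an ERDDAP dataset's ISO 19115 XML metadata.

    `server` is the ERDDAP base URL (ending in /erddap); `protocol` is
    either "tabledap" or "griddap", depending on the dataset type.
    """
    if protocol not in ("tabledap", "griddap"):
        raise ValueError(f"unsupported ERDDAP protocol: {protocol!r}")
    base = server.rstrip("/")  # tolerate a trailing slash on the base URL
    return f"{base}/{protocol}/{quote(dataset_id, safe='')}.iso19115"


if __name__ == "__main__":
    # Hypothetical server and dataset ID, for illustration only.
    print(erddap_iso19115_url("https://erddap.example.org/erddap/", "example_atn_dataset"))
```

Whether a given dataset actually yields a usable ISO record depends on how it is configured on the server, so this is a starting point for a check rather than a harvesting recipe.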

For MBON, we are encouraging the MBON projects to work with RAs to host the raw data on an RA ERDDAP (or other web service as applicable). Most of the RA ERDDAPs are already being harvested, hence the push for that collaboration. Below is an example:

Another wrinkle in the whole pipeline is that OBIS-USA is being archived at NCEI on a quarterly basis. Part of our guidance is to submit data to OBIS-USA. While that metadata record is not available through the IOOS Catalog, it is available through the various NOAA and higher Catalogs. So, does that meet our NOAA data inventory requirements?? See links below:

The data flow diagram might help illustrate all the nuances: https://ioos.github.io/mbon-docs/mbon-data-flow.html

mwengren commented 3 weeks ago

@MathewBiddle That makes sense on the data flow and connection in with the RA ERDDAPs, I recall that plan now... thanks for adding the example.

I think the OBIS-USA/NCEI archive probably does meet the NOAA data publishing/open data requirements for those data - at least from what I understand.

I think our goal should be to include both access points (NCEI archive and RA ERDDAP) at the NOAA Catalog level (i.e. OneStop). The IOOS Catalog should include all data access services provided by the RAs, or other IOOS DACs, that are funded and supported by IOOS.

Having two separate metadata records for the same dataset should be OK as well as they'll be describing different endpoints to access the same data, presumably. Ideally there would be a way to relate each metadata record to the other within the NOAA Catalog, but I'm not sure that is technically possible at present. That might be a good requirement to share with the OneStop team though.

I guess the one scenario that seems to be a potential gap where IOOS-funded bio data might not be represented in IOOS Catalog is if a provider is not serving their data via RA ERDDAP, but is aligning them to Darwin Core and submitting to NCEI.

Ideally, we could also represent those raw data access points, whatever they might be, in IOOS Catalog as well, even if they would be technically meeting the NOAA open data publishing guidelines via OBIS/NCEI archive pathway.

I don't know how much of a priority this is or how common it is... it might, however, provide justification to encourage those providers to work with an RA to publish to ERDDAP.

laurabrenskelle commented 3 weeks ago

@mwengren Is there a reason you couldn't share the RA ERDDAP link as another data access link in the collection metadata record at NCEI? It doesn't seem ideal to have two collection records for the same dataset in OneStop. Here is an example: https://data.noaa.gov/onestop/collections/details/573b7dc1-7d06-4fdc-a134-056c112c2260

MathewBiddle commented 3 weeks ago

I guess the one scenario that seems to be a potential gap where IOOS-funded bio data might not be represented in IOOS Catalog is if a provider is not serving their data via RA ERDDAP, but is aligning them to Darwin Core and submitting to NCEI.

I think this might be more common with cross-funded efforts, like MBON. Some projects use EDI and Arctic Data Center as their repositories (maybe BCO-DMO too).