iodepo / odis-arch

Development of the Ocean Data and Information System (ODIS) architecture
https://book.odis.org/

connect MIMS catalogue as ODIS Node #250

Open jmckenna opened 1 year ago

jmckenna commented 1 year ago
marksparkza commented 11 months ago

An initial implementation is now complete and scheduled to go live on 8 Nov.

The sitemap.xml index file will be located at: https://data.ocean.gov.za/mims/catalog/sitemap.xml
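
For anyone testing a harvest locally, here is a minimal sketch of listing the URLs in a sitemap. It assumes the file follows the standard sitemaps.org namespace; the function name `extract_locs` and the inline sample are illustrative, not part of the MIMS or ODIS code.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(sitemap_xml: str) -> list:
    """Return every <loc> URL found in a sitemap or sitemap index document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative sample only; the real file lives at the URL given above.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://data.ocean.gov.za/mims/catalog/10.15493/DEA.MIMS.01232023</loc>
  </url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_locs(SAMPLE):
        print(url)
```

Because the lookup matches `<loc>` at any depth, the same function works whether the file is a `urlset` or a `sitemapindex`.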

Following is an example of a JSON-LD record that will be embedded in the record detail view:

{
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "url": "https://data.ocean.gov.za/mims/catalog/10.15493/DEA.MIMS.01232023",
    "name": "Processed underway Thermosalinograph (TSG) observations from the Integrated Ecosystem Programme: Southern Benguela (IEP:SB) on the Algoa Voyage 279, February 2022",
    "identifier": "doi:10.15493/DEA.MIMS.01232023",
    "license": "https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode",
    "keywords": [
        "Algoa",
        "Algoa 279",
        "SOUTH ATLANTIC OCEAN",
        "THERMOSALINOGRAPH",
        "TSG",
        "physical oceanography"
    ],
    "description": "Here we present the 6-second resolution processed Thermosalinograph (TSG) data collected during the Integrated Ecosystem Programme: Southern Benguela (IEP:SB) cruise on the Algoa Voyage 279 between 04 February and 12 February 2022. A SeaBird SBE45 Thermosalinograph (TSG) is used to opportunistically collect underway near-surface temperature and conductivity measurements during research and monitoring cruises. Water is continuously pumped to the TSG from an intake located in the hull of the vessel, and the observations are continuously interfaced with navigational information. A temperature sensor close to the intake provides temperature measurements of the incoming water (T1). The temperature of the water inside the conductivity cell (T2) is used to accurately compute salinity (S) from the conductivity measurements (C). The IEP:SB in 2013 consolidated a long-term, multi-decadal time-series (from 1951 onward) of information for this important region and has continued monitoring in the form of the IEP:SB. The programme is a multi-disciplinary, collaborative and capacity building platform undertaking relevant science, including updating technology, with the aim to develop ecosystem indicators that can be used to effectively monitor and understand the Southern Benguela. These include physical, chemical, planktonic, microbial, seabird, marine mammal, benthic and pollution (plastic) ecosystem indicators as required by ecosystem-based management regarding the following priorities: ocean warming, ocean acidification, trophic functioning, pollution and water quality. It is on-going monitoring programme."
}
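
A rough sketch of how such an embedded record could be pulled out of a detail page for checking. It assumes the JSON-LD sits in a `<script type="application/ld+json">` element, which is the usual convention but is not confirmed in this thread; `JsonLdExtractor`, `extract_jsonld` and the sample HTML are hypothetical names and data for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.records.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return all JSON-LD documents embedded in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical page fragment, trimmed to two fields of the record above.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@type": "Dataset", "identifier": "doi:10.15493/DEA.MIMS.01232023"}
</script>
</head><body></body></html>"""

if __name__ == "__main__":
    for record in extract_jsonld(SAMPLE_HTML):
        print(record["@type"], record["identifier"])
```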
jmckenna commented 11 months ago

@marksparkza looks good, thanks for this update. Comments:

thanks.

marksparkza commented 11 months ago

@jmckenna Thanks for the feedback and additional info. I'll add in @id and spatialCoverage. Would it be beneficial to also include temporalCoverage?

Should @id be included instead of, or in addition to, url - seeing as they would have the same value?

jmckenna commented 11 months ago

@marksparkza here are some more comments:

marksparkza commented 11 months ago

@jmckenna I've included @id in tomorrow's update. spatialCoverage and temporalCoverage will be added in the near future. I'm not sure if we can include distribution. There are some terms-of-use considerations which I will need to discuss with our data curation team. I'll let you know once I have a verdict on this.

jmckenna commented 11 months ago

@marksparkza ok thanks, will re-harvest tomorrow or Thursday. (on our side, spatialCoverage is very important, as then we can discover your records through a spatial search)

marksparkza commented 10 months ago

@jmckenna The changes are now live :tada: Please let me know if you encounter any issues when harvesting.

Noted re spatialCoverage. I just wanted to take a bit more time to assess how best to implement this. All our spatial extents are represented as bounding boxes, so at first glance box would be the obvious choice. However, the schema.org definition of box is rather vague, so polygon might be the better choice. Either way, I want to make sure our implementation is consistent with ODIS and Google expectations.

marksparkza commented 10 months ago

The spatialCoverage example above (from OIH documentation) gives points in lon-lat order.

However, Google says that points must be in lat-lon order.

Science on Schema also says that points must be in lat-lon order.

There is an open issue on schema.org for the ordering of lats and lons in points.

marksparkza commented 10 months ago

On closer inspection, I think the OIH polygon example might be invalid.

Here is schema.org's description of GeoShape, which does suggest lat-lon ordering:

The geographic shape of a place. A GeoShape can be described using several properties whose values are based on latitude/longitude pairs. Either whitespace or commas can be used to separate latitude and longitude; whitespace should be used when writing a list of several such points.

In the OIH example, if the commas are taken to be lat-lon separators as described for GeoShape, and the spaces are taken to be point delimiters, then the first and last terms (142.014 and 10.161667) are invalid. According to Google and Science on Schema, this polygon (if those terms are dropped/ignored) describes a diagonal line between 10N,142E and 18N,148E.
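
To make that reading concrete, here is a small sketch that applies the GeoShape rule literally: whitespace separates points, and a comma separates latitude from longitude within a point. The input string is an illustrative reconstruction that matches the description above; it is not copied from the OIH documentation, and `parse_geoshape_points` is a hypothetical helper.

```python
def parse_geoshape_points(value: str):
    """Split a GeoShape value into (lat, lon) points and leftover terms.

    Whitespace is treated as the point delimiter and a comma as the
    lat-lon separator, following the schema.org GeoShape description.
    """
    points = []
    dropped = []
    for token in value.split():
        parts = token.split(",")
        if len(parts) != 2:
            dropped.append(token)  # a lone number is not a lat-lon pair
            continue
        lat, lon = float(parts[0]), float(parts[1])
        points.append((lat, lon))
    return points, dropped


# Illustrative string only (assumed shape of a lon-lat ordered polygon).
ILLUSTRATIVE_POLYGON = (
    "142.014 10.161667,142.014 18.033833,"
    "147.997833 18.033833,147.997833 10.161667,142.014 10.161667"
)

if __name__ == "__main__":
    points, dropped = parse_geoshape_points(ILLUSTRATIVE_POLYGON)
    print("dropped terms:", dropped)
    print("distinct points:", sorted(set(points)))
```

Under this reading only two distinct points remain, which is why the shape collapses to a line rather than enclosing an area.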

jmckenna commented 10 months ago

@marksparkza Yes I am very familiar with all of the links and discussion that you are referring to, as I have had to examine and explain this issue to so many OIH partners. Here is some clarification:

It may not answer all of your questions, but these are the guidelines (that now work for both ODIS and Google Dataset Search) that I share with partners (even though, as you said, the schema.org documentation is not clear).

marksparkza commented 10 months ago

@jmckenna Thanks for clarifying the box format. This probably is the better option for us as it maps directly from the bounding boxes in our metadata records.
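
A minimal sketch of that mapping, assuming the box value is written as "south west north east" (lower-left corner then upper-right corner, each in lat-lon order, separated by spaces), which is how the Google and Science on Schema guidance referenced earlier present it. The function name and the sample coordinates are hypothetical and do not come from a MIMS record.

```python
def bbox_to_spatial_coverage(west: float, south: float,
                             east: float, north: float) -> dict:
    """Build a schema.org spatialCoverage value from a bounding box.

    Assumes the box string is "south west north east" (lat-lon order).
    west > east is allowed so that boxes crossing the antimeridian pass.
    """
    if not -90 <= south <= north <= 90:
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie within [-180, 180]")
    return {
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            "box": f"{south} {west} {north} {east}",
        },
    }


if __name__ == "__main__":
    # Made-up extent for illustration.
    print(bbox_to_spatial_coverage(west=17.0, south=-35.0, east=19.0, north=-32.0))
```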

I've created a PR https://github.com/iodepo/odis-arch/pull/364 to address the incorrect formatting of polygon in the OIH examples.

marksparkza commented 10 months ago

@jmckenna Feedback on distribution: our curators suggested that users should rather be redirected back to our catalogue than provided with direct download links.

spatialCoverage and temporalCoverage will be included in the next update on Wednesday.
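
For temporalCoverage, one possible encoding is an ISO 8601 interval, which schema.org accepts for this property. The sketch below assumes the source metadata holds plain start and end dates; `temporal_coverage` is a hypothetical helper, and the dates are the cruise dates quoted in the example record's description.

```python
from datetime import date
from typing import Optional


def temporal_coverage(start: date, end: Optional[date] = None) -> str:
    """Format a date range as an ISO 8601 interval, e.g. 2022-02-04/2022-02-12.

    An open-ended range is written with ".." in place of the end date,
    as described in the schema.org temporalCoverage documentation.
    """
    if end is None:
        return f"{start.isoformat()}/.."
    if end < start:
        raise ValueError("end date must not precede start date")
    return f"{start.isoformat()}/{end.isoformat()}"


if __name__ == "__main__":
    print(temporal_coverage(date(2022, 2, 4), date(2022, 2, 12)))
```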

marksparkza commented 5 months ago

@jmckenna I'm following up to inquire about the status of connecting the MIMS catalogue as an ODIS node. Our MIMS datasets don't seem to be available as yet on OIH, so I was wondering whether anything is still needed from our side in terms of the sitemap or JSON-LD that we are publishing?

jmckenna commented 3 months ago

@marksparkza did some more testing on your endpoint inside ODIS, here is some feedback:

mims-spatial-records

marksparkza commented 1 week ago

@jmckenna Thanks very much for the update and resolving the ODISCat entry with Bubele.

Can you please advise what the reason is for ~500 datasets not being indexed in ODIS as type Dataset? All the MIMS JSON-LD records specify type Dataset and are structured the same way.
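
As a local cross-check on the publishing side, a sketch like the following could tally the `@type` values across a set of already-parsed JSON-LD records. It says nothing about how ODIS assigns types during indexing; `tally_types` and the sample records are hypothetical.

```python
from collections import Counter


def tally_types(records) -> Counter:
    """Count declared @type values across parsed JSON-LD records.

    Handles @type given as a single string or as a list, and counts
    records with no @type under the key "<missing>".
    """
    counts = Counter()
    for record in records:
        declared = record.get("@type", "<missing>")
        types = declared if isinstance(declared, list) else [declared]
        counts.update(types)
    return counts


if __name__ == "__main__":
    # Made-up records for illustration.
    sample = [
        {"@type": "Dataset", "name": "a"},
        {"@type": ["Dataset", "CreativeWork"], "name": "b"},
        {"name": "c"},
    ]
    for type_name, count in tally_types(sample).items():
        print(type_name, count)
```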