iodepo / odis-arch

Development of the Ocean Data and Information System (ODIS) architecture
https://book.odis.org/

Connect HUB Ocean API to ODIS #310

Open jmckenna opened 1 year ago

jmckenna commented 1 year ago

cc @TaraOceanData @thomafred

thomafred commented 1 year ago

@jmckenna We are talking to the OGC about defining a data-sync protocol: https://github.com/C4IROcean/ocean-open-data-sync-protocol

From our talks with @gra-moore you may already be considering this data-sync protocol?

gra-moore commented 1 year ago

Some clarifications: https://github.com/C4IROcean/ocean-open-data-sync-protocol is a protocol I discussed with Rob for synchronising data within a dataset, and how this could be done as an OGC pattern applicable to many different OGC types.

The Universal Data API (https://open.mimiro.io/specifications/uda/latest.html) is a more generalised semantic data-exchange protocol that I presented to the DITTO group, and it relates to this issue: https://github.com/iodepo/odis-arch/issues/162. I think I also discussed this with @jmckenna and @pbuttigieg while at HUB Ocean.

For the exchange of semantic data, this "RSS for data" approach was presented to the W3C many years ago as the SDShare Note / working group. The UDA update is a refinement after 10+ years of real-life deployments.

pbuttigieg commented 1 year ago

@thomafred the OGC sync makes sense, but I'm not sure I see the bearing on the JSON-LD/schema.org interface to ODIS.

Would you like to embed OGC properties or types into your JSON files?

gra-moore commented 1 year ago

@thomafred @pbuttigieg The thought was that it could be useful for publishers of ODIS dataset/data descriptions to be able to share and syndicate these as change feeds, so that aggregators and others could easily collect and aggregate these descriptions in a standardised way. The Universal Data API (https://open.mimiro.io/specifications/uda/latest.html) I mentioned above is one proposed way of doing this.

pbuttigieg commented 1 year ago

Ah right, the RSS-style approach to cataloguing, I recall.

@fils is this viable for all ODIS partners, or do you think it will cause a divide relative to the standard sitemap-to-JSON-files approach? Could the sitemap have URLs that are API calls to retrieve JSON files instead?

fils commented 1 year ago

@pbuttigieg

From a quick inspection of the documentation, there is nothing hard about implementing the Universal Data API (UDA) in Gleaner that I can see.
The API looks clean and easy to implement from a client point of view. We don't implement JSON Web Tokens (JWT) in Gleaner, but we could.

Note, this is a statement about how Gleaner (https://github.com/gleanerio) would implement this. Gleaner is not ODIS, so aligning to ODIS is more a policy than technology issue.
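To make that concrete, here is a rough sketch of what a harvester-side client for a UDA-style change feed might look like. This is a sketch under assumptions: the page shape (`{"entities": [...], "continuation": <token>}`) and the continuation-token paging idea follow my skim of the UDA docs, and the function and field names here are illustrative, not the literal UDA wire format.

```python
# Hypothetical sketch of how a harvester like Gleaner might page through a
# UDA-style change feed. The page shape and token semantics are assumptions
# drawn from a skim of the UDA spec, not its actual wire format.

def read_change_feed(fetch, dataset, since=None):
    """Collect all entities from a paged change feed.

    `fetch(dataset, token)` is assumed to return a dict like
    {"entities": [...], "continuation": <token or None>}.
    Returns the collected entities plus the last token, which a
    client would persist to resume incrementally next time.
    """
    entities = []
    token = since
    while True:
        page = fetch(dataset, token)
        entities.extend(page["entities"])
        nxt = page.get("continuation")
        if not nxt:
            break
        token = nxt
    return entities, token


# Stub fetcher simulating a two-page feed (no network needed)
def fake_fetch(dataset, since):
    pages = {
        None: {"entities": [{"id": "ds:1"}], "continuation": "t1"},
        "t1": {"entities": [{"id": "ds:2"}], "continuation": None},
    }
    return pages[since]

all_entities, last_token = read_change_feed(fake_fetch, "datasets")
```

The point is only that the client side is small: loop, accumulate, keep the token.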

Allow me to play devil's advocate a bit:

Most ODIS/OIH partners will not be willing to implement this. This is not a technological slight; it is just that many of the groups we deal with would not have the resources or the incentive to do so. That is not really an issue, just an observation. They might be able to export a file like this, but not the APIs. Some might be able to implement it, but would need a winning value proposition to do so.

If they can manage the data pipeline to populate a UDA document, it would likely be easier for them to maintain a proper sitemap.xml with lastmod dates. UDA is obviously more capable than sitemaps, but the lift to maintain a valid, current sitemap.xml file is lower.
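For reference, the "lower lift" option is just a sitemap.xml whose entries carry a current lastmod; the URL below is a placeholder:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.org/datasets/krill-survey.json</loc>
    <lastmod>2023-08-18</lastmod>
  </url>
</urlset>
```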

I am also curious how this compares to things like GeoAPI and LDES, along with the mentioned Ocean Data Exchange and SDShare.
Again, implementing much of this is not overly hard from a technical point of view, but it does begin to grow the complexity.

OIH has very much focused on classic web architecture, as in the W3C Data on the Web Best Practices. Moving into API space would allow us to address more complex partner interactions, like data synchronization, which is typically less efficient in the classic web-architecture space.

It would be interesting to think about what a "broker" from something like UDA to sitemap.xml might look like. In the DeCODER project I am working on, we are exploring this for STAC catalogs. The Radiant Earth group has STAC Browser (https://github.com/radiantearth/stac-browser), which expresses STAC catalogs using the schema.org vocabulary. They don't yet build out sitemap.xml as an SEO enhancement, but it is something they are looking at for the milestone 3.2 work (ref: https://github.com/radiantearth/stac-browser/milestone/11).

It might be interesting to explore what a UDA API that exposed a sitemap might look like. So an API like:

GET /datasets/sitemap 

that could be exposed in a robots.txt with something like

User-agent: EarthCube_DataBot/1.0
Allow: /
Sitemap: https://example.org/datasets/sitemap

would be interesting. Given the other existing APIs, this seems like it would be relatively easy since most of the real work is already being done by those. That endpoint response could be a valid sitemap.xml document with the /datasets/people/entities entries.
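The core of such a broker endpoint could be very small: render whatever records the upstream API returns as a sitemap.xml document. A minimal sketch, assuming the upstream hands back records with a URL and a modified date (the field names and example URL are hypothetical):

```python
# Sketch of the /datasets/sitemap "broker" idea: render dataset records
# (however the upstream API supplies them) as a sitemap.xml document.
# Record field names ('url', 'modified') are illustrative assumptions.
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def records_to_sitemap(records):
    """Render an iterable of {'url': ..., 'modified': ...} dicts as sitemap XML."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for rec in records:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = rec["url"]
        ET.SubElement(entry, "lastmod").text = rec["modified"]
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = records_to_sitemap(
    [{"url": "https://example.org/datasets/1.json", "modified": "2023-08-18"}]
)
```

A real endpoint would wrap this in an HTTP handler and stream the result with a Content-Type of application/xml; the mapping itself is the only new work.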

Such an approach would allow clients to

1) use the classic web architecture approach as an entry level method.
2) then explore more capable approaches in the UDA API for those clients wishing to do so.

The common patterns section at https://open.mimiro.io/specifications/uda/latest.html#common-patterns seems to indicate it would be easy to either use schema.org directly or at least modify the context to express mappings for the types we are looking for in ODIS. From my skimming of the docs, I don't fully follow what constraints the UDA places on vocabularies, types, and such things.
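As an illustration of that context-modification idea (everything here is hypothetical, not taken from the UDA spec), a feed entity could carry a context that maps its own terms onto the schema.org types ODIS expects:

```json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "title": "https://schema.org/name"
  },
  "@id": "https://example.org/datasets/krill-survey",
  "@type": "Dataset",
  "title": "Krill survey (example)"
}
```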

OK, just my 2 cents to help move the conversation along.

References

Data on the web best practices

https://www.w3.org/TR/dwbp/

Universal Data API

https://open.mimiro.io/specifications/uda/latest.html

Ocean Data Exchange API

https://github.com/C4IROcean/ocean-open-data-sync-protocol/blob/master/specification.md

SDShare Protocol

http://www.sdshare.org/

GeoAPI

https://www.geoapi.org

STAC

https://stacspec.org/en
https://github.com/radiantearth/stac-browser

Sitemap.xml

https://sitemaps.org/

LDES

https://semiceu.github.io/LinkedDataEventStreams/

tomredd commented 1 year ago

A bit late to the party, but I just wanted to jump in. I'm glad you played the devil's advocate, @fils, and I completely agree that this is not a technical issue. But I'm wondering if there might be an opportunity to run a quick technical pilot with a few ODIS/OIH partners to see how this would work in practice. Ideally, I'd like to include some of the partners from the Iliad project as well, or maybe in parallel.

I think the Iliad partners I have in mind might need a helping hand technically, but if we demonstrate how straightforward it is to implement, I think we might get a lot of traction.

Thoughts?

jmckenna commented 1 year ago

It might be interesting to explore what a UDA API that exposed a sitemap might look like. So an API like:

GET /datasets/sitemap 

that could be exposed in a robots.txt with something like

User-agent: EarthCube_DataBot/1.0
Allow: /
Sitemap: https://example.org/datasets/sitemap

would be interesting. Given the other existing APIs, this seems like it would be relatively easy since most of the real work is already being done by those. That endpoint response could be a valid sitemap.xml document with the /datasets/people/entities entries.

+1 for this approach, a "broker" to go from STAC/OGC API, etc. into a /sitemap endpoint (this could become quite popular)

jmckenna commented 11 months ago

Latest updates since our last meeting on 2023-08-18:

pbuttigieg commented 10 months ago

Given our most recent call, we may be able to go the sitemap --> embedded metadata route in the ODP pages now

pbuttigieg commented 10 months ago

@jmckenna

In the distribution stanza, we can add links to the raw data (we saw that it was an application/octet-stream in the ODP preview), as well as any netCDF (application/x-netcdf) conversions that HubOcean do

We should also add a variableMeasured stanza, where we can list that - for example - raw acoustic profiles were measured (it doesn't matter if there's no Level 2 and upwards data there)
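A minimal sketch of the two stanzas described above, using the media types from this thread; the contentUrl values and the variable name are placeholders:

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "distribution": [
    {
      "@type": "DataDownload",
      "contentUrl": "https://example.org/data/raw",
      "encodingFormat": "application/octet-stream"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://example.org/data/converted.nc",
      "encodingFormat": "application/x-netcdf"
    }
  ],
  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "raw acoustic profiles"
    }
  ]
}
```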

jmckenna commented 10 months ago

@pbuttigieg , all: see related JSON-LD comments in https://github.com/iodepo/odis-arch/issues/368

jmckenna commented 10 months ago

@pbuttigieg I added variableMeasured and "encodingFormat": "application/octet-stream" into master (see https://github.com/iodepo/odis-arch/blob/master/book/thematics/dataset/graphs/krillMetadata.json )

smrgeoinfo commented 10 months ago

https://github.com/iodepo/odis-arch/blob/master/book/thematics/dataset/graphs/krillMetadata.json 404 not found

?see https://github.com/iodepo/odis-arch/blob/176-Hub-Ocean-krill-metadata/book/thematics/dataset/graphs/krillMetadata.json

jmckenna commented 10 months ago

@smrgeoinfo sorry for the timing, some major repository restructuring happened in the last 24 hours here, will grab the new location.....

jmckenna commented 10 months ago

@smrgeoinfo here is the permanent location of the JSON-LD template: krillMetadata.json

MatthewWhaley commented 4 months ago

@jmckenna ref my last email, here are the updates on our proof of concept sitemap and dataset examples:

Our ODIS catalogue registration: https://catalogue.odis.org/view/3299
Our proof-of-concept sitemap: https://oih-sitemap.azurewebsites.net/
Two sample datasets in the sitemap:
https://oih-sitemap.azurewebsites.net/dataset?name=AkerBioMarineEK60EK80EchosounderAKBMdata
https://oih-sitemap.azurewebsites.net/dataset?name=PGSbiotadata_mammal_turtle_observations_raw_files

So hopefully we are close to testing the integration.

I have some queries on the JSON-LD specification that we supplied, so we might want to make some edits based on your feedback. Most of it I am happy with, but the key queries are:
--Correct entry for "@id": I didn't find a good example of the content we should have here, so I just put our platform identifier. Should it be the URL location of the JSON-LD?
--The "identifier" section: we do not have DOIs for these particular datasets (something we need to look into), so we only have our platform-specific identifiers. This could be the UUID (e.g. "a22c1e17-b00e-43f3-91f1-9aefedf58ec0"), our "qualified name" (e.g. "1e3401d4-9630-40cd-a9cf-d875cb310449-akbm-raw-ds"), or the catalog URL that leads to our landing page (behind login), e.g. "https://app.hubocean.earth/catalog/dataset/1e3401d4-9630-40cd-a9cf-d875cb310449-akbm-raw-ds".
--I think I remember you saying at some point that it is not a good idea to have direct links to the catalog URLs, and that it is better to have the base URL with the identifiers separate, but I wasn't sure if this is necessary (since they are behind a login) or how I would implement it if it is preferable.
--Whether to render the JSON-LD as XML or HTML (e.g. with a wrapper), as I have seen examples of both.
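For the identifier question, one common schema.org pattern for non-DOI identifiers is a PropertyValue. The value below reuses the UUID from the examples above; the propertyID label is illustrative, not a ruling on which identifier ODIS prefers:

```json
{
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "HUB Ocean dataset UUID",
    "value": "a22c1e17-b00e-43f3-91f1-9aefedf58ec0"
  }
}
```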

MatthewWhaley commented 4 months ago

@jmckenna et al. Thanks for the meeting today. Just to update: we implemented the 'must-fix' issues like @id, propertyID, etc. We will look into adding the distribution and conditionsOfAccess blocks in the near future, but there should be no barriers to testing whenever you like before then.