Open jmckenna opened 1 year ago
@jmckenna We are talking to the OGC about defining a data-sync protocol: https://github.com/C4IROcean/ocean-open-data-sync-protocol
From our talks with @gra-moore you may already be considering this data-sync protocol?
Some clarifications: https://github.com/C4IROcean/ocean-open-data-sync-protocol is a protocol I discussed with Rob for synchronising the data within a dataset, and how this could be done as an OGC pattern applicable to many different OGC types.
The Universal Data API (https://open.mimiro.io/specifications/uda/latest.html) is a more generalised semantic data exchange protocol that I presented to the DITTO group; it relates to https://github.com/iodepo/odis-arch/issues/162. I think I also discussed it with @jmckenna and @pbuttigieg while at HUB Ocean.
For the exchange of semantic data, this "RSS for data" approach was presented to the W3C many years ago as the SDShare Note / working group. The UDA update is a refinement after 10+ years of real-life deployments.
@thomafred the OGC sync makes sense, but I'm not sure I see the bearing on the JSON-LD/schema.org interface to ODIS.
Would you like to embed OGC properties or types into your JSON files?
@thomafred @pbuttigieg The thought was that it could be useful for publishers of ODIS dataset / data descriptions to be able to share and syndicate these as change feeds, so that aggregators and others could easily collect and aggregate these descriptions in a standardised way. The Universal Data API (https://open.mimiro.io/specifications/uda/latest.html) I mentioned above is one proposed way of doing this.
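To make the change-feed idea concrete, here is a minimal sketch of the incremental sync loop an aggregator would run. The paging shape (a callable returning entities plus a continuation token) is an assumption for illustration only; consult the UDA spec at https://open.mimiro.io/specifications/uda/latest.html for the actual endpoint and token semantics.

```python
# Sketch of an incremental change-feed sync loop. The page/token
# shape is an illustrative assumption, not the UDA wire format.

def sync_changes(fetch_page, since=None):
    """Drain a change feed page by page.

    fetch_page(token) -> (entities, next_token); an empty page or an
    unchanged token means the client has caught up. Returns all new
    entities plus the token to persist for the next incremental run.
    """
    entities, token = [], since
    while True:
        page, next_token = fetch_page(token)
        entities.extend(page)
        if not page or next_token == token:
            break  # caught up: nothing new since the last token
        token = next_token
    return entities, token

# Exercise the loop with a canned three-page feed (no network needed).
pages = {
    None: ([{"id": "ds1"}], "t1"),
    "t1": ([{"id": "ds2"}], "t2"),
    "t2": ([], "t2"),
}
collected, resume_token = sync_changes(lambda tok: pages[tok])
print([e["id"] for e in collected], resume_token)
```

The persisted token is what turns a full harvest into a cheap incremental one: the next run starts from `resume_token` instead of re-reading every description.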
Ah right, the RSS-style approach to cataloguing, I recall.
@fils is this viable for all ODIS partners, or do you think it will cause a divide relative to the standard sitemap-to-JSON-files approach? Could the sitemap have URLs that are API calls to retrieve JSON files instead?
@pbuttigieg
There is nothing hard about implementing the Universal Data API (UDA) in Gleaner that I can see from quick inspection of the documentation. The API looks clean and easy to implement from a client point of view. We don't implement JSON Web Tokens (JWT) in Gleaner, but could.
Note, this is a statement about how Gleaner (https://github.com/gleanerio) would implement this. Gleaner is not ODIS, so aligning to ODIS is more a policy than technology issue.
Allow me to play devil's advocate a bit:
Most ODIS/OIH partners will not be willing to implement this. This is not a technological slight; many of the groups we deal with would either not have the resources or the incentive to do so. That is not really an issue, just an observation. They might be able to export a file like this, but not the APIs. Some might be able to implement it, but would need a winning value proposition to do so.
If they can manage the data pipeline to populate a UDA document, it would likely be easier for them to maintain a proper sitemap.xml with lastmod dates. UDA is obviously more capable than sitemaps, but the lift to maintain a valid, current sitemap.xml file is lower.
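For contrast, the "lower lift" side of that trade-off is just a static document; a minimal sitemap.xml with a lastmod entry looks like this (URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.org/dataset/akbm-raw.json</loc>
    <lastmod>2023-08-18</lastmod>
  </url>
</urlset>
```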
I am also curious how this compares to things like GeoAPI and LDES, along with the mentioned Ocean Data Exchange and SDShare. Again, implementing many of these from a technical point of view is not overly hard, but it does begin to grow the complexity.
OIH has very much focused on classic web architecture, as in the W3C Data on the Web Best Practices. Moving into API space would allow us to address more complex partner interactions, like data synchronization, which that space typically handles less efficiently.
It would be interesting to think about what a "broker" from something like UDA to sitemap.xml might look like. In the DeCODER project I am working on, we are exploring this for STAC catalogs. The Radiant Earth group has STAC Browser (https://github.com/radiantearth/stac-browser), which expresses STAC catalogs using the schema.org vocabulary. However, they don't build out a sitemap.xml as an SEO enhancement; it is something they are looking at for the milestone 3.2 work (ref: https://github.com/radiantearth/stac-browser/milestone/11).
It might be interesting to explore what a UDA API that exposed a sitemap would look like. So, an API like:
```
GET /datasets/sitemap
```
that could be exposed in a robots.txt with something like
```
User-agent: EarthCube_DataBot/1.0
Allow: /
Sitemap: https://example.org/datasets/sitemap
```
would be interesting. Given the other existing APIs, this seems like it would be relatively easy, since most of the real work is already being done by those. That endpoint's response could be a valid sitemap.xml document with the `/datasets/people/entities` entries.
Such an approach would allow clients to:
1. use the classic web architecture approach as an entry-level method;
2. then explore the more capable approaches in the UDA API, for those clients wishing to do so.
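As a rough illustration of how thin such a broker endpoint could be, here is a sketch that renders dataset entries as a sitemap.xml document. The entry list stands in for whatever the UDA entity endpoints would return; the field shapes are illustrative assumptions, not part of the UDA spec.

```python
# Minimal "broker" sketch: render dataset entries as sitemap.xml.
# The entries list is a stand-in for a UDA entities response; its
# (url, lastmod) shape is an illustrative assumption.
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """entries: iterable of (url, lastmod_iso_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap([
    ("https://example.org/datasets/people/entities", "2023-08-18"),
]))
```

A `GET /datasets/sitemap` handler would just call something like `build_sitemap` over the current entity list and return the string with an XML content type, so the sitemap stays as fresh as the feed behind it.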
The common patterns section at https://open.mimiro.io/specifications/uda/latest.html#common-patterns seems to indicate it would be easy to either use schema.org directly or at least modify the context to express mappings for the types we are looking for in ODIS. From my skimming of the docs, I don't fully follow what constraints the UDA places on vocabularies, types, and such things.
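For example, a plain JSON-LD context can map local terms onto schema.org; this is a generic illustration of that pattern (the `odis:` prefix and URLs are invented placeholders, not the UDA's own context mechanism):

```json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "odis": "https://example.org/odis/terms/"
  },
  "@type": "Dataset",
  "name": "Example dataset description",
  "odis:platformIdentifier": "example-id-001"
}
```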
OK, just my 2 cents to help move the conversation along.
- https://open.mimiro.io/specifications/uda/latest.html
- https://github.com/C4IROcean/ocean-open-data-sync-protocol/blob/master/specification.md
- https://stacspec.org/en
- https://github.com/radiantearth/stac-browser
Bit late to the party, but I just wanted to jump in. I'm glad you played the devil's advocate, @fils, and I completely agree that this is not a technical issue. But I'm wondering if there might be an opportunity to run a quick technical pilot with a few ODIS/OIH partners to see how this would work in practice. Ideally, I'd like to include some of the partners from the Iliad project as well, or maybe in parallel.
I think the Iliad partners I have in mind might need a helping hand technically, but if we demonstrate how straightforward it is to implement, I think we might get a lot of traction.
Thoughts?
> It might be interesting to explore what an API for the UDA might look like that exposed a sitemap. So an API like `GET /datasets/sitemap` that could be exposed in a robots.txt with something like `User-agent: EarthCube_DataBot/1.0 Allow: / Sitemap: https://example.org/datasets/sitemap` would be interesting. Given the other existing APIs, this seems like it would be relatively easy since most of the real work is already being done by those. That endpoint response could be a valid sitemap.xml document with the `/datasets/people/entities` entries.
+1 for this approach: a "broker" to go from STAC/OGC API, etc. into a `/sitemap` endpoint (this could become quite popular).
Latest updates since our last meeting on 2023-08-18:
Given our most recent call, we may now be able to go the route of sitemap --> embedded metadata in the ODP pages.
@jmckenna
- In the `distribution` stanza, we can add links to the raw data (we saw that it was `application/octet-stream` in the ODP preview), as well as any netCDF (`application/x-netcdf`) conversions that HubOcean do.
- We should also add a `variableMeasured` stanza, where we can list that, for example, raw acoustic profiles were measured (it doesn't matter if there's no Level 2 and upwards data there).
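A sketch of what those two additions could look like in a schema.org `Dataset` record; the URLs and names are placeholders, not the actual ODP values:

```json
{
  "@context": {"@vocab": "https://schema.org/"},
  "@type": "Dataset",
  "distribution": [
    {
      "@type": "DataDownload",
      "contentUrl": "https://example.org/data/raw-acoustic",
      "encodingFormat": "application/octet-stream"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://example.org/data/converted.nc",
      "encodingFormat": "application/x-netcdf"
    }
  ],
  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "raw acoustic profiles"
    }
  ]
}
```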
@pbuttigieg , all: see related JSON-LD comments in https://github.com/iodepo/odis-arch/issues/368
@pbuttigieg I added `variableMeasured` and `"encodingFormat": "application/octet-stream"` into master (see https://github.com/iodepo/odis-arch/blob/master/book/thematics/dataset/graphs/krillMetadata.json).
@smrgeoinfo sorry for the timing, some major repository restructuring happened in the last 24 hours here, will grab the new location.....
@smrgeoinfo here is the permanent location of the JSON-LD template: krillMetadata.json
@jmckenna ref my last email, here are the updates on our proof of concept sitemap and dataset examples:
- Our ODIS catalogue registration: https://catalogue.odis.org/view/3299
- Our proof-of-concept sitemap: https://oih-sitemap.azurewebsites.net/
- Two sample datasets in the sitemap:
  - https://oih-sitemap.azurewebsites.net/dataset?name=AkerBioMarineEK60EK80EchosounderAKBMdata
  - https://oih-sitemap.azurewebsites.net/dataset?name=PGSbiotadata_mammal_turtle_observations_raw_files
So hopefully we are close to testing the integration.
I have some queries on the JSON-LD specification that we supplied, so we might want to make some edits based on your feedback. Most of it I am happy with, but the key queries are:
- Correct entry for `"@id"`: I didn't find a good example of the content we should have here, so I just put our platform identifier. Should it be the JSON-LD URL location?
- The `identifier` section: we do not have DOIs for these particular datasets (something we need to look into), so we only have our platform-specific identifiers. This could be the UUID (e.g. "a22c1e17-b00e-43f3-91f1-9aefedf58ec0"), our "qualified name" (e.g. "1e3401d4-9630-40cd-a9cf-d875cb310449-akbm-raw-ds"), or the catalog URL that leads to our landing page (behind login), e.g. "https://app.hubocean.earth/catalog/dataset/1e3401d4-9630-40cd-a9cf-d875cb310449-akbm-raw-ds".
- I think I remember you saying at some point that it is not a good idea to link directly to the catalog URLs, and better to have the base URL and then the identifiers separate, but I wasn't sure if this is necessary (since they are behind a login) or how I would implement it if preferable.
- Whether to render the JSON-LD as XML or HTML (e.g. with a wrapper), as I have seen examples of both.
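One common convention (a suggestion relevant to the queries above, not a ruling on them) is to use the resolvable URL of the metadata record itself as `@id`, and to carry the platform-specific identifier as a schema.org `PropertyValue`, with `propertyID` naming the identifier scheme. The values below are placeholders:

```json
{
  "@context": {"@vocab": "https://schema.org/"},
  "@id": "https://oih-sitemap.azurewebsites.net/dataset?name=ExampleDataset",
  "@type": "Dataset",
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "https://app.hubocean.earth/catalog/",
    "value": "1e3401d4-9630-40cd-a9cf-d875cb310449-akbm-raw-ds"
  }
}
```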
@jmckenna et al. Thanks for the meeting today. Just to update: we implemented the 'must-fix' issues like `id`, `propertyID`, etc. We will look into adding the `distribution` and `conditionsOfAccess` blocks in the near future, but there should be no barriers to test whenever you like before that also.
`master` branch at https://github.com/iodepo/odis-arch/blob/master/book/thematics/dataset/graphs/krillMetadata.json (cc @TaraOceanData @thomafred)