wmo-im / wcmp2

WMO Core Metadata Profile 2
https://wmo-im.github.io/wcmp2

review and provide recommendation on future of WCMP #11

Closed tomkralidis closed 1 year ago

tomkralidis commented 3 years ago

Summary and Purpose

WCMP Status

WIS

Current landscape

Proposal

@wmo-im/tt-wismd to assess and evaluate options for the future of WCMP against established criteria of requirements, e.g.:

Criteria

Reason

As TT-WISMD, we need to put forth next steps in realizing discovery in alignment with WIS 2.0 and current state.

cc @6a6d74 @joergklausen

chris-little commented 3 years ago

@tomkralidis Maybe you need to add to the excellent outline above something around measures of success (some of which could be applied to the earlier WCMPs). Or at least describe what success would look like. HTH

efucile commented 3 years ago

@tomkralidis Maybe you need to add to the excellent outline above something around measures of success (some of which could be applied to the earlier WCMPs). Or at least describe what success would look like. HTH

I agree with @chris-little. I would say that we need to clarify objectives and benefits for the user.

amilan17 commented 3 years ago

I think it will be valuable to create an inventory of existing discovery portals including the standards and formats supported.

amilan17 commented 3 years ago

Most recent documentation on https://community.wmo.int/wis/wis2-implementation

tomkralidis commented 3 years ago

As part of this work, we must also keep in mind the W3C Spatial Data on the Web Best Practices. @wmo-im/tt-wismd please review as part of our WCMP 2.0 efforts. Thanks again.

chris-little commented 3 years ago

And do not forget the underlying W3C Data on the Web Best Practices.

tomkralidis commented 3 years ago

I originally included this, but then removed it given lineage. Nevertheless good to articulate explicitly, thanks @chris-little!

efucile commented 3 years ago

.. and don't forget that generic best practices need to be tailored for our needs. By the way do we know our needs?

tomkralidis commented 3 years ago

By the way do we know our needs?

IMHO our needs are rooted in lowering the barrier to our data. We need to satisfy discovery of WIS resources for both power users and mass market.

tomkralidis commented 3 years ago

2021-06-04 TT-WISMD meeting : @wmo-im/tt-wismd please review the W3C DCAT standard as a possible candidate for WCMP 2.0 for discussion at our next meeting.

tomkralidis commented 3 years ago

@wmo-im/tt-wismd please also review the OGC API - Records core metadata record model. This is based on DCAT and is based on GeoJSON which provides robust/broad interoperability.

josusky commented 3 years ago

Thanks for the interesting information about OGC API - Records, Tom. To familiarize myself with it, I tried to load the example document and validate it using the schema. Apparently, that was too big a task for a start :-) First of all, my JSON parser complains when loading the example ("record.json"); it does not like the extra comma on line 140. Secondly, I tried to load the YAML ("recordGeoJSON.yaml") and then use it as the schema argument for validation, like this: jsonschema.validate(instance=record, schema=schema), but that failed terribly. Most probably the problem is that the schema refers to other documents. Would it be possible to provide an example (Python code) of how to use the schema for actual validation?

tomkralidis commented 3 years ago

@josusky thanks for testing. The OGC API - Records schemas are currently in development. You are correct that the validation issues are rooted in the $ref objects, which themselves are YAML (not JSON).

Having said this (if you want to dig deeper), the following jsonschema PR, along with the below, works as expected:

```python
import json
import os
import sys

from jsonschema import RefResolver, validate
import yaml


def validate_json(instance: dict, schema: dict, schema_dir: str) -> bool:
    resolver = RefResolver(base_uri=f'file://{schema_dir}/', referrer=schema)
    validate(instance, schema, resolver=resolver)
    return True


if __name__ == '__main__':
    if len(sys.argv) < 3:
        print(f'Usage: {sys.argv[0]} <instance> <schema>')
        sys.exit(1)

    schema_dir = os.path.dirname(os.path.abspath(sys.argv[2]))

    with open(sys.argv[1]) as fh1, open(sys.argv[2]) as fh2:
        instance = json.load(fh1)
        schema = yaml.load(fh2, Loader=yaml.SafeLoader)

        try:
            validate_json(instance, schema, schema_dir)
        except Exception as err:
            msg = f'ERROR: {err}'
            print(msg)
```
and then assuming you cloned https://github.com/opengeospatial/ogcapi-records:

```shell
python foo.py core/examples/json/record.json core/openapi/schemas/recordGeoJSON.yaml
```

josusky commented 3 years ago

Thanks, @tomkralidis, I am not sure I wanted to dig as deep as patching the jsonschema module. I am just a humble C++ developer who occasionally moonlights in Python :-) Perhaps, until the PR in jsonschema is accepted, we could have the JSON schema exported into, er, JSON, as the reference version that could be used directly?

PS: I did try your code with the tools I have, and it ended with "ERROR: Expecting value: line 1 column 1 (char 0)". I tracked it down to the loading of "geometryGeoJSON.yaml", where line 1 starts with " - $ref: ...".

tomkralidis commented 3 years ago

@josusky once we have a 1.0 schema, we will publish a single, all-in-one YAML (see the OGC API - Features example: http://schemas.opengis.net/ogcapi/features/part1/1.0/openapi/ogcapi-features-1.yaml), which we can use without worrying about external references.

Having said this, validation would need an extra step to target a given shared component in the schema (vs. a single file for a building block) to ensure proper validation.
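To illustrate that extra step, here is a hedged sketch with a toy stand-in for a bundled document (the component names `record` and `geometry` are hypothetical, not taken from the published schemas): validation targets one named component via a `$ref`, while internal references resolve against the whole bundled document.

```python
# Sketch: validate against one shared component of a bundled schema.
# The bundled document and its component names below are hypothetical.
from jsonschema import RefResolver, validate

bundled = {  # stand-in for a bundled OpenAPI/JSON Schema document
    "components": {
        "schemas": {
            "record": {
                "type": "object",
                "required": ["id"],
                "properties": {
                    "id": {"type": "string"},
                    "geometry": {"$ref": "#/components/schemas/geometry"},
                },
            },
            "geometry": {"type": ["object", "null"]},
        }
    }
}

# Point validation at the shared component; internal $refs resolve
# against the bundled document via the resolver.
resolver = RefResolver(base_uri="", referrer=bundled)
validate(
    instance={"id": "example-record", "geometry": None},
    schema={"$ref": "#/components/schemas/record"},
    resolver=resolver,
)
```

The key move is passing the whole bundled document as the resolver's referrer, so `#/components/schemas/...` pointers keep working even though the validated schema is only one component.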

jsieland commented 2 years ago

As said in our last meeting, we tried to play around with OGC API - Records (OARec). Below is our first try, followed by the original XML we wanted to translate. Thanks to my lovely colleague Antje Schremmer for her help and contribution.

Here are some points we found while working:

And other questions regarding the further WMO context (coming from discussions with other colleagues):


JSON (OGC API - Records):

```json
{
  "id": "urn:x-wmo:md:int.wmo.wis::SADL35EDZO",
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [47.2747, 5.8650], [47.2747, 15.0338], [55.0565, 15.0338], [55.0565, 5.8650], [47.2747, 5.8650]
    ]
  },
  "properties": {
    "recordCreated": "2017-05-26T06:44:07Z",
    "recordUpdated": "2021-08-23T12:30:00Z",
    "type": "dataset (http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/codelist/gmxCodelists.xml#MD_ScopeCode)",
    "title": "GTS Bulletin: Meteorological Aviation Routine Weather Report (METAR) Germany, surface observation airport",
    "description": "(Should be changed to human description)...The SADL35 TTAAii Data Designators decode as: T1 (S): Surface data T1T2 (SA): Aviation routine reports A1A2 (DL): Germany (The bulletin collects reports from stations: EDHA;EDHI;HAMBURG-FINKENWERDER ;EDHK;KIEL-HOLTENAU ;EDJA;MEMMINGEN ALLGAU ;EDMO;OBERPFAFFENHOFEN ;EDOP;SCHWERIN PARCHIM ;EDTD;DONAUESCHINGEN-VILLINGEN ;EDTL;LAHR ;EDTY;SCHWAEBISCH HALL ;EDVE;BRAUNSCHWEIG WOLFSBURG ;EDXW;WESTERLAND SYLT ;EDZO;)",
    "keywords": ["ceiling", "cloud", "dewpoint", "pressure", "temperature", "visibility", "weather", "wind"],
    "keywordsCodespace": "http://codes.wmo.int/306/4678",
    "language": "en",
    "externalID": "urn:x-wmo:md:int.wmo.wis::SADL35EDZO",
    "created": "2013-07-01T00:00:00Z",
    "updated": "2017-05-26T00:00:00Z",
    "publisher": [
      {
        "individial-name": "Kai-Thorsten Wirt",
        "organizationName": "Deutscher Wetterdienst",
        "positionName": "RTH FOCAL POINT",
        "contactInfo": [{"phone": "+49 (0) 69 8062-2546"}],
        "address": [{"delivery-point": "Frankfurter Straße 135", "city": "Offenbach", "postal-code": "63067", "country": "Germany"}],
        "onlineResource": "http://www.dwd.de"
      }
    ],
    "themes": [
      {"concepts": ["meteorology", "weatherObservations"], "scheme": "https://wis.wmo.int/2012/codelists/WMOCodeLists.xml#WMO_CategoryCode#MD_KeywordTypeCode_theme"},
      {"concepts": ["EDHA", "EDHI", "EDHK", "EDJA", "EDMO", "EDOP", "EDTD", "EDTL", "EDTY", "EDVE", "EDXW", "EDZO"], "scheme": "https://wis.wmo.int/2012/codelists/WMOCodeLists.xml#WMO_CategoryCode#MD_KeywordTypeCode_place"},
      {"concepts": ["GlobalExchange"], "scheme": "https://wis.wmo.int/2012/codelists/WMOCodeLists.xml#WMO_CategoryCode#MD_KeywordTypeCode_dataCentre"}
    ],
    "formats": ["http://codes.wmo.int/wmdr/DataFormat/_FM-15-metar"],
    "contactPoint": "gisc@dwd.de",
    "license": "http://codes.wmo.int/wmdr/DataPolicy/_WMOOther",
    "rights": "access",
    "extent": [
      {
        "spatial": {"bbox": [5.8650, 47.2747, 15.0338, 55.0565], "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"},
        "temporal": {"interval": [["2013-07-01Z", null]], "trs": "http://www.opengis.net/def/uom/ISO-8601/0/Gregorian"}
      }
    ],
    "association": [
      {
        "protocol": "https",
        "directDownloadURL": "false",
        "title": "WMO Information System Metadata Catalogue GISC Offenbach, view Metadata and download data",
        "href": "https://gisc.dwd.de/wisportal/#SearchPlace:q?pid=urn:x-wmo:md:int.wmo.wis::SADL35EDZO"
      },
      {
        "protocol": "amqps",
        "title": "WIS Message System GISC Offenbach",
        "broker": "https://oflkd013.dwd.de",
        "exchange": "wisof",
        "topic/routing_key": "v03/wis/de/offenbach/surface/aviation/metar/de"
      }
    ]
  }
}
```
XML (the original WCMP 1.0 / ISO 19139 record; the XML element markup was lost in transcription, so only the text content is preserved):

```
urn:x-wmo:md:int.wmo.wis::SADL35EDZO eng utf8 dataset Series of WMO GTS Bulletins;; Kai-Thorsten Wirt Deutscher Wetterdienst RTH FOCAL POINT +49 (0) 69 8062-2546 Frankfurter Straße 135 Offenbach 63067 Germany gisc@dwd.de http://www.dwd.de pointOfContact 2017-05-26T06:44:07Z WMO Core Metadata Profile of ISO 19115 (WMO Core), 2003/Cor.1:2006 (ISO 19115), 2007 (ISO/TS 19139) 1.3 WGS 84 World Geodetic System http://www.wmo.int/pages/prog/wis/2012/metadata/version_1-3/ WMO Core Profile version 1.3 GTS Bulletin: SADL35 EDZO - Surface data (details are described in the abstract) 2013-07-01Z publication 2017-05-26Z revision urn:x-wmo:md:int.wmo.wis::SADL35EDZO http://wis.wmo.int Kai-Thorsten Wirt Deutscher Wetterdienst RTH FOCAL POINT +49 (0) 69 8062-2546 Frankfurter Straße 135 Offenbach 63067 Germany gisc@dwd.de http://www.dwd.de distributor documentDigital The dot notation recommended by WMO-CBS IPET-MDRD is used to build the code identifier The SADL35 TTAAii Data Designators decode as: T1 (S): Surface data T1T2 (SA): Aviation routine reports A1A2 (DL): Germany (The bulletin collects reports from stations: EDHA;EDHI;HAMBURG-FINKENWERDER ;EDHK;KIEL-HOLTENAU ;EDJA;MEMMINGEN ALLGAU ;EDMO;OBERPFAFFENHOFEN ;EDOP;SCHWERIN PARCHIM ;EDTD;DONAUESCHINGEN-VILLINGEN ;EDTL;LAHR ;EDTY;SCHWAEBISCH HALL ;EDVE;BRAUNSCHWEIG WOLFSBURG ;EDXW;WESTERLAND SYLT ;EDZO;) WMO GTS Bulletin - Intended for global Exchange Kai-Thorsten Wirt Deutscher Wetterdienst Focal Point T+49 69 8062 2546 Frankfurter Straße 135 OFFENBACH 63067 Germany gisc@dwd.de originator continual dataset The details of the update frequence are described in the temporalElement H+20 H+50 temporal Aviation routine reports METAR aerodrome airport ceiling cloud dewpoint meteorological pressure temperature visibility weather weatherForecast wind theme meteorology weatherObservations theme WMO_CategoryCode 2012-06-27 revision Codelists for description of metadata datasets compliant with the WMO Core Metadata Profile version
1.3 [http://wis.wmo.int/2013/codelists/WMOCodeLists.xml] 2012-06-27 revision WMO_CategoryCode WMO Secretariat publisher BRAUNSCHWEIG WOLFSBURG DONAUESCHINGEN-VILLINGEN EDHA EDHI EDHK EDJA EDMO EDOP EDTD EDTL EDTY EDVE EDXW EDZO HAMBURG-FINKENWERDER KIEL-HOLTENAU LAHR MEMMINGEN ALLGAU OBERPFAFFENHOFEN SCHWAEBISCH HALL SCHWERIN PARCHIM WESTERLAND SYLT place Meteorological geographical features GEMET - INSPIRE themes, version 1.0 2008-06-01 publication GlobalExchange dataCentre WMO_DistributionScopeCode 2012-06-27 revision WMOOther otherRestrictions otherRestrictions WMOOther GTSPriority2 eng utf8 climatologyMeteorologyAtmosphere The product/data covers the following region/bounding box: Germany 5.865 15.03382 47.27472 55.05653 2013-07-01Z SADL35 : GTS Bulletin: SADL35 EDZO - Surface data (details are described in the abstract) thematicClassification FM 15 http://www.wmo.int/pages/prog/www/WMOCodes.html https://gisc.dwd.de/wisportal/#SearchPlace:q?pid=urn:x-wmo:md:int.wmo.wis::SADL35EDZO http GISC Offenbach, Deutscher Wetterdienst WMO Information System, download products/data through GISC Offenbach, Deutscher Wetterdienst dataset INSPIRE Data Specification on Meteorological geographical features 2010-12-08 publication See the referenced specification true High data quality controlled according to the procedures of the WIS This metadata record was created automatically as a representation of the bulletin declaration found in WMO # 9 Volume C1. Other references were used in the process, including WMO References such as WMO # 9 Volume A, WMO # 386 Manual on the GTS and WMO # 306 Manual on Codes. Other elements of information were also collected or created for the purpose of the GTS Metadata Generation. This work, as well as the creation of the representation of the WMO references was done by Deutscher Wetterdienst (DWD) on a "best effort" basis.
```
tomkralidis commented 2 years ago

Excellent job @jsieland and Antje! Thanks for the valuable feedback.

which schema version is the document based on? -> version management!

This was raised in the OGC API - Records working group in https://github.com/opengeospatial/ogcapi-records/issues/138, and a resulting schema update proposal in https://github.com/opengeospatial/ogcapi-records/pull/144, which basically means adding the following to the root of the JSON:

    "conformsTo": [
        "http://www.opengis.net/spec/ogcapi-records-1/1.0/req/record-core",
        "http://www.wmo.int/spec/wmo-core-metadata-profile-1/1.0/req/discovery-metadata-record"
    ]

how about multilingualism? in one document or better several?

Let's start with using alternate representations. For example, from your English JSON:

{
  "id": 123,
  "geometry": ...,
  "links": [
    {
      "rel": "alternate",
      "type": "application/json",
      "title": "This document in German",
      "href": "https://example.org/foo.de.json",
      "hreflang": "de"
    }
  ]
}

is there a possibility to indicate "continuous" for updating?

Let's try adding the following to properties:

"wmo:maintenanceFrequency": "continual"

publisher vs. originator -> originator part missing (?)

"wmo:originator": "foo"

is it possible to indicate a priority like now with GTSPriority?

"wmo:priority": "foo"

And other questions regarding the further WMO context (coming from discussions with other colleagues):

impact for transmitting/harvesting other metadata? -> ISO? OAI? INSPIRE? etc?

In the context of WIS 2.0, the harvesting workflow would support OGC API - Records, and our resulting "profile/extension" of the metadata model. Is this what you mean?

what is the minimum necessary set of mandatory elements for good metadata?

The goal is that WCMP 2.0 will have a JSON schema (based on the OGC API - Records record JSON schema), which will enforce cardinality accordingly.
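As an illustration only (the element names are taken from the examples in this thread, not from a published WCMP 2.0 draft), JSON Schema expresses cardinality with `required` and `minItems`:

```json
{
  "type": "object",
  "required": ["id", "properties"],
  "properties": {
    "properties": {
      "type": "object",
      "required": ["title", "description"],
      "properties": {
        "keywords": {
          "type": "array",
          "minItems": 1
        }
      }
    }
  }
}
```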

will there be a translator?

We will need migration tools to go from WCMP 1.0 to 2.0; is this what you mean?

Any overall feedback on experiences with working with the metadata in a JSON format (compared to XML)? Is this easier from a user or programmer experience? How hard was it to make the above translation? Any feedback on this front is valuable.

Thanks again Julia and Antje!

jsieland commented 2 years ago

which schema version is the document based on? -> version management!

This was raised in the OGC API - Records working group in opengeospatial/ogcapi-records#138, and a resulting schema update proposal in opengeospatial/ogcapi-records#144, which basically means adding the following to the root of the JSON:

    "conformsTo": [
        "http://www.opengis.net/spec/ogcapi-records-1/1.0/req/record-core",
        "http://www.wmo.int/spec/wmo-core-metadata-profile-1/1.0/req/discovery-metadata-record"
    ]

Looks good, especially the possibility of adding more than just one.

how about multilingualism? in one document or better several?

Let's start with using alternate representations. For example, from your English JSON:

{
  "id": 123,
  "geometry": ...,
  "links": [
    {
      "rel": "alternate",
      "type": "application/json",
      "title": "This document in German",
      "href": "https://example.org/foo.de.json",
      "hreflang": "de"
    }
  ]
}

I like this approach because it allows adding/removing localized versions.

is there a possibility to indicate "continuous" for updating?

Let's try adding the following to properties:

"wmo:maintenanceFrequency": "continual"

publisher vs. originator -> originator part missing (?)

"wmo:originator": "foo"

is it possible to indicate a priority like now with GTSPriority?

"wmo:priority": "foo"

Looks good, I will try to add this to the example! Not sure if I'll manage to do this before our next meeting...

And other questions regarding the further WMO context (coming from discussions with other colleagues): impact for transmitting/harvesting other metadata? -> ISO? OAI? INSPIRE? etc?

In the context of WIS 2.0, the harvesting workflow would support OGC API - Records, and our resulting "profile/extension" of the metadata model. Is this what you mean?

what is the minimum necessary set of mandatory elements for good metadata?

The goal is that WCMP 2.0 will have a JSON schema (which is based off the OGC API - Records record JSON schema), and will enforce cardinality accordingly.

will there be a translator?

We will need migration tools to go from WCMP 1.0 to 2.0; is this what you mean?

Both questions go in the same direction somehow... I hope I can explain this so it makes more sense: We use OAI-PMH to make our own metadata available to other parts of the Federal Government (like the Spatial Data Infrastructure Germany (SDI Germany)), which has to be in ISO XML and/or INSPIRE. So it would be nice to have a translator that can convert XML to JSON and vice versa; otherwise we might have to find our own solution for that. I'm not sure if this is a "German thing" or if others have the same regulations. If not, it would at least make things easier if elements had similar names. And a translator from XML to JSON would be helpful in general (at least in the beginning).
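A hedged sketch of the kind of translator discussed above: pull a few fields out of an ISO 19139 (WCMP 1.0) record and emit an OGC API - Records style dict. The element paths below cover only a tiny fraction of a real record, and a production tool would need many more mapping rules; only the two namespace URIs are standard ISO 19139 namespaces.

```python
# Minimal ISO 19139 XML -> record-style dict sketch (illustrative only).
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def iso_to_record(xml_text: str) -> dict:
    """Extract a handful of fields from an ISO 19139 record."""
    root = ET.fromstring(xml_text)

    def text(path: str):
        # return the text of the first matching element, or None
        el = root.find(path, NS)
        return el.text if el is not None else None

    return {
        "id": text("gmd:fileIdentifier/gco:CharacterString"),
        "type": "Feature",
        "properties": {
            "title": text(".//gmd:title/gco:CharacterString"),
            "description": text(".//gmd:abstract/gco:CharacterString"),
        },
    }
```

Going the other way (JSON to ISO XML) would need the inverse mapping plus template XML, which is where most of the effort of such a tool lies.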

Any overall feedback on experiences with working with the metadata in a JSON format (compared to XML)? Is this easier from a user or programmer experience? How hard was it to make the above translation? Any feedback on this front is valuable.

Just a disclaimer: we did that all by hand, so no programming involved, just a simple editor ;) But overall it was quite easy. It took me longer to find the necessary information in the original XML to copy into the JSON. After my first mockup, Antje was quick to add some further information. The examples in the OGC GitHub were very helpful, although it seemed they are not complete yet (?). A shortcoming was that I could not use the YAML schema files on the fly - for that I would have needed to write a program. It would have been nice if that had been possible. Also, I just used recordGeoJSON.yaml - I wasn't sure if a collection might be more fitting. And just as a side note: the JSON file was definitely smaller than the XML file - like 5 KB vs. 34 KB. And as a human, the JSON is way better to read.

gaubert commented 2 years ago

Hi all,

Thanks to @jsieland for the great analysis (with a lot of very relevant points) and kick starting things.

We have also started, at EUMETSAT, to take a deeper look at the OGC API - Records metadata standard, and below are our comments. We definitely think it is already a great improvement, but we would like to fix some additional things while we are working on a new standard. In particular, we would like to avoid, if possible, the work of translating our internal OGC API - Records into a WMO OGC API - Records. With ISO we had to downgrade our internal version to export a WCMP record by stripping and re-formatting some information. This increases the maintenance work and creates different versions of the information in different places.

First we would like to explain why we are making metadata records. It may look trivial, but it is important to remind ourselves about it. At EUMETSAT, our metadata business tries to focus on one aspect: providing enough information about a dataset to allow the user to access the data by selecting the best access method, or to access the information needed to best use the dataset. That can be technical information about the dataset, or about associated datasets, that will help users do their job.

To best guide the user, we have a web catalogue (https://navigator.eumetsat.int) which provides discovery services for the EUMETSAT datasets, entirely based on the information defined in the ISO metadata. A lot of additional information has been added to allow us to build the best possible discovery experience. In some cases we had to massage the ISO standard a bit, and it does not allow us to do everything we would like. A metadata record should be self-descriptive and self-contained, allowing EUMETSAT, but also any discovery provider, to present our datasets in the best possible way.

So, to sum up, it is all about our users and the creation of a searchable / browsable catalogue and API information to give access to the datasets.

We have the following necessary categories of information in the metadata to describe EUMETSAT products:

- Descriptive/editorial information:
    - Descriptive information (title, abstract) 
    - A representative image of the data set (convention needed to represent it)
    - Responsible parties: originator (data producer, ultimately responsible for the data quality), publisher (data provider, carrier)
    - Contact information
- External associated resources (technical information, external datasets)
    - Technical information guides (referenced)
    - Associated collections, sibling datasets, ....
- Classification
    - Themes (used to create the facets/filters) => Browse/Filter
    - Keywords (Filters/Tags)  
- Licensing: should be reduced to a link containing the data policy information. Conditions required and the user's responsibilities when access is given to the datasets.
- Access (how to get access to the data):
    - Services information (service end point): real time access services (EUMETCast distribution service relying on a private guaranteed network), internet near real time access service: API based web service, visualization services where the products are displayed, ....
    - Formats proposed by the service
    - Links to examples of API requests (possibly active) and examples of expected files, static samples
    - Additional information (e.g. the number of files produced per day)
    - What about associated services, such as quality monitoring or calibration services, which are more and more available?
- Citation
    - How to reference the datasets in scientific or other publication. This is valid for Climate records and a lot of other datasets
        - DOI (Digital Object Identifier),
        - Citation information.

Below is a first set of comments/questions and improvements we would like to see/discuss with the team and are applicable to OARec.

- Design without the GTS, and plan to retire it:

WIS 2.0 should be designed in isolation from the GTS. The GTS now represents the tip of the iceberg in the ocean of available data (model data, satellite mission data, climate and reanalysis data) that is accessed daily. The GTS is still very useful and serviceable, but for the benefit of WIS 2.0 it should be considered an external service to the WIS 2.0 Catalogue, i.e. the metadata should not contain any GTS-specific information (keywords like GlobalExchange) outside of the "GTS access part" in the association property or in a given extension. Even with that architecture, GTS data could still be retrieved and accessed very efficiently through a dedicated data access service advertised in the metadata.

Opinions, thoughts ?

- Collection/Records granularity:

In the WIS catalogues, some datasets describe almost individual records (in-situ observations) while others describe 30 years of data. Some of the individual-record datasets turn searching into a bad experience, because the results of a given search are polluted by very similar records repeated n times. Should we propose defining one granularity (the collection level) to avoid recreating the same problems in the future WIS 2.0 catalogues? Looking at OGC API - Records, most of the examples describe individual records rather than collections, whereas most of the WIS catalogue contents are collections. You can also describe collections with OGC API - Records; the collection part is discussed in the standard, but it seems to be more of a catalogue construct grouping records? What is the status of that part in the OGC instances, and what should the intention of the WIS metadata team be?

Opinions, thoughts ?

- Extensibility and customisation.

There is a lot of information specific to satellite datasets that is only of interest to our community, and we would like to take advantage of JSON and its extensibility principles. We might want to have a section in the OGC API - Records record where it is possible to describe, for instance, the instrument information, or product specifics when necessary. How is extensibility foreseen? Can we imagine having a "satellite" part that is provided without having to reformat or strip it from the produced metadata records, to avoid extra work managing the interface towards WMO? We believe this should also apply to other communities. In that case the record should be extended, and it would be good to define where and how it can be extended (can extra properties be added anywhere, or should they be grouped in a top-level property, etc.?)

Opinions, thoughts ?
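As one possible shape (the property names below are purely hypothetical, not from any draft), community-specific fields could be grouped under a single prefixed property so that generic consumers can ignore the whole block:

```json
{
  "properties": {
    "title": "High Rate SEVIRI Level 1.5 Image Data - MSG - 0 degree",
    "eum:satellite": {
      "platform": "MSG",
      "instrument": "SEVIRI",
      "channels": 12
    }
  }
}
```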

Then going through each of the defined categories above, here are some comments relative to our OGC API records analysis:

- Descriptive/editorial information 
    - Descriptive information (title, abstract)

The metadata is used to build discovery services (indexed information), but also to display the information in the best possible way to our users. A big limitation with the current standards has been the inability to structure the textual information, mainly in the abstract, to best present it to users, for instance using paragraphs and editorial techniques (bold, underline, headers) as well as links. Could we imagine having an optional edited abstract containing Markdown? It would be in addition to the existing abstract and would greatly help in the presentation of the information.

Opinions, thoughts ?
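For instance, a hypothetical additional property (the name wmo:formattedDescription is only illustrative) could carry a Markdown-flavoured abstract alongside the plain one:

```json
{
  "properties": {
    "description": "Rectified (level 1.5) Meteosat SEVIRI image data.",
    "wmo:formattedDescription": "## Overview\n\nRectified (**level 1.5**) Meteosat SEVIRI image data.\n\nSee the [product guide](https://example.org/guide) for details."
  }
}
```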

    - Representative image of the data set (convention needed to represent it)

It is also extremely important to have some images to present the dataset to the user in a graphical way. This can be optional. Information about the image resolution (width/height) would be preferable for best display. Portrait or landscape should be recommended in a guide to make best use of the available space.

Opinions, thoughts ?
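One way this could look, sketched with the existing links mechanism (the rel value preview is an IANA-registered link relation; the width/height properties and the URL are assumptions for illustration):

```json
{
  "links": [
    {
      "rel": "preview",
      "type": "image/png",
      "title": "Representative browse image",
      "href": "https://example.org/browse/hrseviri.png",
      "width": 1024,
      "height": 512
    }
  ]
}
```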

    - Responsible parties: originator (data producer, ultimately responsible for the data quality), publisher (data provider, carrier)

Like @jsieland and DWD, EUMETSAT provides access to a lot of 3rd-party datasets, so a publisher (the one providing access to the data) and an originator (the one creating the data and responsible for the data quality) are needed.

    - Contact information

An additional contactPoint referencing a link with all the information needed to contact the first line of support for the dataset would be really welcome. This should be what is presented to the user for questions about the dataset. It already exists in OGC API - Records.

- External associated resources (technical information, external datasets)
    - technical information guides (referenced)
    - associated collections, sibling datasets, linked data. This becomes really important with Climate records.

This can be provided using the links part of OGC API - Records, using the "rel" property to define the type of associated resource. We like it as it is really open, but we recommend defining a set of existing relationships. We've seen some but could not find a definitive source for them. Where can it be found?

Opinions, thoughts ?
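For context, rel values such as describedby and related are registered IANA link relations; a sketch of associated resources expressed that way (URLs are placeholders):

```json
{
  "links": [
    {
      "rel": "describedby",
      "type": "application/pdf",
      "title": "Technical information guide",
      "href": "https://example.org/docs/product-guide.pdf"
    },
    {
      "rel": "related",
      "type": "application/geo+json",
      "title": "Sibling climate data record",
      "href": "https://example.org/records/sibling.json"
    }
  ]
}
```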

   - Classification
       - Themes (used to create the facets/filters) => Browse/Filter
       - Keywords (Filters/Tags)

The current search experience in our catalogue is based on keyword search and facets, in a classical user interface with facets and keywords on the left and search results on the right, e.g. https://navigator.eumetsat.int/search?query=SAF . However, recent user consultations/surveys have demonstrated the need for a hierarchical, browsable presentation of the information, with intermediate levels if possible, to explain the complexity of our datasets and our field to beginners and newcomers: starting from very high-level thematics like Ocean, Weather, Climate and guiding the user to the datasets and/or services (for instance, here is an intermediate level about Ocean: https://www.eumetsat.int/what-we-monitor/ocean). Can the themes be used to create the thematic hierarchy in the catalogue? We lack the information on how they are related.

Opinions, thoughts ?

   - Licensing: should be reduced to a link containing the data policy information. Conditions required and responsibilities when access is given to the datasets.

The provision of licensing information is a very complex topic, with a lot of diversity and differences between the various license schemes. The responsibility of explaining, and ensuring, that the user complies with the necessary conditions for access to the data should be left to the data access service. We recommend simply providing a link to the data access license information (e.g. https://www.eumetsat.int/eumetsat-data-licensing). It seems that this is what has been done so far in the examples, with the license and rights fields. The task force should recommend what is expected in the second one, as most providers will want to retain their copyrights and, depending on the dataset, prevent re-distribution or not.

Opinions, thoughts ?

   - Access (how to get access to the data):
       - services information (service end point): real time access services (EUMETCast distribution service relying on a private guaranteed network), internet near real time access service: API based web service, visualization services where the products are displayed, ....
       - formats proposed by the service 
       - links to examples of API requests (possibly active) and examples of expected files, static samples
       - additional information (e.g. the number of files produced per day)

This part is really essential, as it gives the user a lot of information about which service provides access to the found dataset and how. At EUMETSAT, our catalogue uses the ISO metadata record to define the list of formats provided by each service. This is a really important part for the service, and it seems that OGC API - Records does not provide that kind of linkage. Is that true, and if so, is it possible to extend the access part to link the formats with the services? See, for instance, https://navigator.eumetsat.int/product/EO:EUM:DAT:MSG:HRSEVIRI for how our product navigator represents, in the Access part, the services and additional information on the formats available for each service. It is also really key to provide some information about the service itself (a link to build a service preview, the number of files produced, links to example datasets not requiring registration). Is it possible to define a minimum set of mandatory properties and optional ones, plus ones that can be freely added without becoming non-compliant when validated?

Opinions, thoughts?

   - citation 
       - How to reference the datasets in scientific or other publications. This is valid for climate records and a lot of other datasets:
           - DOI
           - Citation information.

More and more datasets, such as reanalyses, climate records, and even real-time on-going datasets, have citation information attached to them so that publications (mainly scientific) can reference them. We provide that kind of information using an extension of the ISO standard and think it is really important for the new WMO metadata standard to embed it. Here is an example of a dataset with citation information: https://navigator.eumetsat.int/product/EO:EUM:DAT:0080. Look at the citation part: DOI, authors, publisher, references. Is it possible to add that information to OGC API records?
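A sketch of how such citation information might be carried in a record, following the scheme/value pattern for identifiers discussed in this thread (the `citation` property name is an assumption, not part of OGC API - Records, and the values are placeholders):

```json
{
  "externalId": [
    {
      "scheme": "doi",
      "value": "doi:10.14287/10000004"
    }
  ],
  "citation": "Author(s) (Year): Dataset title. Publisher. DOI."
}
```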

Opinions, thoughts?

Below is an example of a EUMETSAT dataset record with some additional, more technical questions (inserted in a non-conformant way as # comments inside the JSON):

For instance, urn:x-wmo:md:int.eumetsat::EO:EUM:DAT:MSG:HRSEVIRI could be replaced by urn:x-wmo:md:int.eumetsat:EO:EUM:DAT:MSG:HRSEVIRI.

JSON (OGC API - Records):

````
{
  "id": "urn:x-wmo:md:int.eumetsat::EO:EUM:DAT:MSG:HRSEVIRI",  # do we still need the double :: for the ids?
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [[-81, -79], [-81, 79], [81, 79], [81, -79], [-81, -79]]
  },
  "properties": {
    "recordCreated": "2011-03-17T10:00:00Z",
    "recordUpdated": "2021-08-19T10:00:00Z",
    "type": "dataset (http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/codelist/gmxCodelists.xml#MD_ScopeCode)",
    "title": "High Rate SEVIRI Level 1.5 Image Data - MSG - 0 degree",
    "description": "Rectified (level 1.5) Meteosat SEVIRI image data. The data is transmitted as High Rate transmissions in 12 spectral channels. Level 1.5 image data corresponds to the geolocated and radiometrically pre-processed image data, ready for further processing, e.g. the extraction of meteorological products. Any spacecraft specific effects have been removed, and in particular, linearisation and equalisation of the image radiometry has been performed for all SEVIRI channels. The on-board blackbody data has been processed. Both radiometric and geometric quality control information is included. Images are made available with different timeliness according to their latency: quarter-hourly images if latency is more than 3 hours and hourly images if latency is less than 3 hours (for a total of 87 images per day). To enhance the perception for areas which are on the night side of the Earth a different mapping with increased contrast is applied for IR3.9 product. The greyscale mapping is based on the EBBT which allows to map the ranges 200 K to 300 K for the night and 250 K to 330 K for the day.",
    # keywords could be much more evolved and informational, but these were the only ones available in ISO
    "keywords": ["atmosphere", "land", "ocean"],
    "keywordsCodespace": "http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#MD_KeywordTypeCode",
    "language": "en",
    "externalID": "EO:EUM:DAT:MSG:HRSEVIRI",
    "created": "2009-03-23T00:00:00Z",
    # continuous update for the on-going collection (can we put "now"?), or is it all about the metadata information (if so, what is recordCreated?)
    "updated": "now",
    # the yaml record schema refers to responsibleParty.yaml (where is the definition file?); the notion of distributor and originator (responsible party) is used in our catalogue (we distribute lots of datasets from our partners) but is not described here. How can we model something similar?
    "publisher": [
      {
        # if this is the definition, I would not encourage the use of an individual name here; we are organisations and adding a name is confusing. We use EUMETSAT here.
        "individual-name": "European Organisation for the Exploitation of Meteorological Satellites",
        "organizationName": "EUMETSAT",
        "positionName": "RTH FOCAL POINT",  # no need for us
        "contactInfo": [
          {
            "phone": "+49(0)6151-807 3660/3770",  # more than one contact resource here
            "email": "ops@eumetsat.int",
            "url": "https://www.eumetsat.int/contact-us"
          }
        ],
        "address": [
          {
            "delivery-point": "EUMETSAT Allee 1",
            "city": "Darmstadt",
            "postal-code": "64295",
            "country": "Germany"
          }
        ],
        "organizationLogo": "https://www.eumetsat.int/eum_logo",  # optional provision of a logo
        "onlineResource": "https://www.eumetsat.int"
      }
    ],
    # is there a possibility to create a thematic hierarchical view with the themes, or is it just a bunch of keywords with the controlled vocabulary reference?
    "themes": [
      {
        "concepts": ["meteorology", "weatherObservations"],
        "scheme": "https://wis.wmo.int/2012/codelists/WMOCodeLists.xml#WMO_CategoryCode#MD_KeywordTypeCode_theme"
      },
      {
        "concepts": ["EDHA", "EDHI", "EDHK", "EDJA", "EDMO", "EDOP", "EDTD", "EDTL", "EDTY", "EDVE", "EDXW", "EDZO"],
        "scheme": "https://wis.wmo.int/2012/codelists/WMOCodeLists.xml#WMO_CategoryCode#MD_KeywordTypeCode_place"
      },
      {
        "concepts": ["GlobalExchange"],  # all references to GTS are going to disappear
        "scheme": "https://wis.wmo.int/2012/codelists/WMOCodeLists.xml#WMO_CategoryCode#MD_KeywordTypeCode_dataCentre"
      }
    ],
    # for a collection we have multiple formats and want to relate formats with access points (associations); ideally you want to show in your catalogue which formats are available for each access point
    "formats": [
      "application/x-geotiff",  # should we recommend using the MIME type when it exists? What about non-existing MIME types?
      "application/x-jpeg",
      "application/x-png",
      "application/zip",
      "application/x-eum-msg-native",
      "application/x-eum-hrit",
      "application/netcdf",
      "application/x-hdf"
    ],
    "contactPoint": "https://www.eumetsat.int/contact-us",
    "license": "https://www.eumetsat.int/eumetsat-data-licensing",  # can you have more than one license advertised here?
    "rights": "access",  # copyright
    "extent": [
      {
        "spatial": {
          "bbox": [-79, -81, 79, 81],
          "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
        },
        # what is the difference between the temporal extent and the created/updated information?
        "temporal": {
          "interval": [["2009-03-23Z", null]],  # does null mean now? Is it possible to have an explicit "now"?
          "trs": "http://www.opengis.net/def/uom/ISO-8601/0/Gregorian"
        }
      }
    ],
    # access points are defined as associations, but how do you relate the formats that can be accessed through each service?
    "association": [
      {
        "protocol": "https",
        "directDownloadURL": "true",
        "title": "EUMETView, the EUMETSAT near real time visualization service",
        "href": "https://eumetview.eumetsat.int/geoserv/wms?styles=raster&format=image%2Fgeotiff&height=3712&bbox=-77%2C-77%2C77%2C77&transparent=true&layers=meteosat%3Amsg_ir108&crs=EPSG%3A4326&service=WMS&request=GetMap&width=3712&version=1.3.0&exceptions=inimage"
      },
      {
        "protocol": "amqps",
        "title": "WIS Message System GISC Offenbach",
        "broker": "https://oflkd013.dwd.de",
        "exchange": "wisof",
        "topic/routing_key": "v03/wis/de/offenbach/surface/aviation/metar/de"
      }
    ],
    # you would like to express different things here: related information such as associated technical information, associated (linked) datasets, software
    "links": [
      {
        "rel": "resources",
        "type": "text/html",
        "title": "MSG Level 1.5 Image Data Format Description",
        "href": "https://www.eumetsat.int/media/45126"
      },
      {
        "rel": "resources",
        "type": "application/vnd.ms-powerpoint",
        "title": "Meteorological Use Of The Seviri Ir3.9 Channel",
        "href": "http://eumetrain.org/IntGuide/PowerPoints/Channels/Channel_IR39.ppt"
      },
      {
        "rel": "dataset",
        "type": "collection",
        "title": "High Rate SEVIRI Level 1.5 Image Data - MSG - Indian Ocean 41.5 degrees E",
        "href": "https://navigator.eumetsat.int/product/EO:EUM:DAT:MSG:HRSEVIRI-IODC"
      }
    ]
  }
}
````
tomkralidis commented 2 years ago

Thanks for the extensive comments @gaubert. Notes from our discussion today (feel free to update as desired)

    {
        "formatted": {
            "abstract": "`foo`, **bar**",
            "markup_language": "markdown"
        }
    }

Access links should be extended to be able to express supported formats:

        {
            "rel": "self",
            "title": "This document as JSON",
            "href": "https://example.org/api",
            "wmo:formats": [
                "application/json",
                "application/xml",
                "text/plain"
            ]
        }
"externalId": [
    {
        "scheme": "wmo-wis",
        "value": "urn:x-wmo:md:int.wmo.wis::https://geo.woudc.org/def/data/ozone/total-column-ozone/totalozone"
    },
    {
        "scheme": "doi",
        "value": "doi:10.14287/10000004"
    }
]
jsieland commented 2 years ago

Thanks @gaubert, this is a very comprehensive and valuable analysis. And it reminded me of some additional points I forgot:

  • publisher: the yaml schema refers to another imported responsibleParty.yaml. Where is it defined?

I found it here: https://github.com/opengeospatial/ogcapi-records/tree/master/core/openapi/schemas. But I'm a bit confused about why or when something is added in schemas/common rather than just in /schemas. The same goes for geometryGeoJSON.yaml and related files. Is this just "historically grown", or is there a reason I might have overlooked?

  • temporal extent: we guess that null expresses on-going datasets (those continuously producing records). Can it be replaced by something less ambiguous (like on-going or now)?

I agree. Interesting in this context might be an issue which was raised recently by DCAT-AT.de: https://github.com/w3c/dxwg/issues/1403

tomkralidis commented 2 years ago

Thanks @gaubert, this is a very comprehensive and valuable analysis. And it reminded me of some additional points I forgot:

  • publisher: the yaml schema refers to another imported responsibleParty.yaml. Where is it defined?

I found it here: https://github.com/opengeospatial/ogcapi-records/tree/master/core/openapi/schemas. But I'm a bit confused about why or when something is added in schemas/common rather than just in /schemas. The same goes for geometryGeoJSON.yaml and related files. Is this just "historically grown", or is there a reason I might have overlooked?

Anything in schemas/common is supposed to be moved to, or replaced by, OGC API - Common (the building-block specification on top of which OGC API standards are developed). As "common" reusable constructs are found while developing the standards, constructs which can be reused by other OGC API standards, these are proposed/moved into OGC API - Common.

The benefit here is consistency and reuse across the OGC API suite.

  • temporal extent: we guess that null expresses on-going datasets (those continuously producing records). Can it be replaced by something less ambiguous (like on-going or now)?

I agree. Interesting in this context might be an issue which was raised recently by DCAT-AT.de: w3c/dxwg#1403

From the schema definition:

The value `null` is supported and indicates an open time interval.

Note that this is also derived from schema.org. We should consider any interoperability issues around having something specific for such a key primitive in our domain, as well as issues around "now" representing "not quite now" data (from a day ago, say).
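For reference, this is how the open-ended interval appears in the EUMETSAT example earlier in this thread, with null marking the open end:

```json
"temporal": {
  "interval": [["2009-03-23Z", null]],
  "trs": "http://www.opengis.net/def/uom/ISO-8601/0/Gregorian"
}
```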

We should definitely include temporal resolution (an issue raised here with the OARec SWG).

For the moving window of data use case, should we consider this for only the data access perspective or beyond?

An example use case is an organization that has been producing hourly observations since 2009-07-11, with a rolling window of 90 days. From the discovery metadata perspective, I would still see the temporal extent as 2009-07-11/... It's the data access mechanism that provides the last 90 days (data beyond 90 days could be archived or made available through some other arrangement).

In this view, it would be valuable to express temporal resolution in extent.temporal, and to add a retention property of sorts in a link object, like:

        {
            "rel": "download",
            "type": "application/json",
            "title": "the last 90 days of data",
            "href": "https://example.org/api",
            "retention": "P90D"
        }
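
Correspondingly, the record's extent.temporal could carry the full historical range plus a temporal resolution (the `resolution` property is a hypothetical extension, mirroring the hourly-since-2009-07-11 example above):

```json
"temporal": {
  "interval": [["2009-07-11", null]],
  "resolution": "PT1H",
  "trs": "http://www.opengis.net/def/uom/ISO-8601/0/Gregorian"
}
```

The rolling 90-day window then lives only in the access link's retention hint, not in the discovery-level extent.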

Thoughts?

amilan17 commented 2 years ago

We also need to consider coordination with the WIGOS metadata model. In particular, we need to coordinate on the following types of information.

  1. WMO Data Policy Code List: Core, Recommended, Other (currently known as WMOEssential, WMOAdditional, WMOOther)
  2. License
  3. Attribution (to support things like DOI citation)
  4. Owner
  5. Publisher
amilan17 commented 2 years ago

I'd like to have a summary of the standards we are evaluating.

  1. W3C DCAT
  2. OGC API - Records (based on DCAT)
  3. ISO 19115-1

Are we looking at other standards too?
jsieland commented 2 years ago

This is the example I mentioned in our last meeting (soil moisture monitoring with a rolling 15-day time window: temporal coverage of the past 10 days and the next 5 days):

<gmd:temporalElement>
    <gmd:EX_TemporalExtent>
        <gmd:extent>
            <gml:TimePeriod gml:id="timeperiod" >
                <gml:beginPosition indeterminatePosition="now" >-P10D</gml:beginPosition>
                <gml:endPosition indeterminatePosition="now" >+P5D</gml:endPosition>
            </gml:TimePeriod>
        </gmd:extent>
    </gmd:EX_TemporalExtent>
</gmd:temporalElement>

This example cannot be realized with DCAT (and hence OARec). The W3C issue mentioned in wmo-im/wcmp2#11 has another example of rolling time windows.

efucile commented 2 years ago

@gaubert thanks for your comprehensive answer. I am going to comment on this only

  • Design without the GTS and plan to retire it: WIS 2.0 should be designed in isolation from the GTS. The GTS now represents the tip of the iceberg in the ocean of available data (model data, satellite mission data, climate and reanalysis data) that is accessed daily. The GTS is still very useful and serviceable, but for the benefit of WIS 2.0 it should be considered an external service to the WIS 2.0 Catalogue, i.e. the metadata should not contain any specific GTS information (keywords like GlobalExchange) outside of the "GTS access part" in the association property or in a given extension. Even with that architecture, the GTS data could be very efficiently retrieved and accessed through a dedicated data access service advertised in the metadata.

I absolutely agree that we need to design without the GTS and plan to retire it. However, we need to realise that the GTS will be retired very slowly, over many years or even decades, and the transition plan is not ready yet. That said, I think we are going to expose all GTS data through WIS2 pub/sub protocols, so I can imagine the new metadata simply linking to a WIS2-style source: new-style metadata with new-style pub/sub protocols. I think very little of the current catalogue will remain as is.

tomkralidis commented 2 years ago

Thanks @efucile. Here's the current thinking around GTS links via pub/sub: https://github.com/MetPX/wmo_mesh/issues/16#issuecomment-962744747, which would make its way into WCMP2 links.

gaubert commented 2 years ago

@efucile @tomkralidis Thanks for the answers. OK, so if I understand correctly, the GTS will be seen as one of the services providing access to some data (the GTS observations), and the specific GTS information will only appear in the access part of the metadata (like any other service). That's good and simple to integrate.

amilan17 commented 1 year ago

@tomkralidis I think we can close this

tomkralidis commented 1 year ago

Related: https://github.com/wmo-im/wcmp2/issues/10#issuecomment-1556991851. OGC API - Records has been used as the baseline for WCMP2 and the Global Discovery Catalogue (GDC).