wmo-im / wcmp2

WMO Core Metadata Profile 2
https://wmo-im.github.io/wcmp2

Provider roles study #109

Closed gaubert closed 1 year ago

gaubert commented 1 year ago

This issue has been created to define the role codelist, or guidance, for the contact property. The purpose of this property is to help users understand who the data provider/host/distributor is and whom to contact in case of questions or issues. It defines the relationship between the described resource, e.g. the data collection, and the entities (organisations) managing and giving access to that resource.

Review of standard

Many geospatial data standards have already defined such roles. Below is a list and description of the reviewed standards:

ISO 19115

see https://schemas.isotc211.org/schemas/19115/resources/Codelist/cat/codelists.xml#CI_RoleCode

Role | Description
-- | --
resourceProvider | party that supplies the resource
custodian | party that accepts accountability and responsibility for the data and ensures appropriate care and maintenance of the resource
owner | party that owns the resource
user | party who uses the resource
distributor | party who distributes the resource
originator | party who created the resource
pointOfContact | party who can be contacted for acquiring knowledge about or acquisition of the resource
principalInvestigator | key party responsible for gathering information and conducting research
processor | party who has processed the data in a manner such that the resource has been modified
publisher | party who published the resource
author | party who authored the resource

The main roles are captured, but the codelist is also extremely detailed, allowing many different relationships between an entity and the resource to be qualified. Some of these relationships are not relevant when the resource is a dataset and are inherited from the library/book management community. This very high number of roles leads to too much information being provided and creates confusion for metadata authors about how to qualify the entities in relation to the resource. It also creates ambiguity about which entities should be qualified in the metadata.

ISO 19115-3

See https://schemas.isotc211.org/schemas/19115/resources/Codelist/cat/codelists.xml#CI_RoleCode

ISO 19115-3 is very similar to ISO 19115: some roles have been updated and the list of roles has been simplified, but there is still a large number of them that are not relevant for the WMO use case.

STAC

From STAC Item Provider roles, see https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#provider-object.

The provider's role(s) can be one or more of the following elements:

  • licensor: The organization that is licensing the dataset under the license specified in the Collection's license field.
  • producer: The producer of the data is the provider that initially captured and processed the source data, e.g. ESA for Sentinel-2 data.
  • processor: A processor is any provider who processed data to a derived product.
  • host: The host is the actual provider offering the data on their storage. There should be no more than one host, specified as last element of the list.

The STAC provider role list is very simple, with four roles: licensor, producer, processor and host. Producer and host fulfil the needs of an organisation like EUMETSAT, even though host is very abstract. STAC organises the relationship between entity and resource around who has produced the data and who is hosting the data, rather than around data distribution. This is more in line with the "Big Data" and "Cloud Services" concepts.
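The role rules quoted from the STAC Provider Object above can be sketched as a small check. This is only an illustration: the function name `check_providers` and the returned messages are made up for this sketch, and the STAC text words the host rule as a recommendation ("should"), so a real validator might only warn.

```python
# The four role names quoted from the STAC Provider Object.
STAC_ROLES = {"licensor", "producer", "processor", "host"}


def check_providers(providers: list[dict]) -> list[str]:
    """Return a list of problems found in a STAC-style providers list.

    Hypothetical helper: each provider is assumed to be a dict with a
    "name" and a list of "roles".
    """
    problems = []
    for index, provider in enumerate(providers):
        # Flag any role that is not one of the four STAC roles.
        unknown = set(provider.get("roles", [])) - STAC_ROLES
        if unknown:
            problems.append(f"provider {index}: unknown roles {sorted(unknown)}")

    # "No more than one host, specified as last element of the list."
    hosts = [i for i, p in enumerate(providers) if "host" in p.get("roles", [])]
    if len(hosts) > 1:
        problems.append("more than one host")
    elif hosts and hosts[0] != len(providers) - 1:
        problems.append("host is not the last provider")
    return problems


# Example: one provider produces the data, another one hosts it.
providers = [
    {"name": "CMA", "roles": ["producer"]},
    {"name": "EUMETSAT", "roles": ["host"]},
]
print(check_providers(providers))
```

An ISO 19115 role such as `owner` would be reported as unknown by this check, which is the kind of restriction a reduced codelist is meant to enforce.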

DCAT2/3

DCAT doesn't define a controlled vocabulary for the entities that have a relationship with the qualified resource, but refers to a list of codelists from different standards:

  • [ISO-19115-1] DS_AssociationTypeCode: see above
  • IANA Registry of Link Relations [IANA-RELATIONS]: this is extremely abstract and not focused enough, as it describes relationships for links in general
  • DataCite metadata schema [DataCite]: centred around the publisher and creator
  • MARC relators: extremely wide, as it defines relationships between an agent and a bibliographic resource

DCAT doesn't preclude any codelist, and three of the ones cited by the DCAT standard are either related to another community (library) or too focused on the specific entity they manage to be used in WCMP2.

Experience of the EUMETSAT metadata curation team

EUMETSAT manages a catalogue of 900 products, with a metadata catalogue based on ISO-19115 as the standard. The product navigator (https://navigator.eumetsat.int/) ingests ISO-19115 records and transforms them into a JSON-based internal version fitting the EUMETSAT needs.

The catalogue contains EUMETSAT product description records (collections) but also products originating from EUMETSAT partners (NOAA, CMA, ...) and distributed by EUMETSAT. This means that the producer and the distributor of a record can be different entities. For instance, this Fengyun dataset is produced by CMA and distributed by EUMETSAT: https://navigator.eumetsat.int/product/EO:EUM:DAT:FENGYUN:FY2G-AMV. Additionally, there are records produced and distributed by the same entity, for instance the EUMETSAT satellite datasets: https://navigator.eumetsat.int/product/EO:EUM:DAT:MSG:HRSEVIRI.

The metadata/catalogue curation team also defines the user needs from the collection catalogue as follows: users will want to contact either the data producer (questions about the data content, issues/improvements on the data content, ...) or the data distributor (questions regarding the distribution mode, issues/improvements on the access). This is a typical case of data handling/distribution for a meteorological centre.
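The two contact needs described above can be sketched with the STAC role names reviewed earlier. The record layout, the `ROLE_FOR_QUESTION` mapping (content questions go to the producer, access questions go to the host) and the function `who_to_contact` are assumptions made for this illustration. They are not the product navigator's internal format.

```python
# Hypothetical records: names and roles follow the two examples above,
# the dictionary layout is an assumption made for this sketch.
RECORDS = {
    "EO:EUM:DAT:FENGYUN:FY2G-AMV": [
        {"name": "CMA", "roles": ["producer"]},
        {"name": "EUMETSAT", "roles": ["host"]},
    ],
    "EO:EUM:DAT:MSG:HRSEVIRI": [
        {"name": "EUMETSAT", "roles": ["producer", "host"]},
    ],
}

# Assumed mapping from the kind of user question to the role to contact.
ROLE_FOR_QUESTION = {"content": "producer", "access": "host"}


def who_to_contact(record_id: str, question: str) -> list[str]:
    """Return the names of the entities holding the role for this question."""
    role = ROLE_FOR_QUESTION[question]
    return [p["name"] for p in RECORDS[record_id] if role in p["roles"]]


print(who_to_contact("EO:EUM:DAT:FENGYUN:FY2G-AMV", "content"))
print(who_to_contact("EO:EUM:DAT:FENGYUN:FY2G-AMV", "access"))
```

With only two roles in play, both cases (different producer and distributor, or the same entity for both) resolve to an unambiguous contact.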

Conclusion and Proposal

The ISO 19115 (and ISO 19115-3) codelist can fulfil the use case of the meteorological centres, as it has been used in WCMP, but it is a very extensive codelist that covers more than the meteorological community use cases and therefore creates a lot of ambiguity when qualifying the entities that handle the WCMP2 resource (datasets). It is advised here to limit the number of role types in order to create a homogeneous and easy to understand catalogue for the users. DCAT doesn't define a codelist vocabulary; it refers to some that could be used, but except for the ISO-19115 codelist they are not relevant. The STAC provider roles codelist is simple and provides a set of relevant role qualifiers for the WMO use case, and it could be adapted easily. It is proposed to use it unless additional use cases arise that cannot be covered by the STAC provider roles.

gaubert commented 1 year ago

@tomkralidis @amilan17 @jsieland @solson-nws @josusky @davidpodeur @antje-s @hananeKamil Here is the analysis regarding the role codelist. Please review it and provide comments. Many thanks.

antje-s commented 1 year ago

@gaubert: thank you for the good summary and proposal. A reduction to as few roles as possible for a clear differentiation is beneficial from my point of view and, as written, we could start with this and see if cases occur that cannot be mapped.

amilan17 commented 1 year ago

I agree, the STAC roles look sufficient to me. "licensor, producer, processor or host"

tomkralidis commented 1 year ago

TT-WISMD 2022-06-22:

  • TT agrees to STAC roles for contacts
  • @tomkralidis will update/PR

tomkralidis commented 1 year ago

Implemented in #111