mjanez / ckanext-dcat

CKAN DCAT-AP/GeoDCAT-AP/NTI-RISP extension
https://github.com/mjanez/ckanext-schemingdcat

Feature/hvd category #16

Closed mjanez closed 5 months ago

mjanez commented 5 months ago

This pull request adds management of high-value datasets (HVD) to the ckanext-dcat metadata schema. As mandated by Directive (EU) 2019/1024 and its Implementing Regulation (EU) 2023/138, European Union member states are working to make high-value datasets accessible to citizens and businesses under standardized technical requirements, in order to promote their reuse and their positive impact on society, the economy, and the environment.

The addition of HVD management to the metadata schema is crucial for enabling administrations to effectively identify and disseminate these datasets, thereby addressing the challenges posed by the high heterogeneity in formats, structures, and semantics. Specifically, starting February 2025, member states will be required to report to the Commission every two years on the available high-value datasets, including links to license conditions and APIs.

To support this endeavor, the European Data Portal has published the "Report on Data Homogenisation for High-value Datasets," which proposes a methodological approach to facilitate the identification and harmonization of HVD. To follow that approach, this pull request makes the following changes:

  1. Update profiles.py https://github.com/mjanez/ckanext-dcat/blob/c4c1dee5ced89624d7c6b32fd6f7b9b5205b4861/ckanext/dcat/profiles.py#L1-L3015
  2. Update testing.
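
As a rough illustration of what an HVD-aware serialization step could look like, the sketch below turns a CKAN dataset dict into N-Triples lines. It is not the code in this pull request: the field name `hvd_category` and the helper `hvd_triples` are hypothetical, and plain strings are used instead of the rdflib graph that the real profile classes work on, so that the example stays dependency-free. The property URIs are the DCAT-AP `hvdCategory` and `applicableLegislation` terms; the category URI in the usage example is assumed to be the EU Vocabularies concept for geospatial data and should be checked against the HVD category list.

```python
# Namespace of the DCAT-AP terms (dcatap:hvdCategory, dcatap:applicableLegislation).
DCATAP = "http://data.europa.eu/r5r/"
# ELI identifier of Implementing Regulation (EU) 2023/138.
HVD_LEGISLATION = "http://data.europa.eu/eli/reg_impl/2023/138/oj"


def hvd_triples(dataset_uri, dataset_dict):
    """Return N-Triples lines describing the HVD status of one dataset.

    `hvd_category` is a hypothetical field name holding one URI or a
    list of URIs; datasets without it produce no triples at all.
    """
    categories = dataset_dict.get("hvd_category") or []
    if isinstance(categories, str):
        categories = [categories]

    lines = [
        f"<{dataset_uri}> <{DCATAP}hvdCategory> <{category}> ."
        for category in categories
    ]
    # Only datasets that declare at least one category are flagged as
    # falling under the HVD implementing regulation.
    if lines:
        lines.append(
            f"<{dataset_uri}> <{DCATAP}applicableLegislation> <{HVD_LEGISLATION}> ."
        )
    return lines


# Usage with an illustrative dataset (URI and values are made up).
example = {
    "name": "cadastral-parcels",
    "hvd_category": ["http://data.europa.eu/bna/c_ac64a52d"],
}
for line in hvd_triples("https://example.org/dataset/cadastral-parcels", example):
    print(line)
```

A matching test would assert that a dataset without the field yields no HVD triples and that a dataset with one category yields exactly two.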

This pull request aims to align the ckanext-dcat metadata schema with the recommendations outlined in the report, enabling seamless integration of HVD management functionalities into CKAN instances. By incorporating support for HVD, CKAN users will be empowered to efficiently manage and share datasets of high value, fostering greater data interoperability and promoting their reuse across various applications.

Resource Table:

| Resource | Description | Relevant Data Categories |
| --- | --- | --- |
| INSPIRE Directive | Characteristics that spatial information and its metadata must have. | Geospatial Data, Earth Observation and Environmental Data, Meteorological Data, Transportation Network Data |
| Data Specifications of the INSPIRE Directive | Models, schemas, and coding rules for different thematic areas of spatial data. | Same as above |
| INSPIRE Network Services | Set of common interfaces for web services that enable the discovery, visualization, download, and transformation of spatial data. | Same as above |
| Technical Guidelines for INSPIRE Metadata | Technical guidelines for metadata, with minimum elements to include defined in Commission Regulation (EC) No 1205/2008. | Same as above |
| GeoDCAT-AP | Extension of the DCAT application profile to describe geospatial datasets. | Geospatial Data |
| Core Location Vocabulary | Simplified data model that includes the fundamental characteristics of a location, represented as an address or geographic name, or through geometry. | Geospatial Data |
| General Multilingual Environmental Thesaurus (GEMET) | Specialized controlled vocabulary on environmental information. It has a section of concepts linked to spatial data categories included in INSPIRE. | Geospatial Data, Earth Observation Data, Transportation Network Data |
| Semantic Sensor Network | W3C recommendation for describing sensors and their observations. | Meteorological Data |
| Quantity, unit, dimension and type (QUDT) | Set of ontologies defining basic classes, properties, and constraints used to model physical quantities, measurement units, and their dimensions in various measurement systems. | Meteorological Data |
| List of Eurostat Statistical Classifications | Statistical classifications maintained by Eurostat, available as Linked Open Data in XKOS, the SKOS extension for modeling statistical classifications. Presented by classification family, categorized by statistical area and subdomains (e.g., NACE for economic activity, described further down this table). | Statistical Data |
| Eurostat Standard Code Lists | Predefined and organized sets of elements that present statistical concepts through unique codes. | Statistical Data |
| Statistical Data and Metadata eXchange (SDMX) | Global initiative to standardize and harmonize the exchange of statistical data and metadata. It offers technical standards (the SDMX information model), guidelines, a computing architecture, tools, and a series of tutorials to help users. | Statistical Data |
| RDF Data Cube Vocabulary | Ontology for describing multidimensional data, such as statistics, based on the core of the SDMX 2.0 information model. | Statistical Data |
| Core Business Vocabulary | Mentioned by the regulation itself, it consists of a simplified data model that captures the fundamental characteristics of a legal entity, such as its legal name, activity, or address. | Business Registers |
| NACE Code | Codes for the classification of economic activities in the European Union. Its NACE 2 revision was published by the European Commission in October 2022. | Business Registers |
| Organization Ontology | W3C ontology to support the publication of linked data related to organizational information, i.e., it provides a series of ways to represent the relationship between people and organizations, along with the internal information structure of an organization. | Business Registers |
| Global Legal Entity Identifier Foundation | Centralized database with information about legal entities participating in global financial markets. It assigns each entity a unique Legal Entity Identifier (LEI) code recognized globally. | Business Registers |
| NST Taxonomy | Classification system for goods transported by road, rail, inland waterways, and sea. It takes into account the economic activity associated with the origin of the goods. | Transportation Network Data |
| "Transport Service" Authority Table | List of codes for different types of transport services provided by the EU Vocabularies section. | Transportation Network Data |

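When deciding which vocabularies to support for a given kind of dataset, it can help to read the resource table the other way round, from data category to resources. The sketch below inverts a few rows of the table into such a lookup. The dictionary is a partial, hand-copied transcription for illustration only, and the names `RESOURCE_CATEGORIES` and `resources_by_category` are hypothetical; nothing here is part of ckanext-dcat.

```python
from collections import defaultdict

# Partial transcription of the resource table: resource -> data categories.
RESOURCE_CATEGORIES = {
    "GeoDCAT-AP": ["Geospatial Data"],
    "Core Location Vocabulary": ["Geospatial Data"],
    "GEMET": [
        "Geospatial Data",
        "Earth Observation Data",
        "Transportation Network Data",
    ],
    "Semantic Sensor Network": ["Meteorological Data"],
    "QUDT": ["Meteorological Data"],
    "SDMX": ["Statistical Data"],
    "RDF Data Cube Vocabulary": ["Statistical Data"],
    "Core Business Vocabulary": ["Business Registers"],
    "NST Taxonomy": ["Transportation Network Data"],
}


def resources_by_category(resource_categories):
    """Invert a resource -> categories mapping into category -> resources."""
    index = defaultdict(list)
    for resource, categories in resource_categories.items():
        for category in categories:
            index[category].append(resource)
    # Sort the names so the result does not depend on insertion order.
    return {category: sorted(names) for category, names in index.items()}


if __name__ == "__main__":
    lookup = resources_by_category(RESOURCE_CATEGORIES)
    for category in sorted(lookup):
        print(f"{category}: {', '.join(lookup[category])}")
```
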
This enhancement aligns with the ongoing efforts to facilitate the open data ecosystem and promote transparency, innovation, and socioeconomic development.