metadata101 / dcat-ap

GNU General Public License v2.0

DCAT-AP Schema Plugin for GeoNetwork

This repository contains a DCAT-AP 2.0.0 schema plugin for GeoNetwork.

Reference documents

Description

This plugin has the following features:

Installing the plugin

Adding the plugin to the source code

To include this schema plugin in a GeoNetwork build, it needs to be added to the schemas folder in GeoNetwork. The recommended approach is to add the plugin as a Git submodule:

cd schemas
git submodule add <dcat-ap remote URL> dcat-ap
git submodule init
git submodule update

Add the new module to schemas/pom.xml so that Maven picks it up:

<modules>
  <!-- ... -->
  <module>dcat-ap</module>
</modules>

Add the dependency to the web module in web/pom.xml:

<dependency>
  <groupId>org.geonetwork-opensource.schemas</groupId>
  <artifactId>gn-schema-dcat-ap</artifactId>
  <version>${project.version}</version>
</dependency>

Note that the parent version in dcat-ap/pom.xml needs to be updated whenever the GeoNetwork version changes. After merging GeoNetwork updates, this can be automated by running mvn versions:update-child-modules:

<parent>
  <artifactId>gn-schemas</artifactId>
  <groupId>org.geonetwork-opensource.schemas</groupId>
  <version>x.y.z</version>
</parent>

Add the module to the process-resources phase of web/pom.xml so that it is included in the build:

<execution>
  <id>unpack-schemas</id>
  <phase>process-resources</phase>
  <goals><goal>unpack</goal></goals>
  <configuration>
    <encoding>UTF-8</encoding>
    <artifactItems>
      <!-- ... -->
      <artifactItem>
        <groupId>org.geonetwork-opensource.schemas</groupId>
        <artifactId>gn-schema-dcat-ap</artifactId>
        <type>zip</type>
        <overWrite>false</overWrite>
        <outputDirectory>${schema-plugins.dir}</outputDirectory>
      </artifactItem>
    </artifactItems>
  </configuration>
</execution>

Commit these changes.

Apply the patches to the GeoNetwork core. You may need to manually apply specific hunks of a patch.

# execute the following in the core-geonetwork root directory 
git am --ignore-space-change --ignore-whitespace --reject --whitespace=fix schemas/dcat-ap/core-geonetwork-patches/*.patch

Build and run the application following the Software Development Documentation. You'll need to have Java JDK 11 and Maven installed.

Samples and templates can be imported via the 'Admin Console' > 'Metadata and Templates' > 'dcat-ap' menu.

Make sure to import the thesauri located in schemas/dcat-ap/resources/thesauri, as they are required for editing dcat-ap records.

Metadata rules: metadata identifier

The plugin uses dct:identifier to store a UUID that serves as the (internal) metadata identifier. This identifier is stored in the element dcat:CatalogRecord/dct:identifier. When a record is saved, this UUID is appended to the dataset URI, provided that the metadata (template) contains a dataset URI ending with a UUID and the record is not harvested.

<dcat:CatalogRecord rdf:about="https://metadata.vlaanderen.be/srv/api/records/818c2174-4f26-48b6-8f76-b51bb9cbc4e8">
  <dct:identifier>818c2174-4f26-48b6-8f76-b51bb9cbc4e8</dct:identifier>
  <!-- ... -->
</dcat:CatalogRecord>
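As a quick local check, the sketch below reads the identifier from a record and tests whether a URI ends with a UUID, the precondition mentioned above. It is a hypothetical helper using only Python's standard library, not part of the plugin; the functions `record_identifier` and `ends_with_uuid`, and the wrapping `rdf:RDF`/`dcat:Catalog` structure of `SAMPLE`, are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs for the prefixes used in the example above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Matches a UUID (8-4-4-4-12 hex digits) at the very end of a string.
UUID_AT_END = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)

# Hypothetical minimal record wrapping the CatalogRecord shown above.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Catalog>
    <dcat:record>
      <dcat:CatalogRecord rdf:about="https://metadata.vlaanderen.be/srv/api/records/818c2174-4f26-48b6-8f76-b51bb9cbc4e8">
        <dct:identifier>818c2174-4f26-48b6-8f76-b51bb9cbc4e8</dct:identifier>
      </dcat:CatalogRecord>
    </dcat:record>
  </dcat:Catalog>
</rdf:RDF>"""


def record_identifier(xml_text):
    """Return the text of dcat:CatalogRecord/dct:identifier, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(".//dcat:CatalogRecord/dct:identifier", NS)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def ends_with_uuid(uri):
    """True if the URI ends with a UUID."""
    return UUID_AT_END.search(uri) is not None
```

For `SAMPLE`, `record_identifier` returns the UUID stored in `dct:identifier`, and `ends_with_uuid` is true for the `rdf:about` URI of the record.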

How to create a profile?

Various implementations or profiles of DCAT-AP exist, each corresponding to, e.g., a specific use case or a national implementation. This section describes the steps needed to add such a profile to the plugin.

Profile identification

The profile is identified by the dct:conformsTo element in the dcat:CatalogRecord element:

      <dcat:record>
         <dcat:CatalogRecord ...
            <dct:conformsTo>
               <dct:Standard rdf:about="https://data.vlaanderen.be/doc/applicatieprofiel/DCAT-AP-VL/erkendestandaard/2019-10-03">
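The profile check can be sketched in Python, assuming the record is available as RDF/XML with the structure shown above. `conforms_to` and `uses_profile` are hypothetical helpers, not code from the plugin; `uses_profile` roughly mirrors the `count(...) > 0` XPath conditions used elsewhere in this document.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]

DCAT_AP_VL = ("https://data.vlaanderen.be/doc/applicatieprofiel/"
              "DCAT-AP-VL/erkendestandaard/2019-10-03")


def conforms_to(xml_text):
    """List the rdf:about URIs of dct:Standard under the CatalogRecord's dct:conformsTo."""
    root = ET.fromstring(xml_text)
    path = ".//dcat:CatalogRecord/dct:conformsTo/dct:Standard"
    return [std.get(RDF_ABOUT) for std in root.findall(path, NS)]


def uses_profile(xml_text, profile_uri):
    """Rough equivalent of count(//dcat:CatalogRecord/dct:conformsTo/dct:Standard[@rdf:about = X]) > 0."""
    return profile_uri in conforms_to(xml_text)


# Hypothetical minimal record declaring the DCAT-AP-VL profile.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{NS['rdf']}" xmlns:dcat="{NS['dcat']}" xmlns:dct="{NS['dct']}">
  <dcat:Catalog>
    <dcat:record>
      <dcat:CatalogRecord>
        <dct:conformsTo>
          <dct:Standard rdf:about="{DCAT_AP_VL}"/>
        </dct:conformsTo>
      </dcat:CatalogRecord>
    </dcat:record>
  </dcat:Catalog>
</rdf:RDF>"""
```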

When a profile only adds a few elements to one or more existing profiles (e.g. HVD adds the dcatap:hvdCategory element), the profile extension should declare those new elements, and each target profile needs to embed them in its editor views.

Schema

New profile elements need to be added to the schema XSD with cardinality 0..n. Cardinality 0..1 can be considered for a specific element only when it is certain that GeoNetwork requires it to function correctly.

Otherwise, cardinality is checked using Schematron, while the XSD schema defines the elements and types (see Validation).

Defining new elements in the XSD can be done in the following ways:

<xs:element ref="dct:rightsHolder" minOccurs="0" maxOccurs="unbounded"/>

<xs:schema xmlns:dcatap="http://data.europa.eu/r5r/"
  ...
  <xs:import namespace="http://data.europa.eu/r5r/" schemaLocation="profiles/eu-dcat-ap-hvd.xsd"/>
  ...
  <xs:complexType name="Dataset_type">
    ...
      <!-- Profile / DCAT-AP-HVD -->
      <xs:element ref="dcatap:hvdCategory" minOccurs="0" maxOccurs="unbounded"/>
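To review the 0..n rule, a small script can list element references in a complex type whose cardinality differs from `minOccurs="0" maxOccurs="unbounded"`. This is a sketch: `non_optional_refs` and the trimmed `SAMPLE_XSD` are hypothetical, and a flagged element is not necessarily wrong, since 0..1 may be intended.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def non_optional_refs(xsd_text, type_name):
    """Return (ref, minOccurs, maxOccurs) for element refs in complexType
    `type_name` that are not declared with cardinality 0..n."""
    root = ET.fromstring(xsd_text)
    offenders = []
    for ctype in root.iter(f"{{{XS}}}complexType"):
        if ctype.get("name") != type_name:
            continue
        for el in ctype.iter(f"{{{XS}}}element"):
            ref = el.get("ref")
            if ref is None:
                continue
            # XSD defaults: minOccurs and maxOccurs are both 1 when omitted.
            occurs = (el.get("minOccurs", "1"), el.get("maxOccurs", "1"))
            if occurs != ("0", "unbounded"):
                offenders.append((ref, *occurs))
    return offenders


# Hypothetical, trimmed-down schema used only for this example.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dct="http://purl.org/dc/terms/"
           xmlns:dcatap="http://data.europa.eu/r5r/">
  <xs:complexType name="Dataset_type">
    <xs:sequence>
      <xs:element ref="dct:rightsHolder" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="dcatap:hvdCategory" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="dct:identifier" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""
```

Here only `dct:identifier` is reported, as `("dct:identifier", "0", "1")`, because its omitted `maxOccurs` defaults to 1.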

If the profile defines or uses new namespaces they need to be declared in:

Vocabularies

If some profile elements rely on vocabularies, add them to the thesauri folder using the SKOS format. Those vocabularies are imported when the application starts.
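Before adding a vocabulary, it can help to list the concepts and labels it contains. The sketch below assumes a minimal SKOS file in RDF/XML with `skos:Concept` elements carrying `rdf:about` and language-tagged `skos:prefLabel`; `pref_labels` is a hypothetical helper, and the sample concept is the HVD category shown in the Indexing section.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
# ElementTree maps the reserved xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pref_labels(skos_text, lang="en"):
    """Map each skos:Concept URI to its skos:prefLabel in `lang`."""
    root = ET.fromstring(skos_text)
    labels = {}
    for concept in root.iter("{%s}Concept" % NS["skos"]):
        for label in concept.findall("skos:prefLabel", NS):
            if label.get(XML_LANG) == lang:
                labels[concept.get(RDF_ABOUT)] = label.text
    return labels


SAMPLE_SKOS = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://data.europa.eu/bna/c_a9135398">
    <skos:prefLabel xml:lang="en">Companies and company ownership</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""
```

An empty result for a language is a quick hint that the vocabulary has no labels in that language.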

Register the vocabulary in src/main/plugin/dcat-ap/process/process-utility.xsl.

Q: Maybe this can be generic?

Editor configuration

If the profile requires a completely new editor, create a new view in the editor configuration.

See customizing editor for more information.

First, add a view, using a condition so that it is only displayed when the record uses the profile:

 <views
  displayIfRecord="count(//dcat:CatalogRecord/dct:conformsTo/dct:Standard[@rdf:about = 'https://data.vlaanderen.be/doc/applicatieprofiel/metadata-dcat/erkendestandaard/2021-04-22']) > 0"
>
  <view name="hvd-view">
    <tab id="hvd-tab"  default="true">
        ...

Q:

Add one or more tabs to the view:

    <tab id="hvd-tab"  default="true">

Make sure the tab id attribute is unique within config-editor.xml. A good practice is to use the same prefix for the view name, the tab id, and the section and field labels.
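Duplicate tab ids can be spotted with a short script. This sketch assumes that the editor configuration elements are not in an XML namespace; `duplicate_tab_ids` and the sample configuration are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_tab_ids(config_text):
    """Return, sorted, the tab ids that occur more than once in an editor configuration."""
    root = ET.fromstring(config_text)
    counts = Counter(tab.get("id") for tab in root.iter("tab") if tab.get("id"))
    return sorted(tab_id for tab_id, n in counts.items() if n > 1)


# Hypothetical configuration in which "hvd-tab" is accidentally reused.
SAMPLE_CONFIG = """<editor>
  <views>
    <view name="hvd-view">
      <tab id="hvd-tab" default="true"/>
      <tab id="hvd-advanced-tab"/>
    </view>
    <view name="other-view">
      <tab id="hvd-tab"/>
    </view>
  </views>
</editor>"""
```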

If the new element depends on a vocabulary, register the vocabulary:

<editor...
  <fields...
    <for name="dcatap:hvdCategory" use="thesaurus-list-picker">
      <directiveAttributes
        thesaurus="external.theme.high-value-dataset-category"
        xpath="/dcatap:hvdCategory"/>
    </for>

Then create the form:

    <section name="hvd-section">
      <field xpath="/rdf:RDF/dcat:Catalog/dcat:dataset/dcat:Dataset/dcatap:hvdCategory"/>

      <action type="add"
                   or="hvdCategory"
                   in="/rdf:RDF/dcat:Catalog/dcat:dataset/dcat:Dataset"
                   if="count(rdf:RDF/dcat:Catalog/dcat:dataset/dcat:Dataset/dcatap:hvdCategory) = 0">
        <template>
          <snippet>
            <dcatap:hvdCategory>
              <skos:Concept rdf:about="">
                <skos:prefLabel xml:lang=""></skos:prefLabel>
              </skos:Concept>
            </dcatap:hvdCategory>
          </snippet>
        </template>
      </action>
    </section>
  </tab>
</view>
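The `if` attribute above controls when the add action is offered. A rough Python equivalent of that condition, assuming the record root is `rdf:RDF`, is sketched below; `add_action_visible` and the two sample records are hypothetical, and this is not how GeoNetwork evaluates the expression.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcatap": "http://data.europa.eu/r5r/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Same path as in the action's `if`, relative to the rdf:RDF root element.
HVD_PATH = "dcat:Catalog/dcat:dataset/dcat:Dataset/dcatap:hvdCategory"

HEADER = ('<rdf:RDF xmlns:rdf="{rdf}" xmlns:dcat="{dcat}" '
          'xmlns:dcatap="{dcatap}" xmlns:skos="{skos}">').format(**NS)

EMPTY = (HEADER + "<dcat:Catalog><dcat:dataset><dcat:Dataset/>"
         "</dcat:dataset></dcat:Catalog></rdf:RDF>")

FILLED = (HEADER + "<dcat:Catalog><dcat:dataset><dcat:Dataset><dcatap:hvdCategory>"
          '<skos:Concept rdf:about="http://data.europa.eu/bna/c_a9135398"/>'
          "</dcatap:hvdCategory></dcat:Dataset></dcat:dataset></dcat:Catalog></rdf:RDF>")


def add_action_visible(xml_text):
    """True when count(.../dcatap:hvdCategory) = 0, i.e. the add action would be offered."""
    root = ET.fromstring(xml_text)
    return len(root.findall(HVD_PATH, NS)) == 0
```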

Translations

Two types of translations have to be added:

Using a vocabulary for a field

To use a vocabulary for a particular field, configure it in the fields section at the top of the editor configuration:

Q: check if this can be configured using only the editor view?

    <for name="mdcat:MAGDA-categorie" use="thesaurus-list-picker" profile="metadata-dcat">
      <directiveAttributes
        thesaurus="external.theme.magda-domain"
        xpath="/mdcat:MAGDA-categorie"/>
    </for>

If the new element uses a custom namespace, the namespace needs to be registered in src/main/plugin/dcat-ap/convert/thesaurus-transformation.xsl, which converts a SKOS thesaurus to the profile schema.

Q: What about a record in a language that is not available in the vocabulary? To be tested with https://github.com/geonetwork/core-geonetwork/pull/8268, which may help.

Field with URI

If the element is an RDF resource URI ...

<dcat:endpointURL rdf:resource="https://www.marineregions.org/webservices.php"/>

... register the field using:

<for name="dcat:endpointURL" templateModeOnly="true" forceLabel="true" label="key">
  <template>
    <values>
      <key label="key" xpath="@rdf:resource" tooltip="dcat:endpointURL" required="true"/>
    </values>
    <snippet>
      <dcat:endpointURL rdf:resource="{{key}}"/>
    </snippet>
  </template>
</for>
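The round trip between the `{{key}}` placeholder and the `@rdf:resource` xpath can be illustrated as follows. This is a sketch of the idea only: `render` and `read_key` are hypothetical, the namespace declarations are added so the fragment parses on its own, and GeoNetwork's own template handling may differ.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT = "http://www.w3.org/ns/dcat#"

# The <snippet> from the field configuration, with namespace declarations added.
SNIPPET = ('<dcat:endpointURL xmlns:dcat="' + DCAT + '" xmlns:rdf="' + RDF + '" '
           'rdf:resource="{{key}}"/>')


def render(snippet, key):
    """Substitute {{key}}, escaping &, <, > and double quotes for use in an attribute."""
    return snippet.replace("{{key}}", escape(key, {'"': "&quot;"}))


def read_key(xml_text):
    """Read the value back the way the key's xpath (@rdf:resource) selects it."""
    return ET.fromstring(xml_text).get("{%s}resource" % RDF)
```

Escaping matters for URLs with query strings: an unescaped `&` in the attribute would make the snippet ill-formed XML.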

Q: Other types of fields?

Validation

Validation results are displayed in the side panel of the editor form (see more information).

Validation relies on two levels:

The XSD checks elements and types, while cardinalities and profile rules are checked using Schematron rules.
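As an illustration of the Schematron level, the sketch below checks cardinalities by counting matching elements, the way an assertion such as `count(...) >= 1` would. It is a Python stand-in, not Schematron, and both rules in `RULES` are hypothetical examples rather than the plugin's actual rule set.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "dcatap": "http://data.europa.eu/r5r/",
}

DATASET = "dcat:Catalog/dcat:dataset/dcat:Dataset/"

# Hypothetical rules: (path relative to rdf:RDF, min, max or None, message).
RULES = [
    (DATASET + "dct:title", 1, None, "A dataset needs at least one dct:title."),
    (DATASET + "dcatap:hvdCategory", 1, None,
     "An HVD dataset needs at least one dcatap:hvdCategory."),
]


def validate(xml_text, rules=RULES):
    """Return the messages of all rules whose element count is out of range."""
    root = ET.fromstring(xml_text)
    failures = []
    for path, minimum, maximum, message in rules:
        count = len(root.findall(path, NS))
        if count < minimum or (maximum is not None and count > maximum):
            failures.append(message)
    return failures


SAMPLE = (
    '<rdf:RDF xmlns:rdf="{rdf}" xmlns:dcat="{dcat}" xmlns:dct="{dct}" '
    'xmlns:dcatap="{dcatap}">'.format(**NS)
    + "<dcat:Catalog><dcat:dataset><dcat:Dataset>"
    + '<dct:title xml:lang="en">Example dataset</dct:title>'
    + "</dcat:Dataset></dcat:dataset></dcat:Catalog></rdf:RDF>"
)
```

Keeping the rules as data mirrors the idea of enabling or disabling rule sets per profile: a different `rules` list can be passed for each profile.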

Schematron rules can be enabled/disabled depending on the profile. See configuring validation levels.

Q: Check how to enable/disable schematron rules for a profile with an example.

Validation also checks the version of a profile, because a profile does not always declare a new namespace for a new version.
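Because the version is carried by the standard URI rather than by a namespace, a version check can compare the `dct:conformsTo` URI with a list of supported versions. This sketch assumes, based on the two standard URIs in this document, that the version is the last path segment; `SUPPORTED`, `profile_and_version` and `check_version` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical list of accepted profile versions.
SUPPORTED = {
    "https://data.vlaanderen.be/doc/applicatieprofiel/DCAT-AP-VL/erkendestandaard/2019-10-03",
}


def profile_and_version(standard_uri):
    """Split a standard URI into (host + profile path, last path segment)."""
    parsed = urlparse(standard_uri)
    profile, _, version = parsed.path.rstrip("/").rpartition("/")
    return parsed.netloc + profile, version


def check_version(standard_uri, supported=SUPPORTED):
    """Classify a dct:conformsTo URI as supported, a different version, or unknown."""
    if standard_uri in supported:
        return "ok"
    profile, version = profile_and_version(standard_uri)
    known_profiles = {profile_and_version(uri)[0] for uri in supported}
    if profile in known_profiles:
        return "unsupported version " + version
    return "unknown profile"
```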

Indexing

For elements that use a vocabulary and are configured in the editor configuration, indexing will automatically be done and can be used for search and aggregations.

    "th_high-value-dataset-category": [
      {
        "default": "Companies and company ownership",
        "langeng": "Companies and company ownership",
        "link": "http://data.europa.eu/bna/c_a9135398"
      }
    ],
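The shape of that index entry can be reproduced as follows. The mapping from the thesaurus key `external.theme.high-value-dataset-category` to the field name `th_high-value-dataset-category` is inferred from the two names in this document and is an assumption, as is the `lang` + three-letter language code convention; `index_entries` is a hypothetical helper, not GeoNetwork's indexer.

```python
def thesaurus_field(thesaurus_key):
    """Assumed naming: 'th_' + the last dot-separated part of the thesaurus key."""
    return "th_" + thesaurus_key.rsplit(".", 1)[-1]


def index_entries(thesaurus_key, concepts, default_lang="eng"):
    """Build {field: [entry, ...]} from (concept URI, {3-letter lang: label}) pairs."""
    entries = []
    for uri, labels in concepts:
        entry = {"default": labels[default_lang]}
        for lang, label in labels.items():
            entry["lang" + lang] = label
        entry["link"] = uri
        entries.append(entry)
    return {thesaurus_field(thesaurus_key): entries}
```

Calling `index_entries("external.theme.high-value-dataset-category", [("http://data.europa.eu/bna/c_a9135398", {"eng": "Companies and company ownership"})])` yields a dictionary with the same keys and values as the JSON excerpt above.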

For additional elements, create an additional indexing XSLT, import it into the main one and define custom indexing using the index-extra-fields mode:

<xsl:template mode="index-extra-fields"
              match="rdf:RDF[dcat:Catalog/dcat:record/dcat:CatalogRecord/dct:conformsTo/dct:Standard/@rdf:about='https://data.vlaanderen.be/doc/applicatieprofiel/DCAT-AP-VL/erkendestandaard/2019-10-03']">

  <xsl:call-template name="index-dcat-ap-vl-license"/>
  <xsl:apply-templates mode="index-dcat-ap-vl" select="descendant::dcat:DataService"/>
</xsl:template>

Templates

TODO

Interactions between profiles

Some limitations of, and possible improvements to, the existing mechanism:

  1. Some profiles can be used within others, e.g., HVD can be combined with DCAT-AP or GeoDCAT-AP. How can the editor configuration of one profile be reused in another profile?
    • One view per profile may be too limiting, as an "HVD" section may be needed in a DCAT-AP-VL form
  2. Combining all profiles will generate a large config-editor.xml
  3. Vocabularies are shared between profiles. How should they be managed? Should vocabularies be provided in all EU languages when possible?
  4. Versioning. Is validation the step that checks the version of a profile?

Advanced configuration

Thumbnails

By default, thumbnails are encoded using foaf:page/foaf:Document.

If the profile requires a different encoding, modify:

Community

Please send comments and questions to the issue tracker.

More work required

This plugin would merit further improvements in at least the following areas:

Contributors

Acknowledgement

The work on this schema plugin was funded by and carried out in close collaboration with Digitaal Vlaanderen.