nationalarchives / kettle-jena-plugins

Jena Plugins for Pentaho Kettle to transform data into RDF
MIT License

Jena Plugins for Pentaho KETTLE

This project contains plugins for Pentaho Data Integration (commonly known as KETTLE) that use Apache Jena to produce RDF.

The plugins provided are:

  1. Create Jena Model

    This transform plugin can be used to create a Jena Model for each row sent to it. Each Row becomes a Resource, and the plugin enables the mapping of fields to RDF Literals or Resources. The plugin includes support for constructing Blank Nodes within Resources.

  2. Combine Jena Models

    This transform plugin allows you to merge multiple Jena Models that are within the same row into a single model. This can be considered as a horizontal transformation within a row.

  3. Group Merge Jena Models

    This transform plugin performs a Group By operation across consecutive rows, merging the Jena Models held in those rows into a single model in a single row. This can be considered as a vertical transformation across rows.

  4. Serialize Jena Model

    This output plugin takes the output of the Create Jena Model plugin and serializes it to an RDF file on disk. It supports the Turtle, N3, N-Triples, and RDF/XML output formats.

  5. SHACL Validation

    This validation plugin supports validation of a Jena Model object created by the Create Jena Model plugin against a SHACL shape file loaded from the file system.
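
The difference between the horizontal merge (Combine Jena Models) and the vertical merge (Group Merge Jena Models) can be illustrated with a minimal sketch. It is not the plugins' implementation and does not use the Jena API: a plain set of triple strings stands in for a Jena Model, and the names ModelMergeSketch, Row, combine and groupMerge are hypothetical, chosen only for this illustration. It also assumes that rows with the same key are merged only when they are adjacent, as "consecutive rows" suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class ModelMergeSketch {

    /** Hypothetical stand-in for a Jena Model: an ordered set of triple strings. */
    static Set<String> model(String... triples) {
        return new LinkedHashSet<>(Arrays.asList(triples));
    }

    /** Horizontal merge: union all models held in the fields of ONE row. */
    static Set<String> combine(List<Set<String>> modelsInRow) {
        Set<String> merged = new LinkedHashSet<>();
        for (Set<String> m : modelsInRow) {
            merged.addAll(m); // duplicate triples collapse, as in a set union
        }
        return merged;
    }

    /** Hypothetical row: a grouping key plus one model field. */
    static final class Row {
        final String key;
        final Set<String> model;

        Row(String key, Set<String> model) {
            this.key = key;
            this.model = model;
        }
    }

    /** Vertical merge: union the models of CONSECUTIVE rows that share a key. */
    static List<Row> groupMerge(List<Row> rows) {
        List<Row> out = new ArrayList<>();
        Row current = null;
        for (Row row : rows) {
            if (current != null && Objects.equals(current.key, row.key)) {
                // Same group as the previous row: fold this model into it.
                current.model.addAll(row.model);
            } else {
                // Key changed: start a new output row with a copy of the model.
                current = new Row(row.key, new LinkedHashSet<>(row.model));
                out.add(current);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Horizontal: two models in the same row become one model.
        Set<String> names = model("ex:r1 ex:name \"Alice\"");
        Set<String> ages = model("ex:r1 ex:age \"42\"");
        System.out.println(combine(Arrays.asList(names, ages)));

        // Vertical: adjacent rows with the same key collapse into one row.
        List<Row> rows = Arrays.asList(
                new Row("r1", model("ex:r1 ex:name \"Alice\"")),
                new Row("r1", model("ex:r1 ex:age \"42\"")),
                new Row("r2", model("ex:r2 ex:name \"Bob\"")));
        for (Row merged : groupMerge(rows)) {
            System.out.println(merged.key + " -> " + merged.model);
        }
    }
}
```

In this sketch the input to groupMerge must already be sorted by the grouping key; rows with the same key that are separated by a different key end up in separate output rows.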

This project was developed by Evolved Binary and DeveXe as part of Project OMEGA for the National Archives.

Getting the Plugins

You can either download the plugins from our GitHub releases page: https://github.com/nationalarchives/kettle-jena-plugins/releases/, or you can build them from source.

Building from Source Code

The plugins can be built from source code by installing the pre-requisites and following the steps described below.

Pre-requisites for building the project:

Build steps:

  1. Clone the Git repository

    $ git clone https://github.com/nationalarchives/kettle-jena-plugins.git

  2. Compile a package

    $ cd kettle-jena-plugins
    $ mvn clean package

  3. The plugins directory is then available at target/kettle-jena-plugins-1.0.0-SNAPSHOT-kettle-plugin/kettle-jena-plugins

Installing the plugins

Copy the kettle-jena-plugins directory produced by the build above into the plugins sub-directory of your KETTLE installation.

This can be done either by running the following Maven command (adjust the plugins path to match your installation):

  $ mvn -Pdeploy-pdi-local -Dpentaho-kettle.plugins.dir=/opt/data-integration/plugins antrun:run@deploy-to-pdi

or manually, e.g.:

  $ cp -r target/kettle-jena-plugins-1.0.0-SNAPSHOT-kettle-plugin/kettle-jena-plugins /opt/data-integration/plugins/

Using the plugins

We wrote a short blog post about working with the plugins: https://blog.adamretter.org.uk/rdf-plugins-for-pentaho-kettle/

We also created a short screencast, hosted on YouTube, that demonstrates how to use the plugins in Pentaho Kettle.