The following business use case promotes a generic toolset that would be used by analysts working on carbon abatement/sequestration projects:
User Story
Working as an analyst, I'm required to produce analytics that inform decisions enabling carbon abatement/sequestration projects.
The analysis includes dimensional views across linked data to support the ability to optimise strategies and tactics for projects.
My work includes linking across multiple datasets, including linking to provenance information that supports tracing information back to its source. This provenance data gives visibility over how information linked to the analysis was captured, and explains the derivations, transformations and data acquisition activities applied in producing reports.
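The back-to-source tracing described above can be sketched as a walk over derivation links, loosely modelled on PROV-O's entity/activity/wasDerivedFrom concepts. This is a minimal illustration only; the entity and activity names are assumptions, not a real dataset.

```python
# Hedged sketch: a provenance chain modelled loosely on PROV-O concepts
# (entities derived from other entities via activities). All identifiers
# here are illustrative assumptions.

# wasDerivedFrom: derived entity -> (source entity, activity that produced it)
DERIVATIONS = {
    "report:2024-q1": ("dataset:emissions-clean", "activity:aggregate"),
    "dataset:emissions-clean": ("dataset:emissions-raw", "activity:transform"),
}

def trace_to_source(entity):
    """Walk derivation links back to the original system-of-record entity."""
    chain = [entity]
    while entity in DERIVATIONS:
        entity, activity = DERIVATIONS[entity]
        chain.append(activity)
        chain.append(entity)
    return chain

print(trace_to_source("report:2024-q1"))
```

Because every derivation records the activity that produced it, the same structure answers both "where did this come from?" and "what was done to it?".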
To support my analysis, I require that all data (and associated metadata) can be referenced via RDF-based graphs. Where practicable, the data need not be ingested into the graphs but can instead be accessed dynamically (remotely) via service interfaces that provide access to the data at source (at the "system of record").
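The in-situ-first rule above can be sketched as a simple dispatch: reference data remotely where a service endpoint exists, and ingest a copy only when no service access is available. The dataset descriptors and endpoint are illustrative assumptions.

```python
# Hedged sketch of in-situ-first data referencing: remote service access
# where available, ingestion as the fallback. Descriptors are assumptions.

def access_mode(descriptor):
    """Decide how a dataset is referenced from the graph."""
    if descriptor.get("service_endpoint"):
        return "remote"   # leave data at the system of record
    return "ingest"       # acquire a copy into the graph store

datasets = [
    {"id": "soil-carbon", "service_endpoint": "https://example.org/sparql"},
    {"id": "field-notes", "service_endpoint": None},
]
print({d["id"]: access_mode(d) for d in datasets})
```

In a SPARQL setting, the "remote" branch corresponds to federated queries (the SPARQL 1.1 SERVICE keyword), which leave the data at its source.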
This work requires approaches to automating the creation/acquisition of ontologies, taxonomies and vocabularies from structured and unstructured data sources.
Where the data must conform to standards set in policy, I require those standards to be captured in the graph, enabling validation of the data and evidence of conformance to the standards.
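Capturing a policy standard as a machine-checkable shape, in the spirit of SHACL but reduced to plain Python for illustration, might look like the following. The property names and sample records are assumptions.

```python
# Hedged sketch: a policy standard captured as a shape that records are
# validated against, SHACL-style. Properties and records are assumptions.

SHAPE = {  # "every abatement record must have these typed properties"
    "project_id": str,
    "tonnes_co2e": float,
}

def validate(record, shape=SHAPE):
    """Return a list of conformance violations (empty list = conforms)."""
    violations = []
    for prop, expected_type in shape.items():
        if prop not in record:
            violations.append(f"missing property: {prop}")
        elif not isinstance(record[prop], expected_type):
            violations.append(f"wrong type for {prop}")
    return violations

print(validate({"project_id": "P-001", "tonnes_co2e": 1250.0}))
print(validate({"project_id": "P-002"}))
```

The violation list itself is the "evidence of conformance" the story asks for: an empty result documents that the record met the standard at validation time.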
My available time and resources to automate meta/data acquisition are very limited, so I require this automation to be readily available and flexible enough to enable the use of new data types for analysis.
Where linking data requires transforming unstructured data into structured data, automation that enables consistent and repeatable transformation via modular workflow automation is required. Workflows deliver the data uplift, producing structured data as RDF for referencing via an indexing component (a queryable graph).
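A modular uplift workflow of this kind can be sketched as small composable steps that turn a line of semi-structured text into subject-predicate-object triples ready for indexing. The input format and predicate names are illustrative assumptions.

```python
# Hedged sketch of a modular, repeatable uplift workflow: ordered steps
# applied identically to every input. Formats and names are assumptions.

def parse(line):
    """Step 1: extract fields from a semi-structured line."""
    project, measure, value = [part.strip() for part in line.split(",")]
    return {"project": project, "measure": measure, "value": float(value)}

def to_triples(record):
    """Step 2: uplift the record to subject-predicate-object triples."""
    subject = f"project:{record['project']}"
    return [(subject, f"measure:{record['measure']}", record["value"])]

def run_workflow(lines, steps=(parse, to_triples)):
    """Apply the same ordered steps to every input, for repeatability."""
    triples = []
    for line in lines:
        data = line
        for step in steps:
            data = step(data)
        triples.extend(data)
    return triples

print(run_workflow(["P-001, sequestered_tonnes, 42.5"]))
```

Because the steps are ordinary functions, new data types can be handled by swapping in a different `parse` step without touching the rest of the pipeline, which is the flexibility the story requires.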
Deployment of RDF data to a graph is automated to the extent possible and is available as a repeatable process.
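Repeatable deployment can be sketched as an idempotent set-union into the graph, so re-running the same deployment leaves the store unchanged. The in-memory set stands in for a real triple store, which is an assumption of this sketch.

```python
# Hedged sketch of repeatable deployment: loading triples is idempotent,
# so the process can be safely re-run. The set is a stand-in for a store.

graph = set()  # stand-in for a persistent triple store

def deploy(triples):
    """Add triples; safe to re-run with the same batch."""
    before = len(graph)
    graph.update(triples)
    return len(graph) - before  # number of triples actually added

batch = [("project:P-001", "measure:tonnes", 42.5)]
print(deploy(batch))  # first run adds the triple
print(deploy(batch))  # re-run adds nothing: the process is repeatable
```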
Preconditions
Access to metadata that specifies structure and relationships of structured information inputs for the analysis.
Access to metadata that specifies the provenance of data used for analysis.
The same two preconditions (as above) apply to unstructured data, to support uplifting that data into datasets for referencing via the indexing applied.
An automated procedure is available that leaves data in situ (neither copying nor moving it from the system of record into a graph store); alternatively, where service access to the data is not available, data acquisition is automated in a way that minimises my need to configure meta/data capture automation.
Where data are not currently available as structured data, a capability to extract them as structured data is available, and workflow automation is provisioned to support repeatable data processing to the maximum extent possible.
Postconditions
Indexed data and associated metadata enable graph queries and analytics across RDF-based graphs.
Analysis capability enables querying of carbon abatement/sequestration linked dimensions.
Provenance of data is linked to the data and is queryable/reportable.
Reports generated from my analysis can be recreated with the same results even where new/changed data has been ingested/indexed.
Delivery dates for analytics have been met.
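The reproducibility postcondition above can be sketched by running a report against a pinned snapshot of the indexed data, so later ingests do not change a recreated result. The snapshot mechanism shown is an illustrative assumption, not a prescribed implementation.

```python
# Hedged sketch: reports read from an immutable snapshot pinned at report
# time, so new/changed data does not alter recreated results.

snapshots = {}
live = set()  # stand-in for the live indexed graph

def ingest(triples):
    live.update(triples)

def pin_snapshot(name):
    """Freeze the current graph contents under a name."""
    snapshots[name] = frozenset(live)

def report(snapshot_name):
    """Count of triples visible to the report, fixed at pin time."""
    return len(snapshots[snapshot_name])

ingest({("p:1", "m:t", 10.0)})
pin_snapshot("q1")
first = report("q1")
ingest({("p:2", "m:t", 20.0)})  # new data arrives after the report
print(first == report("q1"))    # recreated report matches the original
```

Real triple stores often support this pattern via named graphs or dataset versioning; the point here is only that recreation reads from the pinned state, not the live one.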