
Tools to describe GIS workflows semantically, and to generate them. Includes the core concept transformation algebra (CCT).
GNU General Public License v3.0

QuAnGIS workflows

This repository is part of the QuAnGIS project. It encompasses the definition of the core concept transformation algebra, and a database of GIS tools specified in terms of it. It also contains recipes to produce and transform workflows using APE and annotate these tools.

The recipes are specified using doit in the dodo.py file. Run doit list to see an overview of what you can create.
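For readers unfamiliar with doit: a recipe is an ordinary Python function in dodo.py whose name starts with task_ and which returns a dictionary describing the task. The sketch below is illustrative only. The task name, file paths, and helper function are hypothetical and are not taken from this repository's dodo.py.

```python
# Hypothetical dodo.py fragment; all names and paths are invented for illustration.
from pathlib import Path

SOURCE = Path("data/example.txt")          # assumed input file
TARGET = Path("build/example.upper.txt")   # assumed output file


def uppercase(source: str, target: str) -> None:
    """Copy `source` to `target`, converting the text to upper case."""
    out = Path(target)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(Path(source).read_text().upper())


def task_example_upper():
    """Produce an upper-cased copy of the example file."""
    return {
        # doit reruns the task only when one of these files changes
        "file_dep": [str(SOURCE)],
        # files the task is expected to create
        "targets": [str(TARGET)],
        # a Python action is given as a (callable, positional-args) tuple
        "actions": [(uppercase, [str(SOURCE), str(TARGET)])],
    }
```

With such a function in dodo.py, doit list would show example_upper together with its docstring, and doit example_upper would run it.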

Installation

You will need Python 3.9 as well as the dependencies specified in requirements.txt. To generate workflows, Java 1.8+ must be installed; a compatible version of APE will be downloaded automatically. Furthermore, to run queries, you will need an external SPARQL endpoint. We will assume that you are running an Apache Fuseki server at localhost:3030. Other options are BlazeGraph or the proprietary MarkLogic.

You can install everything you need within a Miniconda environment. First, create it and install the necessary packages:

conda create -n quangis-wf
conda activate quangis-wf
conda install python=3.9 spacy spacy-model-en_core_web_sm pyzmq git jpype1 doit tomlkit graphviz pydot platformdirs
conda install -c conda-forge openjdk=21
pip install antlr4-python3-runtime==4.9.3 word2number --editable=git+https://github.com/quangis/transforge.git@develop#egg=transforge

Then get the repositories you need:

git clone https://github.com/quangis/geo-question-parser
git clone https://github.com/quangis/quangis-workflow

You will also need to download and run Fuseki. To do this, run the following commands. Afterwards, open your browser at http://localhost:3030 to create a triple store named "cct".

curl -fLo fuseki.zip https://dlcdn.apache.org/jena/binaries/apache-jena-fuseki-4.10.0.zip
unzip fuseki.zip
cd apache-jena-fuseki-4.10.0
java -jar fuseki-server.jar
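Once the server is up and the cct store has been created, one way to confirm that the endpoint answers is to send it a trivial SPARQL query. The sketch below uses only the Python standard library and the standard SPARQL 1.1 Protocol. It assumes that Fuseki exposes the query service at its default path /cct/sparql; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Fuseki at localhost:3030 with a dataset named "cct" whose
# query service lives at the default path /cct/sparql.
ENDPOINT = "http://localhost:3030/cct/sparql"


def build_query_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def count_triples(endpoint: str = ENDPOINT) -> int:
    """Count the triples in the default graph (requires a running server)."""
    request = build_query_request(
        "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }", endpoint)
    with urllib.request.urlopen(request, timeout=10) as response:
        results = json.load(response)
    return int(results["results"]["bindings"][0]["n"]["value"])
```

Calling count_triples() while Fuseki is running should return a number rather than raise a connection error; a connection error suggests that the server is not running or that the store has a different name.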

Finally, while Fuseki is running, open another Miniconda prompt and activate the quangis-wf environment to be able to run the doit recipes:

conda activate quangis-wf
cd quangis-workflow
doit list

Question transformation graphs

The geo-question-parser repository handles the parsing of geo-analytical questions. Since it has not yet been turned into a module, we simply assume that a local clone exists at ../geo-question-parser relative to the root of this repository. Assuming that, and that all prerequisites for that repository have been taken care of, we can parse the questions and produce transformation graphs for them with:

doit question_transformation
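If the recipe fails, a first thing to check is whether the sibling clone is where it is expected. The following is a minimal sketch of such a check; the function names are hypothetical and are not part of this repository.

```python
from pathlib import Path


def question_parser_dir(repo_root: Path) -> Path:
    """Return the expected location of the sibling geo-question-parser clone."""
    return (repo_root / ".." / "geo-question-parser").resolve()


def check_question_parser(repo_root: Path) -> None:
    """Raise a descriptive error if the sibling clone is missing."""
    expected = question_parser_dir(repo_root)
    if not expected.is_dir():
        raise FileNotFoundError(
            f"expected a clone of geo-question-parser at {expected}")
```

For example, check_question_parser(Path.cwd()) can be run from the root of this repository before invoking the recipe.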

Expert workflows

We distinguish two types of workflow specification: concrete and abstract. Concrete workflows consist of concrete tools, as implemented in ArcGIS or QGIS. Abstract workflows consist of abstract tools. An abstract tool may be implemented by one or more concrete tools, and a concrete tool may, in turn, implement multiple abstract tools. Only abstract tools can be associated with a CCT transformation expression (see below).
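The relation between concrete and abstract tools is therefore many-to-many. As a rough illustration, it can be indexed in both directions as sketched below. The tool names are invented for this example and do not refer to entries in the actual tool repository.

```python
from collections import defaultdict

# Hypothetical tool names, invented for illustration only.
# Each pair states that a concrete tool implements an abstract tool.
IMPLEMENTS = [
    ("arcgis:Buffer", "abstract:BufferRegions"),
    ("qgis:buffer", "abstract:BufferRegions"),
    ("arcgis:SpatialJoin", "abstract:CountPointsInRegions"),
    ("arcgis:SpatialJoin", "abstract:SumAttributeInRegions"),
]


def index_relation(pairs):
    """Index the many-to-many relation in both directions."""
    implemented_by = defaultdict(set)  # abstract tool -> concrete tools
    implements = defaultdict(set)      # concrete tool -> abstract tools
    for concrete, abstract in pairs:
        implemented_by[abstract].add(concrete)
        implements[concrete].add(abstract)
    return implemented_by, implements


implemented_by, implements = index_relation(IMPLEMENTS)
# One abstract tool with two concrete implementations:
print(sorted(implemented_by["abstract:BufferRegions"]))
# One concrete tool that implements two abstract tools:
print(sorted(implements["arcgis:SpatialJoin"]))
```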

Expert1

The directory data/workflows/expert1/ contains the abstract expert workflows which were used for our initial evaluations. These use only the manually described tools in data/tools/.

Expert2

The workflow annotation repository contains concrete expert workflows, which have been copied into data/workflows/expert2/. These workflows have been annotated in such a way that abstract tools can be extracted from them. The updated tool repository[^1] can be produced at build/tools/ via:

doit toolset_update

[^1]: Currently, there are quite a few issues with workflow abstraction, leading to both duplication and needless differentiation. The warnings generated by the process should give some pointers.

Assuming that the tool database is up-to-date, the concrete workflows can then be converted into abstract workflows at build/workflows/expert2/:

doit wf_expert2

Generated workflows

Additional abstract workflows can be synthesized from the abstract tool repository using the Automated Pipeline Explorer.

From input/output specifications

The inputs and outputs of each abstract tool in the tool repository are annotated with core concept datatypes according to the CCD ontology. This is translated to a format that APE understands. APE is then instructed to generate workflows for different possible input/output data configurations, as specified in data/ioconfig.ttl. To perform this step and obtain workflows at build/workflows/generated/, run:

doit wf_gen

Workflow variants

An alternative approach is to generate variants of workflows based on the input/output types found in expert workflows that we already have. This experiment is not fully fleshed-out, but can be run with:

doit wf_gen_variants

From questions

A final approach is to generate workflows by translating the CCT types (found in the transformation graphs that were generated for questions) into CCD types, and using those CCD types to constrain APE's workflows:

doit wf_gen_question

Workflow transformation graphs

Abstract tools are annotated with a description of their functionality by means of a CCT expression. This expresses the conceptual steps they perform while abstracting away from implementation details. The types and operators of the CCT transformation algebra are defined in the quangis/cct.py module.

This information is then woven into a graph of conceptual transformations via the transforge library, which was developed for this purpose. To make transformation graphs for any workflow, just run:

doit transformations

For the workflows generated from questions, the transformation graphs are currently produced together with the workflows themselves:

doit wf_gen_question

Visualization

Any workflow with a transformation graph in the build/transformations/ directory can be visualized in the GraphViz DOT format:

doit viz_dot

The resulting DOT files can then be converted to PDF if GraphViz is installed:

doit viz_pdf
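For readers who have not seen the DOT format, the sketch below renders a small directed graph as DOT text. It is a generic illustration and not the code used by the recipes; the node labels are hypothetical placeholders for conceptual steps.

```python
def quote(label: str) -> str:
    """Wrap a label in double quotes, escaping embedded double quotes for DOT."""
    return '"' + label.replace('"', '\\"') + '"'


def to_dot(edges, name="transformation"):
    """Render (source, target) pairs as a GraphViz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for source, target in edges:
        lines.append(f"    {quote(source)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical conceptual steps, invented for illustration only.
EDGES = [("input data", "step 1"), ("step 1", "step 2"), ("step 2", "output")]
print(to_dot(EDGES))
```

If the printed text is saved as example.dot, the GraphViz command dot -Tpdf example.dot -o example.pdf converts it to PDF.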

Evaluation

Evaluation on expert workflows

Following the above recipes, the build/workflows/expert1/ directory should contain workflow transformation graphs for workflows that answer particular questions. These questions, in turn, correspond to tasks that are encoded as query transformation graphs in the data/tasks/ directory.

To reproduce the matching between workflow transformation graphs and these query transformation graphs, set up your triple store, change the STORE_* variables to the appropriate values, then run the following:

doit tdb_upload

Then, send your queries:

doit tdb_query_expert1

In the build/eval/ directory, CSV files will be produced that show which workflows are retrieved for which task descriptions, for all evaluation variants used in the JOSIS paper.

Workflow generation evaluation

To evaluate workflow generation, first generate and upload the question-based workflow transformation graphs. Then, the associated question transformation graphs can be sent as queries using:

doit tdb_query_questions

Afterward, to assemble those workflows that were generated for a question and also match its transformation graph:

doit tdb_query_questions_intersection
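Conceptually, this last step is a per-question set intersection. The sketch below illustrates the idea with plain dictionaries; the function, the data layout, and the identifiers are hypothetical and do not reflect how the recipe stores its results.

```python
def intersect(generated, matched):
    """For each question, keep the workflows that were generated for it
    and that were also retrieved by its transformation graph query."""
    return {
        question: sorted(set(workflows) & set(matched.get(question, ())))
        for question, workflows in generated.items()
    }


# Hypothetical identifiers, invented for illustration only.
generated = {"q1": ["wf_a", "wf_b"], "q2": ["wf_c"]}
matched = {"q1": ["wf_b", "wf_d"]}
print(intersect(generated, matched))
```

Here, q1 keeps only wf_b, and q2 keeps nothing because no query results were recorded for it.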

Tests

To run the (rather limited) tests and sanity checks:

doit test