This repository is part of the QuAnGIS project. It contains the definition of the core concept transformation algebra, a database of GIS tools specified in terms of it, and recipes to annotate these tools and to produce and transform workflows using APE.
The recipes are specified using doit in the `dodo.py` file. Run `doit list` to see an overview of what you can create.
You will need Python 3.9 as well as the dependencies specified in `requirements.txt`. To generate workflows, Java 1.8+ must be installed; a compatible version of APE will be downloaded automatically. Furthermore, to fire queries, you will need an external SPARQL endpoint running. We will assume that you are running an Apache Fuseki server at `localhost:3030`; other options are BlazeGraph or the proprietary MarkLogic.
You can install everything you need within a Miniconda environment. First, create it and install the necessary packages:
conda create -n quangis-wf
conda activate quangis-wf
conda install python=3.9 spacy spacy-model-en_core_web_sm pyzmq git jpype1 doit tomlkit graphviz pydot platformdirs
conda install -c conda-forge openjdk=21
pip install antlr4-python3-runtime==4.9.3 word2number --editable=git+https://github.com/quangis/transforge.git@develop#egg=transforge
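As a quick sanity check (not part of the recipes themselves), you can verify that the key Python dependencies are importable once the environment is active:

```sh
# Should print "dependencies OK" if transforge, spaCy, JPype and doit are installed
python -c "import transforge, spacy, jpype, doit; print('dependencies OK')"
```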
Then get the repositories you need:
git clone https://github.com/quangis/geo-question-parser
git clone https://github.com/quangis/quangis-workflow
You will also need to download and run Fuseki. To do this, run the following commands. Afterwards, open your browser at http://localhost:3030 to create the `cct` triple store.
curl -fLo fuseki.zip https://dlcdn.apache.org/jena/binaries/apache-jena-fuseki-4.10.0.zip
unzip fuseki.zip
cd apache-jena-fuseki-4.10.0
java -jar fuseki-server.jar
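If you prefer the command line over the web interface, and if your Fuseki instance accepts unauthenticated admin requests from localhost (this depends on your setup), the `cct` dataset can also be created via Fuseki's admin protocol, roughly as follows:

```sh
# Check that the Fuseki server is reachable
curl http://localhost:3030/$/ping
# Create a persistent TDB2 dataset named 'cct'
curl -X POST 'http://localhost:3030/$/datasets' --data 'dbName=cct&dbType=tdb2'
```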
Finally, while Fuseki is running, open another Miniconda prompt in the `quangis-wf` environment to be able to run the doit recipes:
conda activate quangis-wf
cd quangis-workflow
doit list
The question-parser repository handles parsing of geo-analytical questions. Since this hasn't yet been turned into a module, we simply assume that the local repository exists at `../geo-question-parser` relative to the root of this repository. Assuming that, and that all prerequisites for that repository have been taken care of, we can parse the questions and produce transformation graphs for them with:
doit question_transformation
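Assuming both repositories were cloned side by side as shown earlier, you can quickly confirm the expected layout before running the recipe:

```sh
# Run from the root of quangis-workflow; the parser repository should be a sibling
ls ../geo-question-parser
```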
We distinguish two types of workflow specification: concrete and abstract. Concrete workflows consist of concrete tools, as implemented in ArcGIS or QGIS. Abstract workflows consist of abstract tools, each of which may be implemented by one or more concrete tools; a concrete tool may, in turn, implement multiple abstract tools. Only abstract tools can be associated with a CCT transformation expression (see below).
The directory `data/workflows/expert1/` contains the abstract expert workflows which were used for our initial evaluations. These use only the manually described tools in `data/tools/`.
The workflow annotation repository contains concrete expert workflows, which have been copied into `data/workflows/expert2/`. These workflows have been annotated in such a way that abstract tools can be extracted from them. The updated tool repository[^1] can be produced at `build/tools/` via:
doit toolset_update
[^1]: Currently, there are quite a few issues with workflow abstraction, leading to both duplication and needless differentiation. The warnings generated by the process should give some pointers.
Assuming that the tool database is up-to-date, the concrete workflows can then be converted into abstract workflows at `build/workflows/expert2/`:
doit wf_expert2
Additional abstract workflows can be synthesized from the abstract tool repository using the Automated Pipeline Explorer.
The inputs and outputs of each abstract tool in the tool repository are annotated with core concept datatypes according to the CCD ontology. This is translated into a format that APE understands. APE is then instructed to generate workflows for different possible input/output data configurations, as specified in `data/ioconfig.ttl`. To perform this step and obtain workflows at `build/workflows/generated/`, run:
doit wf_gen
An alternative approach is to generate variants of workflows based on the input/output types found in the expert workflows that we already have. This experiment is not fully fleshed out, but it can be run with:
doit wf_gen_variants
A final approach is to generate workflows by translating the CCT types (found in the transformation graphs that were generated for questions) into CCD types, and using those CCD types to constrain APE's workflows:
doit wf_gen_question
Abstract tools are annotated with a description of their functionality by means of a CCT expression. This expresses the conceptual steps they perform while abstracting away from implementation details. The types and operators of the CCT transformation algebra are defined in the `quangis/cct.py` module.
This information is then woven into a graph of conceptual transformations via the transforge library, which was developed for this purpose. To make transformation graphs for any workflow, just run:
doit transformations
For workflows generated from questions, the transformation graphs are currently produced immediately alongside the workflows themselves:
doit wf_gen_question
Any workflow with a transformation graph in the `build/transformations/` directory can be visualized in the GraphViz DOT format:
doit viz_dot
These can then be converted to PDF if GraphViz is installed:
doit viz_pdf
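If you would rather convert a single graph by hand, GraphViz's `dot` tool can be invoked directly; the file name below is only an example:

```sh
# Convert one DOT file to PDF manually (example file name)
dot -Tpdf build/transformations/example.dot -o build/transformations/example.pdf
```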
Following the above recipes, the `build/workflows/expert1/` directory should contain workflow transformation graphs for workflows that answer particular questions. These questions, in turn, correspond to tasks that are encoded as query transformation graphs in the `data/tasks/` directory.
To reproduce the matching between workflow transformation graphs and these query transformation graphs, set up your triple store, change the `STORE_*` variables to the appropriate values, and then run the following:
doit tdb_upload
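As an aside, should you need to upload a single transformation graph by hand, Fuseki accepts Turtle data over HTTP; assuming the `cct` dataset created earlier, something along these lines should work (the file name is only an example):

```sh
# Upload one Turtle file into the default graph of the 'cct' dataset
curl -X POST -H 'Content-Type: text/turtle' \
     --data-binary @build/transformations/example.ttl \
     http://localhost:3030/cct/data
```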
Then, send your queries:
doit tdb_query_expert1
In the `build/eval/` directory, CSV files will be produced that show which workflows are retrieved for which task descriptions, for all evaluation variants used in the JOSIS paper.
To evaluate the workflows, you first want to generate and upload the associated question-based workflow transformation graphs. Then, the associated question transformation graphs can be fired as queries using:
doit tdb_query_questions
Afterward, to assemble those workflows that were generated for a question and also match its transformation graph:
doit tdb_query_questions_intersection
To run the (rather limited) tests and sanity checks:
doit test