After analysing requirements from use cases (WP4), the DTE core (WP6), and the infrastructure (WP5), the likely "common language" is containers (e.g., Docker, Singularity).
The infrastructure may not operate at the workflow/"pipeline" level, so this abstraction would be implemented by T6.1. Namely, T6.1 is going to:
Break workflows into steps, and deploy the steps on the infrastructure as containers.
Listen for events (e.g., new data is available: dCache + NiFi + OSCAR), and trigger the corresponding workflow (e.g., when new training data is available, trigger ML training).
Run a workflow step-by-step: once a step (container) has completed, trigger the next one using Kubernetes-like APIs.
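The step-by-step execution described above can be sketched as follows. This is a minimal illustration, not the actual T6.1 implementation: names such as `Step` and `run_workflow` are hypothetical, and the `launch` callable stands in for the Kubernetes-like API that would submit a container and wait for it to complete.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Step:
    """One workflow step, deployed as a container."""
    name: str
    image: str  # container image implementing the step

def run_workflow(steps: Iterable[Step], launch: Callable[[Step], int]) -> None:
    """Run steps sequentially: each step starts only after the previous
    container has completed successfully (exit code 0)."""
    for step in steps:
        # In production, `launch` would submit the container via
        # Kubernetes-like APIs and block until it terminates.
        exit_code = launch(step)
        if exit_code != 0:
            raise RuntimeError(f"step '{step.name}' failed with exit code {exit_code}")

if __name__ == "__main__":
    # Stand-in launcher that just records the execution order.
    completed = []
    def fake_launch(step: Step) -> int:
        completed.append(step.name)
        return 0
    run_workflow([Step("pre-process", "preproc:latest"),
                  Step("train", "gan-train:latest")], fake_launch)
    print(completed)  # ['pre-process', 'train']
```

The pluggable launcher keeps the orchestration logic independent of the concrete infrastructure API, matching the container-as-common-language idea above.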
Sub-workflows
Different workflow steps may be run using different (sub-)workflow engines. For instance:
Big data pre-processing of satellite images shall be carried out using openEO workflows (e.g., using Spark would be sub-optimal).
AI/ML could be carried out by using Kubeflow or Elyra workflows, which are AI-centric.
And so on...
As a result, the same DT workflow may use multiple workflow engines, each tailored/optimized to the needs of its workflow step. The goal is to give individual tasks the freedom to use their preferred workflow engine, which is often the one best optimized for that task.
Conceptually, T6.1 is developing a workflow manager that works super partes (i.e., above the individual engines), thus implementing a "super orchestrator/workflow manager". This high-level orchestrator is agnostic to the operation executed in each node. It may work with a common workflow language.
Below is an example of a toy workflow for predicting fire risk maps from satellite images, consisting of the following steps:
Big data pre-processing: transform training and inference (i.e., unseen) satellite images, preparing them for ML workflows. In principle, both training and inference images may be pre-processed in the same way.
GAN neural network training: train an ML model to predict fire risk maps, and save it to the model registry.
Data fusion and visualization: apply the trained ML model to unseen satellite images and show fire risk predictions to the user (visualization component).
In this case:
the "super orchestrator" may be developed by T6.1
"Big data pre-processing" may be developed by T6.4
"GAN neural net training" may be developed by T6.5
"Data fusion and visualization" may be developed by T6.3
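The toy workflow and its task assignments above can be sketched as data the super orchestrator would consume. Image names and the `WorkflowStep` structure are hypothetical; the point is that the orchestrator only sees containers, while each container may internally run its own (sub-)workflow engine.

```python
from dataclasses import dataclass

@dataclass
class WorkflowStep:
    name: str
    image: str        # container wrapping the step (hypothetical image names)
    sub_engine: str   # (sub-)workflow engine used inside the container
    owner_task: str   # task responsible for developing the step

# Fire-risk toy workflow: three container steps, three different engines.
FIRE_RISK_WORKFLOW = [
    WorkflowStep("big-data-pre-processing", "dt/preproc:latest", "openEO", "T6.4"),
    WorkflowStep("gan-training", "dt/gan-train:latest", "Kubeflow", "T6.5"),
    WorkflowStep("fusion-and-visualization", "dt/fusion-viz:latest", "internal", "T6.3"),
]

for step in FIRE_RISK_WORKFLOW:
    print(f"{step.owner_task}: {step.name} ({step.sub_engine}) -> {step.image}")
```

Because each step is an opaque container, the super orchestrator (T6.1) never needs to know whether a step runs openEO, Kubeflow, or anything else.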
NOTE: In practice, the "super orchestrator" could be implemented by re-using one of the engines already required by some task, reducing maintenance cost. However, it has to support general-purpose workflows whose steps are deployed as containers.
Goal
Define how to handle "triggers" received by the workflow composition tool, namely the orchestrator in the DTE core. This is infrastructure-dependent.
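One simple way to frame the trigger-handling problem is a dispatch table from infrastructure events to workflows. This is a sketch under assumptions: the event names and workflow identifiers below are hypothetical, and in practice the events would come from components such as dCache/NiFi/OSCAR.

```python
# Hypothetical mapping from infrastructure events to the workflow
# the orchestrator should start when that event arrives.
TRIGGER_TABLE = {
    "new-training-data": "gan-training-workflow",
    "new-inference-data": "fire-risk-inference-workflow",
}

def handle_trigger(event_type: str) -> str:
    """Return the workflow to launch for a given event, or raise if unknown."""
    try:
        return TRIGGER_TABLE[event_type]
    except KeyError:
        raise ValueError(f"no workflow registered for event '{event_type}'") from None

print(handle_trigger("new-training-data"))  # gan-training-workflow
```

Whatever mechanism delivers the events (message queue, webhook, storage notification) is exactly the infrastructure-dependent part this goal needs to pin down.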
Alternatives