oamg / leapp

Leapp - Application & OS Modernization Framework (For in-place upgrades, follow leapp-repository)
https://oamg.github.io/leapp/
Apache License 2.0

Proposal: modular, dataflow-based architecture for the LeApp CLI #154

Closed pcahyna closed 6 years ago

pcahyna commented 7 years ago

Synopsis

This proposal is to create a more modular architecture for the LeApp CLI, using dataflow-based programming concepts and frameworks. Modules in this architecture are called Actors. The infrastructure is also intended to serve as the next generation of the preupgrade-assistant tool, whose current modular architecture is too restrictive. Actors will specify the types of data they require on their input ports and provide on their output ports, and a dependency solver will produce a dependency graph that satisfies those requirements, i.e. connects output ports to appropriate input ports. We considered two frameworks for flow-based programming: WOW:-P and Selinon. The former has the advantage of being simpler and apparently better at expressing the dependency information that we need. The advantage of Selinon, on the other hand, is that it is designed to cope with non-idempotent actors. It must be possible to implement actors in shell, with a Python wrapper presenting the proper interface to the (Python-based) framework. It is furthermore proposed to execute those shell parts using Ansible (as Ansible modules) and let them communicate using JSON, as that is the standard for Ansible modules.

Goals

For the initial implementation we chose WOW:-P. Its advantages over Selinon are:

It may turn out that in the future we will need Selinon features, in which case it may be preferable to switch to it. In particular, we may need it if we want to allow non-idempotent actors. In a simple implementation of the workflow, if execution is aborted because an actor fails, the whole process must be restarted. This is not acceptable when some of the actors are non-idempotent. In Selinon, tasks which were completed in a previous run can be skipped: http://selinon.readthedocs.io/en/latest/selective.html#reuse-results-in-selective-flows. We thus have to decide whether to forbid non-idempotent actors in order to simplify the framework, or to allow them in order to potentially simplify some of the actors. In the latter case Selinon will probably be the easiest solution, and a layer which produces Selinon dependencies from data dependencies will need to be implemented.

pyutilib.workflow seems to be very similar to WOW:-P, and choosing between them is a matter of taste. So far we have chosen WOW:-P because @pcahyna is in contact with its developers from a previous job.

None of the infrastructures evaluated has implicit data dependencies. In WOW:-P and pyutilib.workflow one has to explicitly connect the source port of one actor to the target port of another actor to pass data between them and establish the dependency. We therefore need to write a dependency resolution engine which converts implicit dependencies into explicit connections between actors. The usefulness of the infrastructure will depend in large part on how well those dependencies can be expressed. In the initial design, the output ports of actors have an "annotation" property which contains the data type that they provide. The input ports also have an "annotation" property, which contains the data type that they require and, optionally, the name of the source actor that they require it from.

In the dependency resolution step, the resolver connects each input port to the output ports whose provided data type is a subtype of the input port's required data type and whose actor's name matches the required source actor (if one is specified). If, at the end of the resolution phase, an input port is connected to more than one output port, this is considered an error. As an exception, input ports may be marked as accepting multiple inputs. In this case the framework will merge all the inputs into a list and pass this list to the port.

This is important for preupgrade-assistant, which ultimately produces a report from all the actors. Each actor shall therefore have an output port which provides its status to the report-generating actor. The report-generating actor shall have an input port which accepts the "status" data type from any actor and accepts multiple inputs. It will then obtain the statuses from all the actors and produce a report from them.

The actors in WOW:-P (and other Python-based workflow infrastructures) are just Python objects. As we want to be able to write actors as shell scripts, we will need a Python class which wraps such a script in a proper Python actor. We will then need a way to get output data (for output ports) from the shell script, which is somewhat difficult, because shell scripts have limited means of providing output: only stdout, stderr and possibly temporary files. In Ansible, modules can provide a set of structured data, called facts. Moreover, an Ansible module is just a script written in any language, including shell, which Ansible executes on the target system. It therefore has all the properties which we need from a script that implements an actor. Our actors should therefore consist of a Python wrapper class (to provide the interface required of an actor by the chosen workflow infrastructure, such as WOW:-P) which internally calls Ansible, which in turn executes a module, which can be a shell script.

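
To make the resolution rules described above concrete (subtype match on the data type, optional match on the source actor's name, and an error on multiple matches unless the input port accepts multiple inputs), here is a minimal sketch in plain Python. It does not use the WOW:-P or pyutilib.workflow APIs; all names in it (`OutputPort`, `InputPort`, `resolve`, `ResolutionError`, the data types and the actor names) are hypothetical and chosen only for illustration. How unsatisfied input ports should be handled is not specified in this proposal, so the sketch simply leaves them unconnected.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical data types; Python subclassing stands in for the "subtype" relation.
class Data:
    pass


class Status(Data):
    pass


class PackageList(Data):
    pass


@dataclass(frozen=True)
class OutputPort:
    actor: str
    name: str
    provides: type  # the "annotation": data type this port provides


@dataclass(frozen=True)
class InputPort:
    actor: str
    name: str
    requires: type                      # the "annotation": required data type
    source_actor: Optional[str] = None  # optional required source actor
    multiple: bool = False              # port accepts a list of merged inputs


class ResolutionError(Exception):
    """Raised when a single-input port matches more than one output port."""


def resolve(inputs, outputs):
    """Map every input port to the list of output ports it gets connected to."""
    connections = {}
    for inp in inputs:
        matches = [
            out for out in outputs
            if issubclass(out.provides, inp.requires)
            and inp.source_actor in (None, out.actor)
        ]
        if len(matches) > 1 and not inp.multiple:
            raise ResolutionError(
                f"{inp.actor}.{inp.name} matches {len(matches)} output ports"
            )
        # Assumption: an input port with no match is left unconnected.
        connections[inp] = matches
    return connections


outputs = [
    OutputPort("check_rpms", "status", Status),
    OutputPort("check_rpms", "packages", PackageList),
    OutputPort("check_selinux", "status", Status),
]
inputs = [
    # The report-generating actor collects the status of every actor.
    InputPort("report", "statuses", Status, multiple=True),
    # This port requires its data from one specific actor.
    InputPort("check_selinux", "packages", PackageList, source_actor="check_rpms"),
]

for inp, sources in resolve(inputs, outputs).items():
    print(f"{inp.actor}.{inp.name} <- {[f'{o.actor}.{o.name}' for o in sources]}")
```

The resulting mapping is what would then be turned into explicit port-to-port connections in the chosen workflow framework.
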
The wrapper class in Python should be the same for most, if not all, actors. Those who implement the actors will thus only need to provide the Ansible module (shell script). An Ansible module emits its output (facts) as JSON. We shall therefore restrict the data passed between actors to what can easily be represented in JSON, although WOW:-P at least can pass arbitrary Python objects. We will probably need helper shell functions for JSON output formatting and for other tasks common to all the actors' scripts.
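
As a rough illustration of the single generic wrapper idea, the sketch below wraps a shell script in a Python object and reads its outputs as JSON. It is a simplification under stated assumptions: the script is run directly with `sh` via `subprocess` as a stand-in for execution through Ansible, inputs are passed as JSON on stdin, and the class name `ShellActor`, its `run` method and the example script are all hypothetical rather than part of WOW:-P, Ansible or LeApp.

```python
import json
import subprocess


class ShellActor:
    """Hypothetical generic wrapper exposing a shell script as a Python actor."""

    def __init__(self, name, script):
        self.name = name
        self.script = script  # shell source text implementing the actor

    def run(self, inputs):
        # Stand-in for "call Ansible, which executes the module": the script
        # is run locally, receives its inputs as JSON on stdin and must print
        # a single JSON object (its outputs) on stdout.
        proc = subprocess.run(
            ["sh", "-c", self.script],
            input=json.dumps(inputs),
            capture_output=True,
            text=True,
            check=True,  # a non-zero exit status raises CalledProcessError
        )
        outputs = json.loads(proc.stdout)
        if not isinstance(outputs, dict):
            raise ValueError(f"actor {self.name!r} did not print a JSON object")
        return outputs


# Example actor script with a small helper function for JSON formatting,
# of the kind that could be shared by all actors' scripts.
SCRIPT = r"""
json_status() { printf '{"status": "%s", "actor": "%s"}\n' "$1" "$2"; }
cat > /dev/null            # this example ignores its inputs
json_status pass check_demo
"""

actor = ShellActor("check_demo", SCRIPT)
print(actor.run({"release": "example"}))
```

Only the script differs between actors here; the wrapper stays the same, which is the property the proposal is after.
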

ncoghlan commented 7 years ago

This general approach sounds reasonable to me, so I think it makes sense to start moving in this direction.