mara / mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
MIT License

Support different execution contexts #68

Open leo-schick opened 2 years ago

leo-schick commented 2 years ago

Currently, mara pipelines are always executed locally. I would like to have an option to execute them somewhere else when needed, e.g. in another environment that is closer to the required resources.

The idea

So I came up with the idea about execution contexts. Here is the rough idea:

  1. one can define an execution context for a pipeline or for a specific task
  2. the execution context then defines where the shell command shall be executed
  3. it should be possible to define multiple execution contexts within one pipeline
  4. an execution context has an "enter" / "exit" method, which gives the option to spin up or release the resources required for the execution context
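
A minimal sketch of how points 1 to 4 could fit together. The `Task` class and the `run_pipeline` function are hypothetical names used only for illustration and are not part of mara-pipelines; the context here only records what happens to it instead of running anything.

```python
from dataclasses import dataclass
from typing import List, Optional


class ExecutionContext:
    """Hypothetical context that only records what happens to it."""

    def __init__(self, name: str, log: List[str]):
        self.name = name
        self.log = log
        self.is_active = False

    def __enter__(self):
        # A real context would spin up its resources here (container, SSH session, ...)
        self.is_active = True
        self.log.append(f"enter {self.name}")
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        # A real context would release its resources here
        self.is_active = False
        self.log.append(f"exit {self.name}")
        return False  # do not suppress exceptions raised inside the with block

    def run_shell_command(self, shell_command: str) -> bool:
        self.log.append(f"{self.name}: {shell_command}")
        return True


@dataclass
class Task:
    shell_command: str
    context: Optional[ExecutionContext] = None  # None means: use the pipeline default


def run_pipeline(tasks: List[Task], default_context: ExecutionContext) -> bool:
    """Runs the tasks in order, entering each task's context around its command."""
    for task in tasks:
        with (task.context or default_context) as context:
            if not context.run_shell_command(task.shell_command):
                return False
    return True


log: List[str] = []
local = ExecutionContext("local", log)
remote = ExecutionContext("remote", log)
ok = run_pipeline(
    [Task("echo extract"), Task("echo load", context=remote)],
    default_context=local,
)
```

Entering the context once per task keeps the sketch short. An actual implementation would more likely keep a context open across consecutive tasks, so that a container or connection is not restarted for every command.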

The current idea is to support the following execution contexts:

  1. BashExecutionContext - local bash (this is the current default behavior)
  2. SshBashExecutionContext - remote bash execution via ssh
  3. DockerExecutionContext - docker exec with optional start/stop of a container
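
As a rough sketch, the three contexts could differ only in how they wrap the shell command before handing it to a subprocess. The class names come from the list above; the `command_args` method and the constructor parameters (`host`, `user`, `container`) are assumptions made for illustration, and the optional start/stop of the container is left out.

```python
import shlex
import subprocess
from typing import List, Optional


class BashExecutionContext:
    """Runs the command in a local bash (the current default behavior)."""

    def command_args(self, shell_command: str) -> List[str]:
        return ["bash", "-c", shell_command]

    def run_shell_command(self, shell_command: str) -> bool:
        # A zero exit code counts as success
        return subprocess.run(self.command_args(shell_command)).returncode == 0


class SshBashExecutionContext(BashExecutionContext):
    """Runs the command in a bash on a remote host via ssh."""

    def __init__(self, host: str, user: Optional[str] = None):
        self.destination = f"{user}@{host}" if user else host

    def command_args(self, shell_command: str) -> List[str]:
        # The remote side re-parses the command line with its login shell,
        # so the command is quoted once more for that shell
        return ["ssh", self.destination, "bash -c " + shlex.quote(shell_command)]


class DockerExecutionContext(BashExecutionContext):
    """Runs the command inside an already running container via docker exec."""

    def __init__(self, container: str):
        self.container = container

    def command_args(self, shell_command: str) -> List[str]:
        return ["docker", "exec", self.container, "bash", "-c", shell_command]
```

Keeping the command construction in one overridable method means `run_shell_command` stays identical for all three contexts.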

Possible other options (Out of scope)

This concept could be extended in the future to add other options like:

These ideas are just noted here and are out of scope for this issue.

Blueprint for the ExecutionContext base class

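class ExecutionContext:
    """The execution context for a shell command"""

    def __init__(self):
        # Instance attribute: whether the context is currently entered
        self.is_active: bool = False

    def __enter__(self):
        """Enters the execution context."""
        return self

    def __exit__(self, type, value, traceback) -> bool:
        """Exits the execution context freeing up used resources."""
        # Note: returning True suppresses any exception raised inside the with block
        return True

    def run_shell_command(self, shell_command: str) -> bool:
        """Executes a shell command in the context"""
        raise NotImplementedError  # to be implemented by the concrete contexts

leo-schick commented 2 years ago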

@jankatins @gathineou @martin-loetzsch I would like to get your feedback on what you think about this idea.

I know about the option mara_pipelines.config.bash_command_string, but this wasn't enough for me because I need to be able to use multiple execution contexts on the same server and do not want to use multiple mara config files.

leo-schick commented 1 year ago

It would be nice to have a SQL execution context as well. This context would then run e.g. the ExecuteSQL command via the Python DB API (see ...). The development would require more refactoring than the current implementation, which just patches the way batch commands are executed.
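
A minimal sketch of what such a SQL execution context might look like, using the standard-library sqlite3 module as a stand-in for any DB API driver. The class name `SqlExecutionContext` and the method `run_sql` are assumptions for illustration; how this would hook into the ExecuteSQL command is not shown.

```python
import sqlite3


class SqlExecutionContext:
    """Hypothetical context that runs SQL through a DB API connection instead of a shell."""

    def __init__(self, database: str = ":memory:"):
        self.database = database
        self.connection = None

    def __enter__(self):
        # Open the connection when the context is entered
        self.connection = sqlite3.connect(self.database)
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        # Release the connection; exceptions are not suppressed
        self.connection.close()
        self.connection = None
        return False

    def run_sql(self, sql_statement: str) -> bool:
        """Executes one or more statements; returns False on a database error."""
        try:
            self.connection.executescript(sql_statement)
            self.connection.commit()
            return True
        except sqlite3.Error:
            self.connection.rollback()
            return False


with SqlExecutionContext() as context:
    created = context.run_sql("CREATE TABLE t (id INTEGER); INSERT INTO t VALUES (1);")
    row_count = context.connection.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    failed = context.run_sql("INSERT INTO missing_table VALUES (1)")
```

Returning a bool mirrors `run_shell_command` in the blueprint, so a pipeline runner could treat both kinds of context the same way.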