mara / mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
MIT License
2.06k stars, 102 forks

Support different execution contexts #68

Open leo-schick opened 2 years ago

leo-schick commented 2 years ago

Currently, mara pipelines are always executed locally. I would like to have the option to execute them somewhere else from time to time, e.g. in another environment that is closer to the resources the pipeline needs.

The idea

So I came up with the idea of execution contexts. Here is a rough outline:

  1. one can define an execution context for a pipeline or for a specific task
  2. the execution context then defines where the shell command is executed
  3. it should be possible to define multiple execution contexts within one pipeline
  4. an execution context has "enter" / "exit" methods, which make it possible to spin up or release the resources the execution context requires
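A minimal sketch of how points 1 to 4 could fit together, assuming the "enter" / "exit" methods map onto Python's context-manager protocol (`__enter__` / `__exit__`). `RecordingContext`, `Task` and `Pipeline` below are hypothetical stand-ins, not mara's real classes; the context only logs what it would do instead of running anything.

```python
from typing import List, Optional


class RecordingContext:
    """Hypothetical execution context that records calls instead of running them."""

    def __init__(self, name: str, log: List[str]):
        self.name = name
        self.log = log
        self.is_active = False

    def __enter__(self):
        # A real context would spin up its resources here (point 4)
        self.is_active = True
        self.log.append(f"enter {self.name}")
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        # ... and release them here; returning False lets exceptions propagate
        self.is_active = False
        self.log.append(f"exit {self.name}")
        return False

    def run_shell_command(self, shell_command: str) -> bool:
        self.log.append(f"{self.name}: {shell_command}")
        return True


class Task:
    def __init__(self, command: str, execution_context: Optional[RecordingContext] = None):
        self.command = command
        self.execution_context = execution_context  # None: use the pipeline's context


class Pipeline:
    def __init__(self, tasks: List[Task], execution_context: RecordingContext):
        self.tasks = tasks
        self.execution_context = execution_context

    def run(self) -> bool:
        for task in self.tasks:
            # A task-level context overrides the pipeline default (points 1 and 3)
            context = (task.execution_context if task.execution_context is not None
                       else self.execution_context)
            with context:
                # The context decides where the command runs (point 2)
                if not context.run_shell_command(task.command):
                    return False
        return True


log: List[str] = []
local = RecordingContext("local", log)
remote = RecordingContext("remote", log)
pipeline = Pipeline([Task("echo a"), Task("echo b", remote)], execution_context=local)
print(pipeline.run())  # → True
print(log)
```

Entering the context once per task is only one possible choice; a context shared by many tasks could instead be entered once per pipeline run, so that expensive resources (a container, an SSH connection) are set up only once.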

The current idea is to support the following execution contexts:

  1. BashExecutionContext - local bash (this is the current default behavior)
  2. SshBashExecutionContext - remote bash execution via ssh
  3. DockerExecutionContext - docker exec with optional start/stop of a container
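A rough sketch of how the three contexts could differ, assuming each one only changes the command line that wraps the shell command. The class names follow the list above; the `command_line` method, the constructor parameters and the host and container names in the example are assumptions for illustration.

```python
import shlex
import subprocess
from typing import List, Optional


class BashExecutionContext:
    """Local bash (the current default behavior)."""

    def command_line(self, shell_command: str) -> List[str]:
        return ["bash", "-c", shell_command]

    def run_shell_command(self, shell_command: str) -> bool:
        # Success means the command exited with status 0
        return subprocess.run(self.command_line(shell_command)).returncode == 0


class SshBashExecutionContext(BashExecutionContext):
    """Remote bash via ssh; assumes key-based login to the host is already set up."""

    def __init__(self, host: str, user: Optional[str] = None):
        self.destination = f"{user}@{host}" if user else host

    def command_line(self, shell_command: str) -> List[str]:
        # ssh hands the remote command to the remote login shell as one string,
        # so quote it to keep the whole command a single argument of `bash -c`
        return ["ssh", self.destination, "bash -c " + shlex.quote(shell_command)]


class DockerExecutionContext(BashExecutionContext):
    """bash inside an already running docker container."""

    def __init__(self, container: str):
        self.container = container

    def command_line(self, shell_command: str) -> List[str]:
        return ["docker", "exec", self.container, "bash", "-c", shell_command]


for context in [BashExecutionContext(),
                SshBashExecutionContext("etl-host", user="mara"),
                DockerExecutionContext("etl-container")]:
    print(context.command_line("echo $HOSTNAME"))
```

Because `run_shell_command` is shared, a new context in this sketch only needs to say how to wrap a command, and several differently configured contexts can coexist in one process.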

Possible other options (Out of scope)

This concept could be extended in the future to add other options like:

These ideas are just noted here and are out of scope for this issue.

Blueprint for the ExecutionContext base class

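from abc import ABC, abstractmethod


class ExecutionContext(ABC):
    """The execution context for a shell command"""
    is_active: bool = False

    def __enter__(self):
        """Enters the execution context, e.g. spins up required resources."""
        self.is_active = True
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        """Exits the execution context, freeing up used resources."""
        self.is_active = False
        # Return False so exceptions raised inside the context are not swallowed
        return False

    @abstractmethod
    def run_shell_command(self, shell_command: str) -> bool:
        """Executes a shell command in the context; returns True on success"""
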
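
A possible concrete context following the blueprint's methods, sketched for the DockerExecutionContext with the optional container start/stop in enter/exit. The `manage_container` flag and the injectable `runner` are hypothetical; `docker start`, `docker exec` and `docker stop` are the standard docker CLI commands.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], bool]


def run_command_line(argv: List[str]) -> bool:
    """Runs a command line locally; success means exit status 0."""
    return subprocess.run(argv).returncode == 0


class DockerExecutionContext:
    """`docker exec` with optional start/stop of the container (hypothetical sketch)."""

    def __init__(self, container: str, manage_container: bool = False,
                 runner: Runner = run_command_line):
        self.container = container
        self.manage_container = manage_container
        self.runner = runner  # injectable, so the logic can be checked without docker
        self.is_active = False

    def __enter__(self):
        # Spin up the container only if this context is responsible for it
        if self.manage_container and not self.runner(["docker", "start", self.container]):
            raise RuntimeError(f"could not start container {self.container!r}")
        self.is_active = True
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        # Release the container even if a command inside the block raised
        if self.manage_container:
            self.runner(["docker", "stop", self.container])
        self.is_active = False
        return False  # do not swallow exceptions raised inside the `with` block

    def run_shell_command(self, shell_command: str) -> bool:
        return self.runner(["docker", "exec", self.container, "bash", "-c", shell_command])
```

Passing a fake `runner` that only records the command lines is enough to check the start, exec, stop order without a docker installation.
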
leo-schick commented 2 years ago

@jankatins @gathineou @martin-loetzsch I would like to hear what you think about this idea.

I know about the option mara_pipelines.config.bash_command_string, but it wasn't enough for me: I need to be able to use multiple execution contexts on the same server, and I do not want to maintain multiple mara config files.

leo-schick commented 1 year ago

It would be nice to have a SQL execution context as well. This context would then run, e.g., the ExecuteSQL command via the Python DB API (see https://github.com/mara/mara-db/pull/71). Developing it would require more refactoring than the current implementation, which just patches the way shell commands are executed.
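A sketch of what such a SQL context could look like if it talks to the database through a PEP 249 (DB API 2.0) driver module, which must provide `connect()` and an `Error` exception class. `SqlExecutionContext` and its `execute_sql` method are hypothetical, and `sqlite3` from the standard library only stands in for a real driver; how mara-db would provide connections is not shown here.

```python
import sqlite3
from types import ModuleType
from typing import Any, Optional


class SqlExecutionContext:
    """Runs SQL statements through a PEP 249 (DB API 2.0) driver module."""

    def __init__(self, db_api: ModuleType, *connect_args: Any):
        self.db_api = db_api  # e.g. sqlite3 or another DB API driver
        self.connect_args = connect_args
        self.connection: Optional[Any] = None

    def __enter__(self):
        # "enter": open the connection, the resource this context manages
        self.connection = self.db_api.connect(*self.connect_args)
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        # "exit": commit on success, roll back on error, always close
        try:
            if exc_type is None:
                self.connection.commit()
            else:
                self.connection.rollback()
        finally:
            self.connection.close()
            self.connection = None
        return False

    def execute_sql(self, sql: str) -> bool:
        """Executes one statement; returns False if the driver raises its Error."""
        cursor = self.connection.cursor()
        try:
            cursor.execute(sql)
            return True
        except self.db_api.Error:
            return False
        finally:
            cursor.close()


with SqlExecutionContext(sqlite3, ":memory:") as context:
    print(context.execute_sql("CREATE TABLE t (x INTEGER)"))  # → True
    print(context.execute_sql("INSERT INTO t VALUES (1)"))    # → True
    print(context.execute_sql("SELECT * FROM missing"))       # → False
```

Unlike the shell-based contexts, this one does not wrap a command line at all, which is why it would need the larger refactoring mentioned above.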