
REANA Workflow Engine Snakemake
MIT License

executor: support python code execution via `run` hint #12

Open mvidalgarcia opened 3 years ago

mvidalgarcia commented 3 years ago

Current behavior

Apart from shell commands, Snakemake also permits running Python code directly via `run:`. At the moment, we identify such situations with the `job.is_run` flag, but we abort the execution.

Expected behavior

As the use of `run:` is common practice in the Snakemake world, and LHCb also uses it extensively, we need to support it.

At the moment, the Job-Controller always uses `bash -c` as the command to run the job's arguments, since we have never supported anything other than shell commands. We need to detect when `run:` is used, let the Job-Controller know via a kwarg or similar, and pass `python -c` as the command instead.

Snakefile example:

rule all:
    input:
        "results/greetings.txt"

rule helloworld:
    params:
        name=config["name"]
    output:
        "results/greetings.txt"
    container:
        "docker://python:3.9-slim-buster"
    run:
        with open(output[0], "w") as out:
            out.write(f"Hello {params['name']}!")
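The dispatch described above could look roughly like this (a minimal sketch; `build_job_command` and its arguments are illustrative names, not the actual REANA API):

```python
def build_job_command(is_run: bool, payload: str) -> list:
    """Pick the interpreter for a job's payload (illustrative sketch)."""
    if is_run:
        # ``run:`` blocks contain Python code, so hand them to ``python -c``.
        return ["python", "-c", payload]
    # Ordinary ``shell:`` commands keep the current behaviour.
    return ["bash", "-c", payload]
```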
mvidalgarcia commented 3 years ago

A problem arose while working on this issue. Snakemake doesn't support the use of run (Python execution) and container (singularity) directives at the same time.

RuleException in line 27 of ~/code/reanahub/reana-demo-helloworld/workflow/snakemake/Snakefile-run:                              
Singularity directive is only allowed with shell, script, notebook or wrapper directives (not with run).                                    
  File "~/code/reanahub/reana-demo-helloworld/workflow/snakemake/Snakefile-run", line 27, in <module>    

Here is the code that performs the check.

As can be seen, it considers a rule with `container` invalid when it doesn't have a `shell`, `script`, `notebook` or `wrapper` directive.
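A simplified rendering of that check (not Snakemake's actual code, just the logic it implements):

```python
# Directives that Snakemake accepts alongside ``container:``.
ALLOWED_WITH_CONTAINER = {"shell", "script", "notebook", "wrapper"}

def container_allowed(directives: set) -> bool:
    """Return True if a rule using ``container:`` is valid (sketch)."""
    # A containerized rule must use one of the allowed execution
    # directives; ``run:`` is not among them.
    return bool(ALLOWED_WITH_CONTAINER & directives)
```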

This is a problem for the REANA integration as it already fails at validation time (in reana-client).

Some possible solutions to this problem might be:

  1. Use a different "free text" directive to set the container image in the Snakefile (e.g. `resources.reana_environment`) and read it in the workflow engine to later pass it to the job controller. Cons: the Snakemake workflow wouldn't be compatible with Singularity execution, as no container would be set following the Snakemake specification syntax.

  2. Try to monkey-patch the `decorate` function to bypass this check. Cons: a) the function is too long to monkey-patch; b) the monkey-patch would be needed both in r-client and in r-w-e-snakemake (or maybe r-commons would be enough); c) it would go against the Snakefile spec, i.e. there must be a reason why they don't allow using `container` and `run` together.

  3. As Snakemake doesn't permit the use of these two directives at the same time, i.e. it's not possible/recommended to run Python code directly in containerized environments (Singularity), follow the same guidelines ourselves and recommend users to use only `shell` directives and wrap their code in Python scripts, which is cleaner and more portable.
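For illustration, the `helloworld` rule above could be rewritten along the lines of solution 3 (a sketch; `scripts/greet.py` is a hypothetical helper script that would hold the former `run:` body):

```
rule helloworld:
    params:
        name=config["name"]
    output:
        "results/greetings.txt"
    container:
        "docker://python:3.9-slim-buster"
    shell:
        "python scripts/greet.py {params.name} {output}"
```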

@tiborsimko @audrium Do you have any other solution/option in mind? I would like to hear your opinion.

audrium commented 3 years ago

@tiborsimko @audrium Do you have any other solution/option in mind? I would like to hear your opinion.

In my opinion the cleanest solution would be to recommend users to use the `shell` directive instead of the `run` one (solution 3). It's interesting that the reason it's not possible to run Python directly in containers is that the run directive *has access to the rest of the Snakefile (e.g. globally defined variables) and therefore must be executed in the same process as Snakemake itself* (copied from the docs). So I guess there is no way to work around this, since there might be some workflows which would not work anyway. That's why I think we should enforce the use of `shell` instead of `run` to make the workflows reproducible.
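Such enforcement could happen at validation time; a minimal sketch (hypothetical names, not the actual reana-client code):

```python
def find_run_rules(rules: dict) -> list:
    """Flag rules that use ``run:`` so validation can reject them (sketch).

    ``rules`` maps rule names to the set of directives each rule uses.
    """
    return [
        f"Rule {name!r} uses 'run:'; wrap the code in a script and "
        "call it from a 'shell:' directive instead."
        for name, directives in rules.items()
        if "run" in directives
    ]
```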

tiborsimko commented 2 years ago

(1) For the MVP release, we can definitely say that run directive is not supported yet.

(2) However, we shall need to think of supporting it soon. Looking at one LHCb example, I see that the workflow is using 12 run directives and 48 shell directives:

$ cd XiStarCC
$ rg 'run\s*:' | wc -l
12
$ rg 'shell\s*:' | wc -l
48

so the usage of run is quite considerable.

We could check with @admorris @seneubert et al. to see how widespread the use of run in LHCb is, but I guess it will be rather widespread. For example, even the LHCb Starterkit already advertises the run directive during training.

(3) I guess the main reason why Snakemake cannot execute the run directive in containers is simply that it is not guaranteed that the container contains a Snakemake version suitable for the run. Either the container does not have any Snakemake embedded, or it could have an incompatible version, etc. So it's safer to say it is not supported. For example:

    container:
        "docker://reanahub/reana-env-root6:6.18.04"
    run:
        with open(output[0], "w") as out:
            out.write(f"Hello {params['name']}!")

would not work because our container reanahub/reana-env-root6 does not have any Snakemake embedded inside.

However, if we know that our physics containers have the same Snakemake version embedded in the image, then perhaps it might work?

Could you make a quick test to (i) add Snakemake to our root container, and (ii) alter the Snakemake sources to switch off the container run check, and (iii) run the example? Perhaps it will work?

(4) If the previous would work, then great, we could simply document this and make sure that LHCb workflows would run that way, i.e. that LHCb images contain the necessary Snakemake libraries.

(5) If it won't work, then we'd have to think about other solutions. One possibility could be to execute run statements inside the workflow engine pod directly. This could work if people use run for file manipulations, say, whilst all the heavy lifting is done within shell directives. This could take care of some groups of use cases, but it is obviously not general enough.

Something we could look at later together with @admorris @seneubert is how LHCb actually uses the run directive in their typical analyses, unless the above "always-wrap-snakemake-library-into-your-images-if-you-want-to-use-snakemake-run-directive" technique would perhaps work already by itself?

mvidalgarcia commented 2 years ago

I guess the main reason why Snakemake cannot do run directive in containers is simply that it is not guaranteed that the container would contain the Snakemake version suitable for the run.

I don't think that's the main reason, actually. Why would one need Snakemake installed in the containers? In the end, the idea is to run Python code instead of a shell script; we don't run any snakemake command inside the job containers (that I know of...).

The main reason looks like it is what @audrium mentioned here.

Could you make a quick test to (i) add Snakemake to our root container, and (ii) alter the Snakemake sources to switch off the container run check, and (iii) run the example? Perhaps it will work?

tiborsimko commented 2 years ago

why would one need Snakemake installed in the containers?

To access rule inputs and other configuration variables etc defined elsewhere in Snakefiles. (Unless we would expand those fully before passing them on for execution.)

The main reason looks like it is what @audrium mentioned here.

That's pretty much what I was thinking: accessing variables living outside of the rule. If the Snakefiles are the same in the two processes, and no state is shared within the Snakemake process memory itself but rather via the workspace and .snakemake directory, then it might perhaps work?

(i) That shouldn't have any impact; we just forward the commands to the job-controller.

... and if the job has Snakemake installed, and can access all the Snakefiles, perhaps it could execute the given run rule?

Anyway, just some wild guesses to perhaps look at...

If it does not work, then executing the whole workflow within the same container might be the only way, at the price of using only parallel threads instead of parallel jobs for those workflows...

mvidalgarcia commented 2 years ago

(Unless we would expand those fully before passing them on for execution.)

AFAIK, everything is expanded before scheduling the actual jobs. ATM our demo examples work fine even though they have params coming from a YAML file. Snakemake takes care, on the workflow-engine side, of expanding those before submitting the job. The job containers are completely agnostic; they know nothing about Snakemake.
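A minimal illustration of this point (not Snakemake internals; the template and values are made up): by the time a job is submitted, all params and outputs have been expanded into a plain string, so the job container never needs Snakemake itself.

```python
# Hypothetical command template as it might appear in a rule, expanded
# engine-side into a literal shell string before job submission.
template = "echo 'Hello {params[name]}!' > {output[0]}"
expanded = template.format(
    params={"name": "Alice"},
    output=["results/greetings.txt"],
)
```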

... and if the job has Snakemake installed, and can access all the Snakefiles, perhaps it could execute the given run rule?

That sounds way too convoluted to me. We're passing the shell command, environment, operational options, etc. to the job-controller API to spawn a k8s job with that metadata. I don't see how a predefined container image could first install the snakemake dependency, then parse the Snakefile, access the rule that is supposed to run, do the appropriate variable expansions, etc. It sounds like reinventing the wheel a bit to me. Perhaps I'm misinterpreting your explanations.

FWIW, the original idea was passing `python -c "<run_content>"` to the job controller instead of `bash -c ...`. We can try that by modifying the original snakemake codebase to bypass the flag check, and checking how variable expansion, `shell()` calls and so on are computed when scheduling the jobs, even though monkey-patching that function doesn't sound like a viable solution...
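A hedged sketch of that original idea: the already expanded `run:` body is handed to a Python interpreter instead of a shell (`run_python_payload` is an illustrative helper, not REANA code):

```python
import subprocess
import sys

def run_python_payload(body: str, *args: str) -> subprocess.CompletedProcess:
    """Execute an expanded ``run:`` body via ``python -c`` (sketch)."""
    # Equivalent of ``python -c "<run_content>" [args...]`` in the job,
    # mirroring how ``bash -c`` is used for ``shell:`` commands today.
    return subprocess.run(
        [sys.executable, "-c", body, *args],
        capture_output=True,
        text=True,
        check=True,
    )

result = run_python_payload("print('Hello Alice!')")
```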