snakemake / snakemake-executor-plugin-googlebatch

Snakemake executor plugin for Google Batch (under development)
MIT License

Issues with include #39

Closed cademirch closed 3 months ago

cademirch commented 3 months ago

There seems to be a problem finding rules that are pulled in with the include directive in the main Snakefile. To reproduce, I'm using the hello_world example from this repo.

Snakefile:

include: "rules/hello.smk"

# By convention, the first pseudorule should be called "all"
# We're using the expand() function to create multiple targets
rule all:
    input:
        expand(
            "{greeting}/world.txt",
            greeting=["hello", "hola"],
        ),

hello.smk:

rule multilingual_hello_world:
    output:
        "{greeting}/world.txt",
    shell:
        """
        mkdir -p "{wildcards.greeting}"
        sleep 5
        echo "{wildcards.greeting}, World!" > {output}
        """

And directory structure like so:

.
└── workflow
    ├── Snakefile
    └── rules
        └── hello.smk

When snakemake is executed from the root (i.e., the directory that contains workflow), the batch jobs fail, complaining that they can't find "/rules/hello.smk". If you execute it from inside workflow instead, the batch jobs work.

cademirch commented 3 months ago

Looks like the issue is here: https://github.com/snakemake/snakemake-executor-plugin-googlebatch/blob/cacc9fec50903acf69d8d7cfa6339809e3b97b41/snakemake_executor_plugin_googlebatch/executor.py#L396 (this should probably return the path to the Snakefile, not just "Snakefile").

vsoch commented 3 months ago

The reason it's like that is because the entire working directory (local or remote storage) should be staged by snakemake before it hits the executor. We only need the Snakefile to kick off work. I still think this issue belongs with the storage library but maybe @johanneskoester can chime in.

cademirch commented 3 months ago

Hmm, I see. It looks like the command executed in the batch job is "python -m snakemake --snakefile Snakefile ...", which I presume is what's causing the issue. In the logs I also see "WorkflowError in file /Snakefile, line 1:" and "Failed to open source file /rules/hello.smk", which suggests that Snakemake is executing a Snakefile located in the root dir.

Even if the working directory was staged, I think snakemake --snakefile Snakefile wouldn't work because the snakefile is at workflow/Snakefile?
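
To make the failure mode concrete, here is a minimal sketch of how the include path could end up as "/rules/hello.smk". It assumes, as the log lines above suggest, that an include is resolved relative to the directory of the Snakefile that contains it. resolve_include is a hypothetical helper written for illustration, not Snakemake's actual code.

```python
from pathlib import PurePosixPath


def resolve_include(snakefile: str, include: str) -> str:
    """Hypothetical helper: resolve an include path relative to the
    directory of the Snakefile that contains the include directive."""
    return str(PurePosixPath(snakefile).parent / include)


# The batch job runs with "--snakefile Snakefile" from "/", so the
# include is looked up next to /Snakefile.
print(resolve_include("/Snakefile", "rules/hello.smk"))
# -> /rules/hello.smk

# With the Snakefile's real location, the include lands in the right place.
print(resolve_include("/workflow/Snakefile", "rules/hello.smk"))
# -> /workflow/rules/hello.smk
```

Under that assumption, the path passed to --snakefile has to keep the workflow/ prefix for the include to resolve.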

cademirch commented 3 months ago

Making get_snakefile return "workflow/Snakefile" seems to get this to run. Presumably this wouldn't work when the snakefile is something other than "workflow/Snakefile".

We could probably do something like return Path(self.main_snakefile).relative_to(Path.cwd()), unless there is a way to get the relative path from workflow. I'm not sure of the implications, though.
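
A rough sketch of that idea as a standalone function, in case it helps the discussion. relative_snakefile and its workdir parameter are hypothetical names, not part of the plugin, and the fallback for a Snakefile outside the working directory is only one possible choice. Note that Path.relative_to raises ValueError when the path is not located under the given directory, so that case needs handling.

```python
from pathlib import Path


def relative_snakefile(main_snakefile: str, workdir: str) -> str:
    """Hypothetical helper: return the Snakefile path relative to the
    working directory, so the remote job can be pointed at the same
    relative location (e.g. "workflow/Snakefile")."""
    snakefile = Path(main_snakefile)
    base = Path(workdir)

    # Interpret a relative Snakefile path as relative to the working directory.
    if not snakefile.is_absolute():
        snakefile = base / snakefile

    try:
        return snakefile.relative_to(base).as_posix()
    except ValueError:
        # Assumption: if the Snakefile lives outside the working directory,
        # fall back to the absolute path unchanged.
        return snakefile.as_posix()


print(relative_snakefile("/home/user/project/workflow/Snakefile", "/home/user/project"))
# -> workflow/Snakefile
```

In the plugin, the working directory would presumably be Path.cwd() at submission time. Whether the staged directory on the batch side mirrors that layout is exactly the open question above.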

vsoch commented 3 months ago

If you have a suggested fix, it would be great to get a PR! This code is likely legacy from my kueue executor, where we upload the Snakefile as a config map and then it's always in the same location.

cademirch commented 3 months ago

Sounds good, I'll open a PR!