snakemake / snakemake

This is the development home of the workflow management system Snakemake. For general information, see
https://snakemake.github.io
MIT License

Silent failure when undeclared config value used in "run" block #1786

Open standage opened 2 years ago

standage commented 2 years ago

I recently spent a lot of time chasing down a bug in one of my Snakemake workflows, which turned out to be the result of a typo: in one place, the key used in the Snakefile didn't match the key in the config file. This was in a run block rather than a shell block, and Snakemake didn't provide any hints as to the cause of the failure.

I've created a minimal example.

rule test:
    output: "message.txt"
    run:
        with open(output[0], "w") as fh:
            print(f"Hello, {config['message']}!", file=fh)

If I run Snakemake with message declared properly, the workflow runs without any issues.

$ snakemake --cores 1 --config message=world
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job      count    min threads    max threads
-----  -------  -------------  -------------
test         1              1              1
total        1              1              1

Select jobs to execute...

[Wed Jul 27 16:14:43 2022]
rule test:
    output: message.txt
    jobid: 0
    reason: Missing output files: message.txt
    resources: tmpdir=/var/folders/c_/5x4wpqxd73923hskscprwxg1j8qmlk/T

[Wed Jul 27 16:14:44 2022]
Finished job 0.
1 of 1 steps (100%) done
Complete log: .snakemake/log/2022-07-27T161443.660533.snakemake.log

But if I run Snakemake with an incomplete or improper config, it fails without any hints.

$ snakemake --cores 1 --config msg=world
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job      count    min threads    max threads
-----  -------  -------------  -------------
test         1              1              1
total        1              1              1

Select jobs to execute...

[Wed Jul 27 16:14:17 2022]
rule test:
    output: message.txt
    jobid: 0
    reason: Missing output files: message.txt
    resources: tmpdir=/var/folders/c_/5x4wpqxd73923hskscprwxg1j8qmlk/T

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-07-27T161416.838177.snakemake.log

Without knowing much about Snakemake's internal implementation, I'm guessing blocks of Python code aren't inspected much by Snakemake except for syntax checks, so any runtime errors won't be discovered until the Python interpreter attempts to execute the code. Is there no straightforward way to catch and report, e.g., a KeyError when Python code attempts to access an undeclared config value?

As an aside, I found that by assigning the config value to a key in the params block, Snakemake was able to catch and report the issue.
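Another way to surface the typo early, independent of how Snakemake reports errors from run blocks, is to check the config explicitly before any rule uses it. The following is only a minimal sketch: `require_config_keys` is a hypothetical helper (not part of Snakemake), and the `config` dict here is a stand-in for the global `config` that Snakemake provides inside a Snakefile.

```python
def require_config_keys(config, required):
    """Raise a descriptive KeyError if any required key is absent from config."""
    missing = [key for key in required if key not in config]
    if missing:
        provided = ", ".join(sorted(config)) or "(none)"
        raise KeyError(
            f"Missing config value(s): {', '.join(missing)}; "
            f"provided keys: {provided}"
        )


# Stand-in for Snakemake's global `config` dict (assumption for this sketch).
# It reproduces the typo from the example: "msg" instead of "message".
config = {"msg": "world"}

try:
    require_config_keys(config, ["message"])
except KeyError as err:
    # err.args[0] holds the plain message without the extra quoting of str(err).
    error_text = err.args[0]
    print(error_text)
```

Placed near the top of a Snakefile, a check like this would fail before any job is scheduled and would name the missing key. Snakemake also provides `snakemake.utils.validate` for schema-based validation of the config, which may be the better fit for larger workflows.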

dariober commented 2 years ago

Duplicate of https://github.com/snakemake/snakemake/issues/1698?

As an aside, I found that by assigning the config value to a key in the params block, Snakemake was able to catch and report the issue.

This is probably because the DAG is evaluated before the run, script, and shell directives are executed.

standage commented 2 years ago

Duplicate of #1698?

Yes!