snakemake / snakemake

This is the development home of the workflow management system Snakemake. For general information, see
https://snakemake.github.io
MIT License

Module conflicts in workflows with a diamond-shape file structure #2894

Status: open. Opened by mramospe 1 month ago.

Snakemake version

8.11.6, 7.32.4, main (74627d3f07f549600dc8ad226a545f44191141e0)

Describe the bug

When a workflow imports its files as modules in a diamond shape, the rules loaded by the intermediate files seem to interfere with each other. Consider a workflow built from a file of general rules, common.smk; two intermediate files for two different processes/studies, first.smk and second.smk; and a final file, all.smk, which collects the main results of both and renames their rules to avoid clashes. If we run snakemake with all.smk as the Snakefile, the rules that first.smk and second.smk import from common.smk interfere with each other, even though all.smk renames them.

If the intermediate files use the syntax use rule * from common, Snakemake appears to consider only the rules from common.smk as imported by the last file that imports them, even though each file can still access them through the rules object. If we write use rule * from common as * instead, the execution works fine. This looks more like a bug than a feature, or at least a design flaw.

Probably related to #1872, #2729, #2838

Minimal example

First, define the general file, containing a rule that creates a file whose path depends partly on the provided configuration:

common.smk

import os

rule write:
    output: os.path.join('data', config['analysis'], 'input_value_{value}.txt')
    params: config_value=config['value']
    shell: 'echo {params.config_value} >> {output}'

Then create two separate files, one per study, each of which simply creates an alias (a relative symbolic link) to the file produced by the rule in common.smk:

first.smk

import os

module common:
    snakefile: './common.smk'
    config: {"analysis": "first", "value": 1}

#use rule * from common as * # <--- works
use rule * from common # <--- fails if using the file all.smk

rule result:
    input: expand(rules.write.output, value=1)
    output: os.path.join('data', 'first', 'result.txt')
    shell: 'ln -srf {input} {output}'

second.smk

import os

module common:
    snakefile: './common.smk'
    config: {"analysis": "second", "value": 2}

#use rule * from common as * # <--- works
use rule * from common # <--- fails if using the file all.smk

rule result:
    input: expand(rules.write.output, value=2)
    output: os.path.join('data', 'second', 'result.txt')
    shell: 'ln -srf {input} {output}'
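To make explicit which file each result rule waits for, the following sketch evaluates the output pattern of rule write by hand. It is a plain-Python illustration, not Snakemake code: the helper write_output is hypothetical, and str.format stands in for Snakemake's expand() with a single wildcard value.

```python
import os


def write_output(analysis: str) -> str:
    """Hypothetical stand-in for rule write's output pattern in common.smk,
    with config['analysis'] substituted and the {value} wildcard left open."""
    return os.path.join("data", analysis, "input_value_{value}.txt")


# str.format stands in for expand(rules.write.output, value=...) with one value.
first_input = write_output("first").format(value=1)
second_input = write_output("second").format(value=2)

print(first_input)   # data/first/input_value_1.txt on POSIX systems
print(second_input)  # data/second/input_value_2.txt on POSIX systems
```

Only a copy of write configured with analysis "first" can produce data/first/input_value_1.txt; a copy configured with "second" can only produce files under data/second/. The MissingInputException reported below would be consistent with first_result seeing only the latter, as suspected above, but this sketch does not confirm that.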

Finally, declare the file that collects all the results:

all.smk

module first:
    snakefile: './first.smk'
    config: config

module second:
    snakefile: './second.smk'
    config: config

use rule * from first as first_*
use rule * from second as second_*

assert rules.first_write is not rules.second_write  # they always exist and they are always different, as expected

# if "use rule * from common" (without "as *") is used in "first.smk" and
# "second.smk", the input files for the first result cannot be resolved correctly
rule all:
    input: rules.first_result.output, rules.second_result.output

The example also propagates configuration values, to verify that they remain distinct when the rules are run from all.smk. The expectation is that the commands

snakemake -s first.smk result -j1
snakemake -s second.smk result -j1

provide the same output as

snakemake -s all.smk all -j1

but this only holds if the rules are loaded with use rule * from common as * instead of use rule * from common inside first.smk and second.smk. Otherwise, the following error is raised:

MissingInputException in rule first_result in file [MASKED]/first.smk, line 10:
Missing input files for rule first_result:
    output: data/first/result.txt
    affected files:
        data/first/input_value_1.txt

This suggests that the rules imported from common.smk inside first.smk are somehow not being taken into account.
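To check the expected outputs (each study writes its own config value, and each result.txt links to its own input), a small Python helper like the following could be run after each of the three commands. It is a hypothetical sketch, not part of Snakemake: check_outputs, EXPECTED and RESULTS are illustrative names, the paths come from the example above, and because rule write appends with >>, it assumes a clean working directory with no leftover data/ from earlier runs.

```python
from pathlib import Path

# Files written by rule write in common.smk, with the config value each study passes.
EXPECTED = {
    Path("data/first/input_value_1.txt"): "1",
    Path("data/second/input_value_2.txt"): "2",
}

# Links created by each study's result rule (ln -srf {input} {output}).
RESULTS = {
    Path("data/first/result.txt"): Path("data/first/input_value_1.txt"),
    Path("data/second/result.txt"): Path("data/second/input_value_2.txt"),
}


def check_outputs(root: Path) -> list[str]:
    """Return a list of problems found under root; an empty list means all good."""
    problems = []
    for rel, value in EXPECTED.items():
        path = root / rel
        if not path.is_file():
            problems.append(f"missing {rel}")
        elif path.read_text().strip() != value:
            problems.append(f"{rel} contains {path.read_text()!r}, expected {value!r}")
    for result, target in RESULTS.items():
        path = root / result
        # exists() and resolve() follow the symlink, so a dangling link counts as missing.
        if not path.exists():
            problems.append(f"missing {result}")
        elif path.resolve() != (root / target).resolve():
            problems.append(f"{result} does not point to {target}")
    return problems
```

For example, after snakemake -s all.smk all -j1, running print(check_outputs(Path("."))) from the workflow directory should print [] when the outputs match those of the separate first.smk and second.smk runs.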