FileNotFoundError if subworkflow target is pseudo-rule

lumpiluk commented 4 years ago

Snakemake version 5.10.0

Describe the bug

Calling a subworkflow sub with a pseudo-target (a rule name rather than a file) first runs the target rule successfully, but Snakemake then throws a FileNotFoundError for the file <working directory>/all.

Logs

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   all
    1   sub_rule
    2

[Fri Feb  7 16:51:31 2020]
rule sub_rule:
    output: sub_rule.file
    jobid: 1

[Fri Feb  7 16:51:31 2020]
Finished job 1.
1 of 2 steps (50%) done

[Fri Feb  7 16:51:31 2020]
localrule all:
    input: sub_rule.file
    jobid: 0

[Fri Feb  7 16:51:31 2020]
Finished job 0.
2 of 2 steps (100%) done
Complete log: /home/lukas/snaketest/.snakemake/log/2020-02-07T165131.311128.snakemake.log
Executing main workflow.
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   all
    1
Traceback (most recent call last):
  File "/home/lukas/.local/share/virtualenvs/B2Ok88LB/lib/python3.8/site-packages/snakemake/__init__.py", line 561, in snakemake
    success = workflow.execute(
  File "/home/lukas/.local/share/virtualenvs/B2Ok88LB/lib/python3.8/site-packages/snakemake/workflow.py", line 850, in execute
    success = scheduler.schedule()
  File "/home/lukas/.local/share/virtualenvs/B2Ok88LB/lib/python3.8/site-packages/snakemake/scheduler.py", line 360, in schedule
    run = self.job_selector(needrun)
  File "/home/lukas/.local/share/virtualenvs/B2Ok88LB/lib/python3.8/site-packages/snakemake/scheduler.py", line 504, in job_selector
    c = list(map(self.job_reward, jobs))  # job rewards
  File "/home/lukas/.local/share/virtualenvs/B2Ok88LB/lib/python3.8/site-packages/snakemake/scheduler.py", line 589, in job_reward
    input_size = job.inputsize
  File "/home/lukas/.local/share/virtualenvs/B2Ok88LB/lib/python3.8/site-packages/snakemake/jobs.py", line 348, in inputsize
    self._inputsize = sum(f.size for f in self.input)
  File "/home/lukas/.local/share/virtualenvs/B2Ok88LB/lib/python3.8/site-packages/snakemake/jobs.py", line 348, in <genexpr>
    self._inputsize = sum(f.size for f in self.input)
  File "/home/lukas/.local/share/virtualenvs/B2Ok88LB/lib/python3.8/site-packages/snakemake/io.py", line 139, in wrapper
    return func(self, *args, **kwargs)
  File "/home/lukas/.local/share/virtualenvs/B2Ok88LB/lib/python3.8/site-packages/snakemake/io.py", line 154, in wrapper
    return func(self, *args, **kwargs)
  File "/home/lukas/.local/share/virtualenvs/B2Ok88LB/lib/python3.8/site-packages/snakemake/io.py", line 350, in size
    return self.size_local
  File "/home/lukas/.local/share/virtualenvs/B2Ok88LB/lib/python3.8/site-packages/snakemake/io.py", line 355, in size_local
    self.check_broken_symlink()
  File "/home/lukas/.local/share/virtualenvs/B2Ok88LB/lib/python3.8/site-packages/snakemake/io.py", line 360, in check_broken_symlink
    if not self.exists_local and os.lstat(self.file):
FileNotFoundError: [Errno 2] No such file or directory: '/home/lukas/snaketest/all'
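
The last traceback frame can be reproduced outside Snakemake. The sketch below is only a stand-in: it reuses the name check_broken_symlink for readability, but it is not Snakemake's code, and exists_local is approximated with os.path.exists (an assumption). It shows that when the path does not exist at all, os.lstat raises instead of returning something falsy.

```python
import os
import tempfile


def check_broken_symlink(path):
    """Stand-in for the expression in the last traceback frame.

    Not Snakemake's implementation: `exists_local` is approximated
    here with os.path.exists, which follows symlinks.
    """
    # os.lstat raises FileNotFoundError when `path` does not exist at all,
    # so the right-hand side cannot evaluate to False for a missing file.
    return not os.path.exists(path) and bool(os.lstat(path))


with tempfile.TemporaryDirectory() as workdir:
    # A pseudo-target such as "all" never exists as a file on disk.
    missing = os.path.join(workdir, "all")
    try:
        check_broken_symlink(missing)
        outcome = "no error"
    except FileNotFoundError as err:
        outcome = f"FileNotFoundError: errno {err.errno}"
    print(outcome)
```

If this reading is right, any input that is neither an existing file nor a broken symlink would trigger the same exception during scheduling.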

Minimal example

Run snakemake all.

Snakefile:

subworkflow sub:
    snakefile:
        "sub.snake"
    configfile:
        "snakemake_config.yaml"

rule all:
    input:
        sub("all")

sub.snake:

rule sub_rule:
    output:
        "sub_rule.file"
    shell:
        "echo '42' > sub_rule.file"

rule all:
    input:
        rules.sub_rule.output

snakemake_config.yaml (because of #24 and because configfiles can't be empty):

some_unused_key:

Additional context

nick-youngblut commented 3 years ago

I seem to be getting the same error when not using a subworkflow. The rule in which the job dies utilizes a function to define the input files:

def which_input_cluster_genes_nuc(wildcards, type='nuc'):
    """
    Compressed input files? Nucleotide or protein?
    """
    if config['keep_intermediate'] == True:
        if type == 'nuc':
            if config['use_ancient'] == True:
                return ancient(annot_dir + 'prodigal/{sample}/annot.fna.gz')
            else:
                return annot_dir + 'prodigal/{sample}/annot.fna.gz'
        else:
            if config['use_ancient'] == True:
                return ancient(annot_dir + 'prodigal/{sample}/annot.faa.gz')
            else:
                return annot_dir + 'prodigal/{sample}/annot.faa.gz'
    else:
        if type == 'nuc':
            return config['tmp_dir'] + 'prodigal/{sample}/annot.fna'
        else:
            return config['tmp_dir'] + 'prodigal/{sample}/annot.faa'

The rule:

rule cluster_genes_nuc:
    """
    Clustering genes (at nuc level) and taking the centroid.
    This is done for each sample (genome).
    """
    input:
        fna = lambda wildcards: which_input_cluster_genes_nuc(wildcards, type='nuc'),
        faa = lambda wildcards: which_input_cluster_genes_nuc(wildcards, type='prot')
    output:
        reps = temp(config['tmp_dir'] + 'vsearch/{sample}_annot_reps.fna'),
        fna = annot_dir + 'nuc_filtered/{sample}_annot_reps.fna.gz',
        faa = annot_dir + 'prot_filtered/{sample}_annot_reps.faa.gz'
   [...]

The error:

Traceback (most recent call last):
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/__init__.py", line 687, in snakemake
    success = workflow.execute(
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/workflow.py", line 1005, in execute
    success = scheduler.schedule()
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 470, in schedule
    run = self.job_selector(needrun)
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 741, in job_selector_ilp
    return self.job_selector_greedy(jobs)
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 768, in job_selector_greedy
    c = list(map(self.job_reward, jobs))  # job rewards
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 851, in job_reward
    input_size = job.inputsize
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/jobs.py", line 378, in inputsize
    self._inputsize = sum(f.size for f in self.input)
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/jobs.py", line 378, in <genexpr>
    self._inputsize = sum(f.size for f in self.input)
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/io.py", line 242, in wrapper
    return func(self, *args, **kwargs)
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/io.py", line 257, in wrapper
    return func(self, *args, **kwargs)
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/io.py", line 556, in size
    return self.size_local
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/io.py", line 561, in size_local
    self.check_broken_symlink()
  File "/ebio/abt3_projects/software/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/io.py", line 566, in check_broken_symlink
    if not self.exists_local and os.lstat(self.file):
FileNotFoundError: [Errno 2] No such file or directory: '/ebio/abt3_projects/databases_no-backup/GTDB/release95/Struo/benchmarking/db_create/UniRef50-90/n500/struo1/uniref90/annotated_genes/prodigal/GB_GCA_001917435_1_Acidobacteria_bacterium_13_2_20CM_2_57_6/annot.fna.gz'

I'm using snakemake 5.30.1
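
Both tracebacks fail while summing input sizes for scheduling. As a rough illustration of what a tolerant size computation could look like, here is a sketch in plain Python. safe_input_size is a hypothetical helper, not part of Snakemake, and counting a missing input as 0 bytes is an assumption rather than the project's actual fix.

```python
import os
import tempfile


def safe_input_size(paths):
    """Sum file sizes, counting inputs that do not exist as 0 bytes.

    Hypothetical helper for illustration; not Snakemake's API.
    """
    total = 0
    for path in paths:
        try:
            # os.lstat is the call seen in the last traceback frame;
            # it does not follow symlinks.
            total += os.lstat(path).st_size
        except FileNotFoundError:
            # Pseudo-targets or not-yet-created inputs have no size on disk.
            continue
    return total


with tempfile.TemporaryDirectory() as workdir:
    present = os.path.join(workdir, "annot.fna.gz")
    with open(present, "wb") as handle:
        handle.write(b"ACGT")
    absent = os.path.join(workdir, "all")
    total_size = safe_input_size([present, absent])
    print(total_size)
```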

luoshizhi commented 3 years ago

I get the same FileNotFoundError if I use "ancient" in the input. Why?

snakemake / snakemake

FileNotFoundError if subworkflow target is pseudo-rule #221
