
AttributeError: 'str' object has no attribute 'get' #2361

Closed ltalignani closed 1 year ago

ltalignani commented 1 year ago

Snakemake version: 7.25.0 (same problem on 7.7.0)

Describe the bug: I am developing a Snakemake pipeline on a SLURM cluster. When I run the pipeline, it crashes with the logs below. If I replace all the config lookups (like config['fastq-screen']['config']) with their literal values, it works, but that is impractical and defeats the whole point of the pipeline.

Logs

Using shell: /usr/bin/bash
Provided cluster nodes: 50
Provided resources: cpus=30, mem_mb=400000
Job stats:
job                                   count    min threads    max threads
----------------------------------  -------  -------------  -------------
GenomicsDBImport                          5              1              1
GenotypeGVCFs_merge                       5              1              1
HaplotypeCaller                           5              1              1
SetNmMdAndUqTags                          1              1              1
all                                       1              1              1
bcftools_concat                           5              1              1
bwa_mapping                               1              1              1
create_sequence_dict                      1              1              1
create_sequence_faidx                     1              1              1
fastqscreen_contamination_checking        1              1              1
fixmateinformation                        1              1              1
gatk_filter                               5              1              1
mark_duplicates                           1              1              1
report_vcf                                5              1              1
trimmomatic                               1              1              1
vcf_stats                                 5              1              1
total                                    44              1              1

mkdir -p Cluster_logs/
Select jobs to execute...

[Tue Jul 18 10:34:42 2023]
Job 1: Fastq-Screen reads contamination checking
Reason: Missing output files: results/00_Quality_Control/fastq-screen

module load fastq-screen/0.13.0
module load bwa/0.7.17
            fastq_screen -q --threads 1 --conf config/fastq-screen.conf --aligner bwa --subset 1000 --outdir results/00_Quality_Control/fastq-screen raw/*.fastq.gz &> results/11_Reports/quality/fastq-screen.log

Traceback (most recent call last):
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/__init__.py", line 757, in snakemake
    success = workflow.execute(
              ^^^^^^^^^^^^^^^^^
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/workflow.py", line 1095, in execute
    raise e
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/workflow.py", line 1091, in execute
    success = self.scheduler.schedule()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/scheduler.py", line 606, in schedule
    self.run(runjobs)
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/scheduler.py", line 655, in run
    executor.run_jobs(
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/executors/__init__.py", line 155, in run_jobs
    self.run(
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/executors/__init__.py", line 1153, in run
    jobscript = self.get_jobscript(job)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/executors/__init__.py", line 872, in get_jobscript
    f = self.get_jobname(job)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/executors/__init__.py", line 869, in get_jobname
    return job.format_wildcards(self.jobname, cluster=self.cluster_wildcards(job))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/executors/__init__.py", line 931, in cluster_wildcards
    return Wildcards(fromdict=self.cluster_params(job))
                              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/executors/__init__.py", line 906, in cluster_params
    cluster = self.cluster_config.get("__default__", dict()).copy()
              ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get'

Minimal example config.yaml:

# FASTQSCREEN --------------------------------------------------------------------------------------
fastq-screen:
  config: "config/fastq-screen.conf" # Path to the fastq-screen configuration file
  subset: 1000 # Don't use the whole sequence file, but sub-dataset of specified number of read (default: '1000') [INT] (0='all')
  aligner: "bwa" # Aligner for fastq-screen (default and should be 'bwa')

And now the Snakefile:

############################################################################### 
# CONFIGURATION FILES
configfile: "config/config.yaml"
cluster_config: "slurm/config.yaml"

import os, sys
from snakemake.utils import min_version
min_version("6.12.0")

###############################################################################
# WILDCARDS
SAMPLE, = glob_wildcards("raw/{sample}_R1.fastq.gz")
READS = ['1', '2']

###############################################################################
# RESOURCES
TMPDIR = config["resources"]["tmpdir"] # Temporary directory

###############################################################################
# PARAMETERS
CONFIG = config["fastq-screen"]["config"]           # Fastq-screen --conf
MAPPER = config["fastq-screen"]["aligner"]          # Fastq-screen --aligner
SUBSET = config["fastq-screen"]["subset"]           # Fastq-screen --subset

################################################################################
rule fastqscreen_contamination_checking:
    message: "Fastq-Screen reads contamination checking"
    resources: cpus=1, mem_mb=4000, time_min=60
    params:
        partition = 'fast',
        config = CONFIG,
        mapper = MAPPER,
        subset = SUBSET
    input:
        fastq = "raw/"
    output:
        fastqscreen = directory("results/00_Quality_Control/fastq-screen/"),
    log:
        "results/11_Reports/quality/fastq-screen.log"
    shell:
        config["MODULES"]["FASTQSCREEN"]+"\n"+config["MODULES"]["BWA"]+"""
            fastq_screen -q --threads {resources.cpus} --conf {params.config} --aligner {params.mapper} --subset {params.subset} --outdir {output.fastqscreen} {input.fastq}/*.fastq.gz &> {log}
        """

Additional context: Python 3.11, Graphviz 2.40.

I'm not sure if it's a bug, but I've never encountered this problem before. Thanks in advance for your help.

cademirch commented 1 year ago

Seems like the interpreter is complaining about the cluster config file. Could you share that? Also, you can use triple backticks to make your code more readable.

Lastly, it looks like you're using your config to specify tool paths in your shell command; it's generally recommended to use conda or wrappers to achieve portability and versioning.
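For example, a minimal sketch of what that could look like (the envs/fastq-screen.yaml path and the versions pinned in it are hypothetical, not from your setup):

# Sketch: declare the tools per rule via a conda environment instead of module loads.
rule fastqscreen_contamination_checking:
    input:
        fastq="raw/"
    output:
        fastqscreen=directory("results/00_Quality_Control/fastq-screen/")
    log:
        "results/11_Reports/quality/fastq-screen.log"
    conda:
        "envs/fastq-screen.yaml"  # hypothetical env pinning e.g. fastq-screen=0.13.0 and bwa=0.7.17
    shell:
        "fastq_screen -q --conf config/fastq-screen.conf --aligner bwa "
        "--subset 1000 --outdir {output.fastqscreen} {input.fastq}/*.fastq.gz &> {log}"

Run with --use-conda and Snakemake builds the environment for you, so the workflow no longer depends on the cluster's module system.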

ltalignani commented 1 year ago

Thanks for your comment, and sorry about the look of my code. I copied and pasted my config file directly, without realizing that I use hashtags in it to separate sections. I'll update it soon.

I am not using a cluster config file but a SLURM profile named config.yaml, stored in a slurm/ directory. I made this profile with the help of simple-slurm: https://github.com/jdblischak/smk-simple-slurm

jobs: 50
cluster: "sbatch --parsable -p {params.partition} -t {resources.time_min} --mem={resources.mem_mb} -c {resources.cpus} -o Cluster_logs/{rule}_{wildcards}-%j.out -e Cluster_logs/{rule}_{wildcards}-%j.err  
default-resources: [cpus=1, mem_mb=4000, time_min=6000]  
resources: [cpus=30, mem_mb=400000]  
restart-times: 3
max-jobs-per-second: 10
max-status-checks-per-second: 1  
local-cores: 8  
latency-wait: 600  
keep-going: true  
rerun-incomplete: true  
printshellcmds: true  
scheduler: greedy  
cluster-status: status-sacct.sh 

I agree with you regarding the use of conda and wrappers. However, the tools are already installed on the cluster, and using them directly gives me a significant gain in processing time compared to running through conda.

I use a bash script to launch the pipeline, containing just the following commands:

##### Colors ######
red="\033[1;31m"   # red
green="\033[1;32m" # green
ylo="\033[1;33m"   # yellow
blue="\033[1;34m"  # blue
nc="\033[0m"       # no color

###### Call snakemake pipeline ######

echo -e "${blue}Unlocking working directory:${nc}"
echo ""

snakemake --profile slurm/ --slurm --default-resources slurm_account=aedes_amplicon slurm_partition=long --directory ${workdir}/ --snakefile workflow/snakefile.smk --unlock
echo ""
echo -e "${blue}Let's run!${nc}"
echo ""

snakemake --profile slurm/ --slurm --default-resources slurm_account=aedes_amplicon slurm_partition=long --directory ${workdir}/ --snakefile workflow/snakefile.smk --cores 30 --configfile config/config.yaml

###### Create useful graphs, summary and logs ######

mkdir -p ${workdir}/results/10_Graphs/ 2> /dev/null

graph_list="dag rulegraph filegraph"
extension_list="pdf png"

for graph in ${graph_list} ; do
    for extension in ${extension_list} ; do
        snakemake --profile slurm/ \
                  --slurm --default-resources slurm_account=aedes_amplicon slurm_partition=long \
                  --snakefile workflow/snakefile.smk --${graph} | dot -T${extension} > ${workdir}/results/10_Graphs/${graph}.${extension} ;
    done ;
done

snakemake --slurm --default-resources slurm_account=aedes_amplicon slurm_partition=long \
         --directory ${workdir} --profile slurm/ --snakefile workflow/snakefile.smk \
         --summary > ${workdir}/results/11_Reports/files_summary.txt

###### End management ######

I also tried running Snakemake directly from the command line, and I got the same error.

EDIT: I updated the code.

cademirch commented 1 year ago

Okay, thanks for updating that. I'm not familiar with simple-slurm, but the issue could be this line in your Snakefile:

cluster_config: "slurm/config.yaml"

That would leave Snakemake's internal cluster config as a plain string instead of a parsed dict, which is exactly what the .get("__default__", ...) call in your traceback is choking on. I'd also recommend looking at the latest way to execute on a SLURM cluster, using the --slurm flag.
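Concretely, the top of the Snakefile would then keep only the workflow config (a sketch; everything else stays as you posted it):

###############################################################################
# CONFIGURATION FILES
configfile: "config/config.yaml"
# no cluster_config line here: cluster settings belong in the profile or on the CLI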

ltalignani commented 1 year ago

I've updated the command launching the pipeline and I don't have this problem anymore, since I no longer use the slurm profile.

The --slurm flag means you don't need a profile configuration, which is a great improvement.
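For reference, dropping the profile reduces the launch command to something like this (account, paths, and core count taken from the script above):

snakemake --slurm \
          --default-resources slurm_account=aedes_amplicon slurm_partition=long \
          --directory ${workdir}/ --snakefile workflow/snakefile.smk \
          --cores 30 --configfile config/config.yaml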