Attribute Error: 'str' object has no attribute 'get' #2361

ltalignani commented 1 year ago

Snakemake version: 7.25.0 (same problem on 7.7.0)

Describe the bug: I am developing a Snakemake pipeline on a SLURM cluster. When I run the pipeline, it crashes with the log below. I tried replacing all the config shortcuts (like config['fastq-screen']['config']) with their values, and then it works. But that is impractical and defeats the whole point of the pipeline.


Using shell: /usr/bin/bash
Provided cluster nodes: 50
Provided resources: cpus=30, mem_mb=400000
Job stats:
job                                   count    min threads    max threads
----------------------------------  -------  -------------  -------------
GenomicsDBImport                          5              1              1
GenotypeGVCFs_merge                       5              1              1
HaplotypeCaller                           5              1              1
SetNmMdAndUqTags                          1              1              1
all                                       1              1              1
bcftools_concat                           5              1              1
bwa_mapping                               1              1              1
create_sequence_dict                      1              1              1
create_sequence_faidx                     1              1              1
fastqscreen_contamination_checking        1              1              1
fixmateinformation                        1              1              1
gatk_filter                               5              1              1
mark_duplicates                           1              1              1
report_vcf                                5              1              1
trimmomatic                               1              1              1
vcf_stats                                 5              1              1
total                                    44              1              1

mkdir -p Cluster_logs/
Select jobs to execute...

[Tue Jul 18 10:34:42 2023]
Job 1: Fastq-Screen reads contamination checking
Reason: Missing output files: results/00_Quality_Control/fastq-screen

module load fastq-screen/0.13.0
module load bwa/0.7.17
            fastq_screen -q --threads 1 --conf config/fastq-screen.conf --aligner bwa --subset 1000 --outdir results/00_Quality_Control/fastq-screen raw/*.fastq.gz &> results/11_Reports/quality/fastq-screen.log

Traceback (most recent call last):
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/", line 757, in snakemake
    success = workflow.execute(
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/", line 1095, in execute
    raise e
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/", line 1091, in execute
    success = self.scheduler.schedule()
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/", line 606, in schedule
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/", line 655, in run
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/executors/", line 155, in run_jobs
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/executors/", line 1153, in run
    jobscript = self.get_jobscript(job)
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/executors/", line 872, in get_jobscript
    f = self.get_jobname(job)
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/executors/", line 869, in get_jobname
    return job.format_wildcards(self.jobname, cluster=self.cluster_wildcards(job))
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/executors/", line 931, in cluster_wildcards
    return Wildcards(fromdict=self.cluster_params(job))
  File "/shared/ifbstor1/software/miniconda/envs/snakemake-7.25.0/lib/python3.11/site-packages/snakemake/executors/", line 906, in cluster_params
    cluster = self.cluster_config.get("__default__", dict()).copy()
AttributeError: 'str' object has no attribute 'get'
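The last frame of the traceback shows the executor calling `.get("__default__", dict())` on its cluster configuration. That call only works if the configuration is a parsed mapping, not a file path string. The plain-Python sketch below is independent of Snakemake and reproduces the same error. The function name `default_cluster_params` is hypothetical and only mirrors the expression on the failing line.

```python
def default_cluster_params(cluster_config):
    """Hypothetical mirror of the failing line: copy the '__default__' section, if any."""
    # A parsed cluster config is a dict, so .get() works and returns a copy.
    return cluster_config.get("__default__", dict()).copy()


# A dictionary, as produced by parsing a cluster-config YAML file, works.
parsed = {"__default__": {"partition": "fast", "time": "60"}}
print(default_cluster_params(parsed))  # {'partition': 'fast', 'time': '60'}

# A bare path string reproduces the error from the traceback.
try:
    default_cluster_params("slurm/config.yaml")
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'get'
```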

Minimal example config.yaml:

# FASTQSCREEN --------------------------------------------------------------------------------------
fastq-screen:
  config: "config/fastq-screen.conf" # Path to the fastq-screen configuration file
  subset: 1000 # Use a subset of the specified number of reads instead of the whole file (default: 1000) [INT] (0 = all)
  aligner: "bwa" # Aligner for fastq-screen (default, and should be 'bwa')

And now, the Snakefile:

configfile: "config/config.yaml"
cluster_config: "slurm/config.yaml"

import os, sys
from snakemake.utils import min_version

SAMPLE, = glob_wildcards("raw/{sample}_R1.fastq.gz")
READS = ['1', '2']

TMPDIR = config["resources"]["tmpdir"] # Temporary directory

CONFIG = config["fastq-screen"]["config"]           # Fastq-screen --conf
MAPPER = config["fastq-screen"]["aligner"]          # Fastq-screen --aligner
SUBSET = config["fastq-screen"]["subset"]           # Fastq-screen --subset
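The three assignments above assume that `config/config.yaml` nests these keys under a top-level `fastq-screen` section, so that `configfile:` loads `config` as a nested dictionary. In the sketch below, the loaded config is modeled as a literal dict. The `resources` value is a placeholder, since that part of the YAML is not shown. A hypothetical `require` helper looks up the same keys and reports the full key path when one is missing, which is easier to diagnose than a bare `KeyError` from `config[...][...]`.

```python
from typing import Any, Mapping


def require(config: Mapping[str, Any], *keys: str) -> Any:
    """Hypothetical helper: walk nested keys, failing with the full key path if one is missing."""
    node: Any = config
    for depth, key in enumerate(keys):
        if not isinstance(node, Mapping) or key not in node:
            path = "/".join(keys[: depth + 1])
            raise KeyError(f"missing config entry: {path}")
        node = node[key]
    return node


# Stand-in for the dict that `configfile: "config/config.yaml"` would load.
config = {
    "resources": {"tmpdir": "/tmp"},  # placeholder value, not from the original config
    "fastq-screen": {
        "config": "config/fastq-screen.conf",
        "subset": 1000,
        "aligner": "bwa",
    },
}

CONFIG = require(config, "fastq-screen", "config")   # "config/fastq-screen.conf"
MAPPER = require(config, "fastq-screen", "aligner")  # "bwa"
SUBSET = require(config, "fastq-screen", "subset")   # 1000
```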

rule fastqscreen_contamination_checking:
    message: "Fastq-Screen reads contamination checking"
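    resources: cpus=1, mem_mb=4000, time_min=60  # 'time_min' matches {resources.time_min} in the profile
    params:
        partition = 'fast',
        config = CONFIG,
        mapper = MAPPER,
        subset = SUBSET
    input:
        fastq = "raw/"
    output:
        fastqscreen = directory("results/00_Quality_Control/fastq-screen/"),
    log:
        "results/11_Reports/quality/fastq-screen.log"
    shell:
        """
        module load fastq-screen/0.13.0
        module load bwa/0.7.17
        fastq_screen -q --threads {resources.cpus} --conf {params.config} --aligner {params.mapper} --subset {params.subset} --outdir {output.fastqscreen} {input.fastq}/*.fastq.gz &> {log}
        """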

Additional context: Python 3.11, Graphviz 2.40.

I'm not sure if it's a bug, but I've never encountered this problem before. Thanks in advance for your help.

cademirch commented 1 year ago

Seems like the interpreter is complaining about the cluster config file. Could you share that? Also you can use triple backticks to make your code more readable.

Lastly, it looks like you're using your config to specify tool paths in your shell command; it's generally recommended to use conda or wrappers to achieve portability and versioning.

ltalignani commented 1 year ago

Thanks for your comment, and sorry about the formatting of my code. I copied and pasted my config file directly without realizing that I use hash signs (#) in it to separate sections. I'll update it soon.

I am not using a cluster config file, but a SLURM profile named config.yaml, stored in a slurm/ directory. I made this profile with the help of simple-slurm:

jobs: 50
cluster: "sbatch --parsable -p {params.partition} -t {resources.time_min} --mem={resources.mem_mb} -c {resources.cpus} -o Cluster_logs/{rule}_{wildcards}-%j.out -e Cluster_logs/{rule}_{wildcards}-%j.err"
default-resources: [cpus=1, mem_mb=4000, time_min=6000]
resources: [cpus=30, mem_mb=400000]
restart-times: 3
max-jobs-per-second: 10
max-status-checks-per-second: 1
local-cores: 8
latency-wait: 600
keep-going: true
rerun-incomplete: true
printshellcmds: true
scheduler: greedy
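The `cluster` entry in this profile is a template. For each submitted job, placeholders such as `{params.partition}` and `{resources.time_min}` are filled from that job's rule, so every rule sent through the profile needs a `partition` param. The sketch below approximates that substitution with Python's built-in `str.format`, which resolves the same dotted attribute lookups. Snakemake's own formatting code is more involved, and the `SimpleNamespace` objects are illustrative stand-ins for a job's params and resources. The template drops the log options for brevity.

```python
from types import SimpleNamespace

# Simplified form of the profile's `cluster` template (log options omitted).
TEMPLATE = (
    "sbatch --parsable -p {params.partition} -t {resources.time_min} "
    "--mem={resources.mem_mb} -c {resources.cpus}"
)


def render_submit_command(params, resources):
    """Approximation only: fill the template the way str.format resolves dotted fields."""
    return TEMPLATE.format(params=params, resources=resources)


# Intended values from the fastqscreen_contamination_checking rule.
params = SimpleNamespace(partition="fast")
resources = SimpleNamespace(cpus=1, mem_mb=4000, time_min=60)
print(render_submit_command(params, resources))
# sbatch --parsable -p fast -t 60 --mem=4000 -c 1

# In this approximation, a rule without a `partition` param cannot fill the template.
try:
    render_submit_command(SimpleNamespace(), resources)
except AttributeError as err:
    print("cannot submit:", err)
```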

I agree with you about using conda and wrappers. However, the tools are already installed on the cluster, and using them instead of conda gives me a significant speed-up when processing my data.

I use a bash script to launch the pipeline, containing just the following commands:

I also tried to run snakemake with the command line, and I had the same error.

EDIT: I updated the code.

cademirch commented 1 year ago

Okay, thanks for updating that. I'm not familiar with simple-slurm, but the issue could be this line in your Snakefile:

cluster_config: "slurm/config.yaml"

I'd also recommend looking at the latest way to execute on a SLURM cluster, using the --slurm flag.
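With the built-in SLURM executor, sbatch options are no longer assembled from a `cluster:` template. Instead, they are requested through resources. The sketch below builds a launch command in Python and only prints it; it does not run it. It assumes the resource names used by the `--slurm` mode of Snakemake 7.x: `slurm_partition`, `runtime` (minutes), `mem_mb` and `cpus_per_task`. Check the documentation for your installed version, since the SLURM executor changed in Snakemake 8. The renaming from the old profile's `partition`, `time_min` and `cpus` is an illustrative assumption.

```python
import shlex

# Assumed mapping from the old profile's names to Snakemake 7.x --slurm resource names.
RENAMES = {"partition": "slurm_partition", "time_min": "runtime", "cpus": "cpus_per_task"}


def slurm_command(jobs, defaults):
    """Build a `snakemake --slurm` argv with translated default resources (not executed)."""
    resources = [f"{RENAMES.get(name, name)}={value}" for name, value in defaults.items()]
    return ["snakemake", "--slurm", "--jobs", str(jobs), "--default-resources", *resources]


# Defaults from the old profile, plus the partition the rule passed as a param.
argv = slurm_command(50, {"partition": "fast", "cpus": 1, "mem_mb": 4000, "time_min": 6000})
print(shlex.join(argv))
# snakemake --slurm --jobs 50 --default-resources slurm_partition=fast cpus_per_task=1 mem_mb=4000 runtime=6000
```

Under the same assumption, per-rule values would go in each rule's `resources:` (for example `slurm_partition="fast", runtime=60`) rather than in `params: partition`.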

ltalignani commented 1 year ago

I've updated the command that launches the pipeline, and the problem is gone now that I no longer use the SLURM profile.

The --slurm flag means you don't need a profile configuration, which is a great improvement.