nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Offending keys error when running nf-core sarek on ~500 samples #2644

Closed acjmmartin closed 2 years ago

acjmmartin commented 2 years ago

Bug report

Expected behavior and actual behavior

In the mapping step of nf-core sarek, FASTQ files should be aligned to a reference genome. This works perfectly with a subset of my sample sheet (25 samples), but when running the complete sample sheet (497 samples) I get an "Offending keys" error message. There are no duplicates in the sample sheet and none of the input files are missing.
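
For anyone wanting to repeat the duplicate and missing-file check on their own sample sheet, here is a minimal sketch. It assumes a headerless, tab-separated sheet with the columns patient, sex, status, sample, lane, fastq1, fastq2; the function name `check_sample_sheet`, the sex values and the `/data/...` paths are hypothetical and only there for illustration.

```python
import csv
import io
from collections import Counter
from pathlib import Path


def check_sample_sheet(tsv_text, check_files=False):
    """Return (duplicate_keys, missing_files) for a headerless TSV sample sheet.

    Assumed column order: patient, sex, status, sample, lane, fastq1, fastq2.
    """
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    # A (patient, sample, lane) triple is expected to identify exactly one row.
    counts = Counter((r[0], r[3], r[4]) for r in rows)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    missing = []
    if check_files:
        # Collect every listed FASTQ path that does not exist on disk.
        missing = sorted(p for r in rows for p in r[5:7] if not Path(p).is_file())
    return duplicates, missing


# Hypothetical three-row sheet; the first two rows share the same key.
sheet = (
    "K648\tXX\t0\tG_K648_BC1d1\tL003\t/data/a_R1.fastq.gz\t/data/a_R2.fastq.gz\n"
    "K648\tXX\t0\tG_K648_BC1d1\tL003\t/data/b_R1.fastq.gz\t/data/b_R2.fastq.gz\n"
    "K656\tXY\t1\tG_K656_R1d1\tL001\t/data/c_R1.fastq.gz\t/data/c_R2.fastq.gz\n"
)
duplicates, missing = check_sample_sheet(sheet)
print(duplicates)
```

For a real sheet, pass the file contents (for example `Path("samples.tsv").read_text()`, where the file name is a placeholder) and set `check_files=True` to also list FASTQ paths that cannot be found.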

Steps to reproduce the problem

I used the following command: nextflow run /camp/project/tracerX/working/PIPELINES/nf-core/sarek/2.7/workflow -profile crick --step mapping --genome GRCh37 --input /camp/project/tracerX/working/CRENAL/WORKING/Tx200/clean_scripts/sex_sheet_Tx200_v6_SampleSheet_sarek.tsv --target_bed /camp/project/tracerX/working/PIPELINES/nf-core/sarek/2.7/custom_references/bed_files/No_chr_Rabbit_Hole_v6.bed --outdir /camp/project/tracerX/working/CRENAL/OUTPUT/Tx200_clean -c /camp/project/tracerX/working/PIPELINES/nf-core/sarek/2.7/custom_configs/custom_crick_sarek_hp.config

Program output

Error executing process > 'MapReads (K648-L003)'

Caused by:
  Oops.. something wrong happened while creating task 'MapReads' unique id -- Offending keys: [
 - type=java.util.UUID value=cf7e96ff-2ac4-4cd8-9ccc-e2c35c5a04e3, 
 - type=java.lang.String value=MapReads, 
 - type=java.lang.String value=CN = params.sequencing_center ? "CN:${params.sequencing_center}\\t" : ""
readGroup = "@RG\\tID:${idRun}\\t${CN}PU:${idRun}\\tSM:${idSample}\\tLB:${idSample}\\tPL:illumina"
status = statusMap[idPatient, idSample]
extra = status == 1 ? "-B 3" : ""
convertToFastq = hasExtension(inputFile1, "bam") ? "gatk --java-options -Xmx${task.memory.toGiga()}g SamToFastq --INPUT=${inputFile1} --FASTQ=/dev/stdout --INTERLEAVE=true --NON_PF=true | \\" : ""
input = hasExtension(inputFile1, "bam") ? "-p /dev/stdin - 2> >(tee ${inputFile1}.bwa.stderr.log >&2)" : "${inputFile1} ${inputFile2}"
aligner = params.aligner == "bwa-mem2" ? "bwa-mem2" : "bwa"
"""
    ${convertToFastq}
    ${aligner} mem -K 100000000 -R \"${readGroup}\" ${extra} -t ${task.cpus} -M ${fasta} \
    ${input} | \
    samtools sort --threads ${task.cpus} -m 2G - > ${idSample}_${idRun}.bam
    """
, 
 - type=java.lang.String value=/camp/project/tracerX/working/PIPELINES/nf-core/sarek/2.7/workflow/../singularity-images/nfcore-sarek-2.7.img, 
 - type=java.lang.String value=idPatient, 
 - type=java.lang.String value=K648, 
 - type=java.lang.String value=idSample, 
 - type=java.lang.String value=G_K648_BC1d1, 
 - type=java.lang.String value=idRun, 
 - type=java.lang.String value=L003, 
 - type=java.lang.String value=inputFile1, 
 - type=nextflow.util.ArrayBag value=[FileHolder(sourceObj:/camp/project/tracerX/working/CRENAL/SOURCES/Tx200_panel_v6/AUY227A265_S57_L003_R1_001.fastq.gz, storePath:/camp/stp/sequencing/inputs/instruments/fastq/190710_NB501505_0101_AH5YHKBGXB/fastq/DN18127/AUY227A265_S57_L003_R1_001.fastq.gz, stageName:AUY227A265_S57_L003_R1_001.fastq.gz)], 
 - type=java.lang.String value=inputFile2, 
 - type=nextflow.util.ArrayBag value=[FileHolder(sourceObj:/camp/project/tracerX/working/CRENAL/SOURCES/Tx200_panel_v6/AUY227A265_S57_L003_R2_001.fastq.gz, storePath:/camp/stp/sequencing/inputs/instruments/fastq/190710_NB501505_0101_AH5YHKBGXB/fastq/DN18127/AUY227A265_S57_L003_R2_001.fastq.gz, stageName:AUY227A265_S57_L003_R2_001.fastq.gz)], 
 - type=java.lang.String value=bwaIndex, 
 - type=nextflow.util.ArrayBag value=[FileHolder(sourceObj:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.ann, storePath:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.ann, stageName:human_g1k_v37_decoy.fasta.ann), FileHolder(sourceObj:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.pac, storePath:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.pac, stageName:human_g1k_v37_decoy.fasta.pac), FileHolder(sourceObj:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.sa, storePath:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.sa, stageName:human_g1k_v37_decoy.fasta.sa), FileHolder(sourceObj:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.bwt, storePath:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.bwt, stageName:human_g1k_v37_decoy.fasta.bwt), FileHolder(sourceObj:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.amb, storePath:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.amb, stageName:human_g1k_v37_decoy.fasta.amb)], 
 - type=java.lang.String value=fasta, 
 - type=nextflow.util.ArrayBag value=[FileHolder(sourceObj:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta, storePath:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta, stageName:human_g1k_v37_decoy.fasta)], 
 - type=java.lang.String value=fastaFai, 
 - type=nextflow.util.ArrayBag value=[FileHolder(sourceObj:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta.fai, storePath:/camp/svc/reference/Genomics/aws-igenomes/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta.fai, stageName:human_g1k_v37_decoy.fasta.fai)], 
 - type=java.lang.String value=$, 
 - type=java.lang.Boolean value=true, 
 - type=java.util.HashMap$EntrySet value=[params.aligner=bwa-mem, statusMap={[K656, G_K656_BC1d1]=0, [K656, G_K656_R1d1]=1, etc }, params.sequencing_center=null]]

Environment

Additional context

nextflow.log

abhi18av commented 2 years ago

Hi @acjmmartin ,

Perhaps this should be reported to nf-core/sarek repo? Or the #sarek slack channel?

acjmmartin commented 2 years ago

Hi @abhi18av, I raised the issue with them and they advised me to raise it here, because the only exception they could find in the nextflow log was a "ConcurrentModificationException".
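
A quick way to see which exception classes a log mentions is to count them. The sketch below is only an illustration: `count_exceptions` is a hypothetical helper, the regular expression is a rough heuristic for Java-style class names ending in "Exception", and the sample text is made up rather than taken from the attached log.

```python
import re
from collections import Counter

# Heuristic: optional dotted package prefix, then a name ending in "Exception".
EXCEPTION_RE = re.compile(r"\b(?:\w+\.)*\w*Exception\b")


def count_exceptions(log_text):
    """Count how often each exception class name appears in the given text."""
    return Counter(EXCEPTION_RE.findall(log_text))


# Made-up stand-in for log contents, not real Nextflow output.
sample_log = (
    "step 1 ok\n"
    "java.util.ConcurrentModificationException: null\n"
    "Caused by: java.util.ConcurrentModificationException\n"
    "step 2 ok\n"
)
found = count_exceptions(sample_log)
print(found.most_common())
```

To run it against an actual log, read the file first, for example `count_exceptions(open(".nextflow.log").read())`.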

abhi18av commented 2 years ago

Ah, I see.

Could you try again after running nextflow self-update (to update to the v21.10.6 release) and then appending -resume to your command?

If this fails again, please share the .nextflow.log file

b97pla commented 2 years ago

I'm having the same problem using v21.10.6. Were you able to get to the bottom of this issue, @acjmmartin?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

acjmmartin commented 2 years ago

> I'm having the same problem using v21.10.6. Were you able to get to the bottom of this issue, @acjmmartin?

Apologies, I moved on to a different project, so I never got to the bottom of this issue. Have you, @b97pla?

acjmmartin commented 2 years ago

The issue disappeared when using Nextflow/22.04.0.