nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

same sample name across multiple patients does not fail input schema validation #1503

Open maxulysse opened 2 months ago

maxulysse commented 2 months ago

Description of the bug

This is the output seen on the terminal once the pipeline has failed after GATK4_MARKDUPLICATES. My guess is that one of the later join operators is causing the subsequent failure:

Detected join operation duplicate emission on left channel -- offending element: key=[patient:test2, sample:test, sex:XX, status:0, n_fastq:1, data_type:bam, id:test]; value=/home/max/workspace/sarek/work/fe/2e8890cae572ee686c7475edd6e895/test.md.cram

We should really fail early for that.
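
As a rough illustration of what failing early could look like, here is a minimal sketch in Python. It is not sarek's actual validation (which lives in the pipeline's Nextflow code and input schema), and the function name `find_shared_sample_names` is hypothetical. It assumes a CSV samplesheet with `patient` and `sample` columns, like the one in the report below, and lists every sample name that is used by more than one patient.

```python
import csv
import io
from collections import defaultdict


def find_shared_sample_names(samplesheet_text):
    """Return {sample: sorted patients} for each sample name used by more than one patient.

    Hypothetical helper for illustration only; not part of sarek.
    """
    patients_by_sample = defaultdict(set)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        patients_by_sample[row["sample"]].add(row["patient"])
    return {
        sample: sorted(patients)
        for sample, patients in patients_by_sample.items()
        if len(patients) > 1
    }


# Shortened samplesheet with the same columns as the one in the report below.
SAMPLESHEET = """patient,sample,lane,fastq_1,fastq_2,status
BR664F,liver,1,/path/to/the/file/BR664F_R1.fastq.gz,/path/to/the/file/BR664F_R2.fastq.gz,1
BR665F,liver,1,/path/to/the/file/BR665F_R1.fastq.gz,/path/to/the/file/BR665F_R2.fastq.gz,1
"""

clashes = find_shared_sample_names(SAMPLESHEET)
for sample, patients in clashes.items():
    print(f"sample '{sample}' is used by multiple patients: {', '.join(patients)}")
```

Several lanes of the same sample within one patient are not flagged, since only the set of distinct patients per sample name is counted.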

Issue reported by Ist4lri

Command used and terminal output

No response

Relevant files

No response

System information

No response

Ist4lri commented 2 months ago
  1. Command used and terminal output:
nextflow run nf-core/sarek -r dev -profile singularity -c custom.config -params-file nf-params.json
Error: Detected join operation duplicate emission on left channel -- offending element: key=[patient:test2, sample:test, sex:XX, status:0, n_fastq:1, data_type:bam, id:test]; value=/home/max/workspace/sarek/work/fe/2e8890cae572ee686c7475edd6e895/test.md.cram
  1. Relevant files:

With this samplesheet:

patient,sample,lane,fastq_1,fastq_2,status
BR664F,liver,1,/path/to/the/file/BR664F_R1.fastq.gz,/path/to/the/file/BR664F_R2.fastq.gz,1
BR665F,liver,1,/path/to/the/file/BR665F_R1.fastq.gz,/path/to/the/file/BR665F_R2.fastq.gz,1
BR666F,liver,1,/path/to/the/file/BR666F_R1.fastq.gz,/path/to/the/file/BR666F_R2.fastq.gz,1
BR667F,liver,1,/path/to/the/file/BR667F_R1.fastq.gz,/path/to/the/file/BR667F_R2.fastq.gz,1
BR668F,liver,1,/path/to/the/file/BR668F_R1.fastq.gz,/path/to/the/file/BR668F_R2.fastq.gz,1
BR669F,liver,1,/path/to/the/file/BR669F_R1.fastq.gz,/path/to/the/file/BR669F_R2.fastq.gz,1
BR670F,liver,1,/path/to/the/file/BR670F_R1.fastq.gz,/path/to/the/file/BR670F_R2.fastq.gz,1
BR671F,liver,1,/path/to/the/file/BR671F_R1.fastq.gz,/path/to/the/file/BR671F_R2.fastq.gz,1

And these parameters (nf-params.json):

{
    "input": "sample.csv",
    "outdir": "results",
    "wes": "true",
    "fasta": "/path/to/this/file/GRCh38_latest_genomic.fna",
    "aligner": "bwa-mem2",
    "skip_tools": "baserecalibrator,markduplicates"
}
  1. System Information

HPC Curta at MCIA (Mésocentre calcul intensif aquitain). I downloaded sarek to local files on the cluster, because there is no profile for this cluster (it is not the same as IFB).