nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

FASTP tooling error "ERROR: sequence and quality have different length" #1618

Closed jbague closed 2 weeks ago

jbague commented 1 month ago

Description of the bug

Hi, I am running 40 samples through the Sarek pipeline. A small number of samples fail with an error at the FASTP stage, while the other samples run through the whole pipeline. All samples were sequenced using the same methodology.

Command used and terminal output

SCRIPT:

#!/bin/bash
#SBATCH --qos=standard
#SBATCH --job-name=sarek_array_3_4_2    # Job name
#SBATCH --tasks=1
#SBATCH --cpus-per-task=40
#SBATCH --tasks-per-node=1
#SBATCH --nodes=1
#SBATCH --output=/slgpfs/projects/cli20/cli20901/soft/prova_nf_sarek_3.4.2/output/sarek_somatic_%j.out
#SBATCH --chdir=/slgpfs/projects/cli20/cli20901/soft/prova_nf_sarek_3.4.2
#SBATCH --error=/slgpfs/projects/cli20/cli20901/soft/prova_nf_sarek_3.4.2/output/sarek_somatic_%j.err
#SBATCH --time=2-12:00:00

module load java singularity/3.8.7 nextflow
export NXF_OFFLINE='true'
export NXF_SINGULARITY_CACHEDIR=/slgpfs/projects/cli20/cli20901/soft/nf-core-sarek_3.4.2/singularity-images

nextflow run /slgpfs/projects/cli20/cli20901/soft/prova_nf_sarek_3.4.2/3_4_2/main.nf -profile singularity --input batch_2_modified_firstpart.csv --outdir ./results_batch_2 --max_cpus 40 --genome GATK.GRCh38 --ngscheckmate_bed 'false' --igenomes_base /slgpfs/projects/cli20/cli20901/soft/prova_nf_sarek_3.4.2/references --wes --tools mutect2 --joint_mutect2

OUTPUT:

ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:FASTP (qG17029053-1)'

Caused by:
  Process `NFCORE_SAREK:SAREK:FASTP (qG17029053-1)` terminated with an error exit status (255)

Command executed:

  [ ! -f  qG17029053-1_1.fastq.gz ] && ln -sf qG17029053.R1.fq.gz qG17029053-1_1.fastq.gz
  [ ! -f  qG17029053-1_2.fastq.gz ] && ln -sf qG17029053.R2.fq.gz qG17029053-1_2.fastq.gz
  fastp \
      --in1 qG17029053-1_1.fastq.gz \
      --in2 qG17029053-1_2.fastq.gz \
      --out1 qG17029053-1_1.fastp.fastq.gz \
      --out2 qG17029053-1_2.fastp.fastq.gz \
      --json qG17029053-1.fastp.json \
      --html qG17029053-1.fastp.html \
       \
       \
       \
      --thread 12 \
      --detect_adapter_for_pe \
      --disable_adapter_trimming      --split_by_lines 200000000 \
      2> >(tee qG17029053-1.fastp.log >&2)

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:FASTP":
      fastp: $(fastp --version 2>&1 | sed -e "s/fastp //g")
  END_VERSIONS

Command exit status:
  255

Command output:
  (empty)

Command error:
  ERROR: sequence and quality have different length:
  @NB501979:93:HKNGNBGX3:1:13301:5679:14867/2
  ATCCACACGGCCAACCCCATGGAACACGCCAACCACATGGCTGCCCAGCCACAGTTCGTGCACCCGGAACACCGCTCCTTTGTTGACCTGTCAGGCCACAACCTGGCCAACCCCCACCCGTTCGCAGGTAGGACATGGGGAGGG
  +
  <=>>ABAB=@CBBBCBBBBBE@BBCBC>DBBBCBBCBBE@DCEDBBBDDBBCBD@BD>@EDBCB>>@B?CBCB>DCDBC@BE@BE@CBCA@BBD@DBBCB?CBBE@DBBBCBBB:>CBB>>BD>DBD@@@D@BCBCE@@@BD@@B?+B>@0E@/BCE/BC+B>ABBE@0/B>ATGGAGCATCTCCGCTTGGTCTCCCTCCCCATGTCCT
  ERROR: sequence and quality have different length

Work dir:
  /slgpfs/projects/cli20/cli20901/soft/prova_nf_sarek_3.4.2/work/11/07ae6641c0b1e04dea5dde3d73bd2c

Relevant files

No response

System information

Summary software and hardware

Nextflow 23.10.0
Singularity 3.8.7
Java 12.0.2
Hardware: HPC
Executor: slurm
nf-core/sarek 3.4.2

FriederikeHanssen commented 3 weeks ago

This looks like an error coming either from fastp or from your samples. I would advise checking the affected sample with a tool like seqkit, fq, or similar.
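
If neither tool is available on the cluster, the same check can be approximated with a few lines of Python. This is only a minimal sketch, not part of Sarek or fastp: the function name `find_length_mismatches` and the `limit` parameter are made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequences). It walks the file record by record and reports any record whose sequence and quality strings differ in length, which is the condition fastp complains about in the log above.

```python
import gzip
from itertools import islice


def find_length_mismatches(path, limit=10):
    """Return up to `limit` tuples (record_number, header, seq_len, qual_len).

    Hypothetical helper for illustration. Assumes 4-line FASTQ records.
    A truncated final record is reported with None for both lengths.
    """
    # Pick the opener from the file extension (assumption: .gz means gzip).
    opener = gzip.open if str(path).endswith(".gz") else open
    bad = []
    with opener(path, "rt") as handle:
        record_no = 0
        while True:
            # Read the next four lines as one record.
            lines = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if not lines:
                break  # clean end of file
            record_no += 1
            if len(lines) < 4:
                # Fewer than four lines left: the last record is incomplete.
                bad.append((record_no, lines[0], None, None))
                break
            header, seq, _plus, qual = lines
            if len(seq) != len(qual):
                bad.append((record_no, header, len(seq), len(qual)))
                if len(bad) >= limit:
                    break
    return bad
```

Calling `find_length_mismatches("qG17029053.R2.fq.gz")` on the read-2 file of the failing sample should list the offending records, if the file itself is the problem. Only the first hit is really trustworthy: once a record is damaged, the 4-line framing can drift and later hits may be follow-on effects. If the gzip stream itself is corrupt, `gzip.open` will raise an exception while reading, which also points to a damaged input file rather than a pipeline bug.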