nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
https://nf-co.re/rnaseq
MIT License

Issue with cutadapt #783

Closed ChiaraF32 closed 2 years ago

ChiaraF32 commented 2 years ago

Description of the bug

Background: I am using the nf-core/rnaseq pipeline to process batches of paired-end, reverse-stranded RNA-seq data from a number of cell and tissue types, mainly muscle and fibroblast. The pipeline has generally worked well, with a number of batches running to completion without issue.

However, I am having recurring issues with Trim Galore. The issues appear to be sample-specific: so far, when I remove the problematic samples from the input samplesheet and re-run the pipeline, it runs fine.

This is the error message I receive:

Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQC_UMITOOLS_TRIMGALORE:TRIMGALORE (D18-1827)'

Caused by:
  Process `NFCORE_RNASEQ:RNASEQ:FASTQC_UMITOOLS_TRIMGALORE:TRIMGALORE (D18-1827)` terminated with an error exit status (1)

Command executed:

  [ ! -f  D18-1827_1.fastq.gz ] && ln -s D18-1827_1.merged.fastq.gz D18-1827_1.fastq.gz
  [ ! -f  D18-1827_2.fastq.gz ] && ln -s D18-1827_2.merged.fastq.gz D18-1827_2.fastq.gz
  trim_galore \
      --fastqc \
      --cores 4 \
      --paired \
      --gzip \
       \
       \
       \
       \
      D18-1827_1.fastq.gz \
      D18-1827_2.fastq.gz
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNASEQ:RNASEQ:FASTQC_UMITOOLS_TRIMGALORE:TRIMGALORE":
      trimgalore: $(echo $(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*$//')
      cutadapt: $(cutadapt --version)
  END_VERSIONS

Command exit status:
  1

Command output:
  pigz 2.6

Command error:
  ERROR: Traceback (most recent call last):
    File "/usr/local/lib/python3.9/site-packages/cutadapt/pipeline.py", line 559, in run
      for chunk_index, chunk in enumerate(dnaio.read_chunks(f, self.buffer_size)):
    File "/usr/local/lib/python3.9/site-packages/dnaio/chunks.py", line 81, in read_chunks
      bufend = f.readinto(memoryview(buf)[start:]) + start  # type: ignore
    File "/usr/local/lib/python3.9/gzip.py", line 300, in read
      return self._buffer.read(size)
    File "/usr/local/lib/python3.9/_compression.py", line 68, in readinto
      data = self.read(len(byte_view))
    File "/usr/local/lib/python3.9/gzip.py", line 495, in read
      uncompress = self._decompressor.decompress(buf, size)
    File "src/isal/isal_zlib.pyx", line 523, in isal.isal_zlib.Decompress.decompress
    File "src/isal/igzip_lib.pyx", line 437, in isal.igzip_lib.check_isal_inflate_rc
  isal.igzip_lib.IsalError: Invalid lookback distance found

  ERROR: Traceback (most recent call last):
    File "/usr/local/lib/python3.9/site-packages/cutadapt/pipeline.py", line 559, in run
      for chunk_index, chunk in enumerate(dnaio.read_chunks(f, self.buffer_size)):
    File "/usr/local/lib/python3.9/site-packages/dnaio/chunks.py", line 81, in read_chunks
      bufend = f.readinto(memoryview(buf)[start:]) + start  # type: ignore
    File "/usr/local/lib/python3.9/gzip.py", line 300, in read
      return self._buffer.read(size)
    File "/usr/local/lib/python3.9/_compression.py", line 68, in readinto
      data = self.read(len(byte_view))
    File "/usr/local/lib/python3.9/gzip.py", line 495, in read
      uncompress = self._decompressor.decompress(buf, size)
    File "src/isal/isal_zlib.pyx", line 523, in isal.isal_zlib.Decompress.decompress
    File "src/isal/igzip_lib.pyx", line 437, in isal.igzip_lib.check_isal_inflate_rc
  isal.igzip_lib.IsalError: Invalid lookback distance found

  ERROR: Traceback (most recent call last):
    File "/usr/local/lib/python3.9/site-packages/cutadapt/pipeline.py", line 626, in run
      raise e
  isal.igzip_lib.IsalError: Invalid lookback distance found

  Traceback (most recent call last):
    File "/usr/local/bin/cutadapt", line 10, in <module>
      sys.exit(main_cli())
    File "/usr/local/lib/python3.9/site-packages/cutadapt/__main__.py", line 848, in main_cli
      main(sys.argv[1:])
    File "/usr/local/lib/python3.9/site-packages/cutadapt/__main__.py", line 913, in main
      stats = r.run()
    File "/usr/local/lib/python3.9/site-packages/cutadapt/pipeline.py", line 825, in run
      raise e
  isal.igzip_lib.IsalError: Invalid lookback distance found

  Cutadapt terminated with exit signal: '256'.
  Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...

Work dir:
  /data/nfcore/work/48/0ee34eda9744edb371ceecbfe7fb3a

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

I have the run log file attached for more information, as well as the work files from the trimgalore step.

Command used and terminal output

This is the run script I executed:

#!/bin/bash

batchID=run12 #specify batchID that matches that used when generating the samplesheet.csv file

/data/nfcore/nextflow run nf-core/rnaseq \
    -profile docker \
    --aligner star_salmon \
    --input samplesheet_$batchID.csv \
    --outdir ./results_$batchID \
    --multiqc_title ${batchID}_multiqc \
    --igenomes_base /data/references/iGenomes/ \
    --genome GRCh38 \
    --star_index '/data/references/nfcore/' \
    --max_memory 60GB #specified in accordance with capacity of Nimbus instance allocation, to avoid pipeline crashing

Relevant files

2022-03-14_run12.zip

System information

drpatelh commented 2 years ago

Hi @ChiaraF32! Apologies for the delay in responding. Did you figure out why the pipeline was failing?

If you run the pipeline on the bad samples individually, does it still fail? I'm trying to figure out whether this is an intermittent issue. @FelixKrueger any ideas?

PS: If you haven't already, it may be worth joining the #rnaseq channel on the nf-core Slack workspace for more real-time help. I try to keep up, but it takes longer than I would like to respond to issues here.
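To re-run one sample on its own, the samplesheet can be cut down to just that sample's rows. The following is a minimal sketch, assuming the samplesheet is a CSV with a header row and a `sample` column (as in the nf-core/rnaseq samplesheet format). The helper name `write_single_sample_sheet` and the output file name are hypothetical, and the sample ID in the sheet is assumed to match the tag shown in the error message.

```python
import csv


def write_single_sample_sheet(src, dest, sample_id, sample_column="sample"):
    """Copy only the rows for `sample_id` from `src` to `dest`; return the row count."""
    with open(src, newline="") as infile:
        reader = csv.DictReader(infile)
        fieldnames = reader.fieldnames
        # Keep every row for the sample (a sample may span several rows/lanes).
        rows = [row for row in reader if row[sample_column] == sample_id]
    if not rows:
        raise ValueError(f"no rows found for sample {sample_id!r} in {src}")
    with open(dest, "w", newline="") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `write_single_sample_sheet("samplesheet_run12.csv", "samplesheet_D18-1827.csv", "D18-1827")` would write a one-sample sheet that can then be passed to `--input`.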

FelixKrueger commented 2 years ago

Hmm, I don't think I have seen this before. This error isn't specific to Trim Galore; it is thrown by the Python code (isal.igzip_lib.IsalError: Invalid lookback distance found), which causes Cutadapt and eventually also Trim Galore to fail.
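The traceback shows the exception being raised while the gzip-compressed input is decompressed, so one thing that could be checked is whether the FASTQ files for the failing sample decompress cleanly outside the pipeline. The following is a minimal sketch, assuming Python 3 and only the standard library. `gzip_is_readable` is a hypothetical helper name. It uses the standard `gzip` module rather than the `isal` bindings seen in the traceback, so the error text may differ for a damaged file.

```python
import gzip
import sys
import zlib


def gzip_is_readable(path, chunk_size=1 << 20):
    """Return (True, None) if the whole file decompresses, else (False, message)."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so corruption anywhere in the stream is detected.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        # OSError also covers a missing file and a bad gzip header or checksum.
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


if __name__ == "__main__":
    for fastq in sys.argv[1:]:
        ok, message = gzip_is_readable(fastq)
        print(f"{fastq}\t{'OK' if ok else 'FAILED'}\t{message or ''}")
```

Saved under a hypothetical name such as `check_gzip.py`, it could be run as `python check_gzip.py D18-1827_1.merged.fastq.gz D18-1827_2.merged.fastq.gz`. Running `gzip -t` on the same files is a comparable check from the shell.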

drpatelh commented 2 years ago

Yep, ok. Thanks! Will close for now. I am prepping v3.7, so it would be good to test with that. If the issue persists, please feel free to re-open.

ChiaraF32 commented 2 years ago

Hi @drpatelh,

No worries about the delay, and thanks for getting back to me.

Just tried running the sample by itself, but it came up with the same error :/

Thanks for the suggestion to join the channel in the Slack space.