s-andrews / nextflow_pipelines

The set of NGS processing pipelines used at Babraham
GNU General Public License v3.0

Incorrect detection of paired end data #16

Open s-andrews opened 3 years ago

s-andrews commented 3 years ago

With file names:

SRR1917137_GSM1635411_Ikaros_ChIP-Seq,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz
SRR1917139_GSM1635413_Brg1_ChIP-Seq,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz
SRR1917140_GSM1635414_Input,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz

The pipeline somehow decided that these are paired.

$ nf_chipseq --genome GRCm38 -bg *fastq.gz
[andrewss@headstone Aligned_GRCm38]$ N E X T F L O W  ~  version 20.07.1
Launching `/bi/apps/nextflow/nextflow_pipelines/nf_chipseq` [loving_williams] - revision: b99767329f
[94/7c96e0] Submitted process > TRIM_GALORE (null)
[d4/753225] Submitted process > FASTQ_SCREEN (null)
[48/0f610c] Submitted process > FASTQC (null)
[94/7c96e0] NOTE: Process `TRIM_GALORE (null)` terminated with an error exit status (255) -- Execution is retried (1)
[06/dd86a5] Re-submitted process > TRIM_GALORE (null)
[06/dd86a5] NOTE: Process `TRIM_GALORE (null)` terminated with an error exit status (255) -- Execution is retried (2)
[74/c4da3c] Re-submitted process > TRIM_GALORE (null)
Error executing process > 'TRIM_GALORE (null)'

Caused by:
  Process `TRIM_GALORE (null)` terminated with an error exit status (255)

Command executed:

  module load trim_galore
  module load fastqc
  trim_galore  --paired SRR1917137_GSM1635411_Ikaros_ChIP-Seq,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz SRR1917139_GSM1635413_Brg1_ChIP-Seq,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz SRR1917140_GSM1635414_Input,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz

Command exit status:
  255

Command output:
  (empty)

Command error:
  Multicore support not enabled. Proceeding with single-core trimming.
  Path to Cutadapt set as: 'cutadapt' (default)
  Cutadapt seems to be working fine (tested command 'cutadapt --version')
  Cutadapt version: 2.3
  single-core operation.
  No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

  Please provide an even number of input files for paired-end FastQ trimming! Aborting ...
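
For reference, here is a minimal sketch of a name-based pre-flight check. It assumes the common convention that mates differ only in a trailing `_1`/`_2` (or `_R1`/`_R2`) before the extension. It is not the detection logic nf_chipseq actually uses, and `find_mate_pairs` is a hypothetical helper. It only illustrates that none of the three files above has a `_2` mate, so a name-based check would be expected to treat them as single-end.

```python
import re

# Assumed naming convention (not taken from the pipeline): mates differ only
# in a trailing _1/_2 or _R1/_R2 before the .fastq.gz / .fq.gz extension.
MATE_RE = re.compile(
    r"^(?P<base>.+)_(?P<tag>R?)(?P<read>[12])\.(?P<ext>fastq|fq)\.gz$"
)


def find_mate_pairs(filenames):
    """Return (pairs, singles): files with both mates present vs. the rest."""
    by_key = {}
    singles = []
    for name in filenames:
        m = MATE_RE.match(name)
        if m is None:
            # No recognisable mate suffix: treat as single-end.
            singles.append(name)
            continue
        key = (m.group("base"), m.group("tag"), m.group("ext"))
        by_key.setdefault(key, {})[m.group("read")] = name

    pairs = []
    for reads in by_key.values():
        if "1" in reads and "2" in reads:
            pairs.append((reads["1"], reads["2"]))
        else:
            # Only one mate found: this file is effectively single-end.
            singles.extend(reads.values())
    return pairs, sorted(singles)


files = [
    "SRR1917137_GSM1635411_Ikaros_ChIP-Seq,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz",
    "SRR1917139_GSM1635413_Brg1_ChIP-Seq,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz",
    "SRR1917140_GSM1635414_Input,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz",
]
pairs, singles = find_mate_pairs(files)
print(len(pairs), len(singles))  # prints: 0 3
```

A check along these lines could fail early with a clear message instead of handing an odd number of files to `trim_galore --paired`.
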

FelixKrueger commented 3 years ago

Hmm, commas in filenames... what can possibly go wrong? Which downloading tool would ever allow this....

FelixKrueger commented 3 years ago

I can confirm that it is the comma in the filename that breaks it; replacing it with _ works just fine. So, is this a case of simply not submitting files with commas in their names, or does the Nextflow pipeline have to sort this out?
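
As a stopgap on the user side, the comma-to-underscore replacement can be scripted. This is a minimal sketch, not part of the pipeline or of sradownloader; `comma_free_name`, `plan_renames` and `apply_renames` are hypothetical helpers, and the `*.fastq.gz` pattern is an assumption based on the file names in this issue.

```python
from pathlib import Path


def comma_free_name(name):
    """Replace every comma in a file name with an underscore."""
    return name.replace(",", "_")


def plan_renames(directory, pattern="*.fastq.gz"):
    """Map each comma-containing file in `directory` to its comma-free path.

    Raises FileExistsError rather than planning a rename that would
    overwrite an existing file or collide with another planned target.
    """
    plan = {}
    for path in sorted(Path(directory).glob(pattern)):
        if "," not in path.name:
            continue
        target = path.with_name(comma_free_name(path.name))
        if target.exists() or target in plan.values():
            raise FileExistsError(f"{target} already exists")
        plan[path] = target
    return plan


def apply_renames(plan):
    """Carry out the renames computed by plan_renames."""
    for source, target in plan.items():
        source.rename(target)
```

Calling `apply_renames(plan_renames("."))` in the download directory would rename the files in place. Computing the plan first keeps the collision check separate from the renaming itself, so the mapping can be inspected before anything changes on disk.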

s-andrews commented 3 years ago

To be fair, a comma is a pretty unfriendly thing to put in a file name, but it's not a disallowed character, so we should probably deal with it.

The names came from sradownloader so I'll have a look at what was going on with that too.

Fixing it at both ends would seem to be the appropriate approach :-)

s-andrews commented 3 years ago

OK, if you want to upgrade this to a proper bug: it still fails even with --single_end as a parameter. Trim Galore still gets run in paired-end mode.

The command used to launch the workflow was as follows:
nextflow /bi/apps/nextflow/nextflow_pipelines/nf_chipseq --genome GRCm38 -bg --single_end SRR1917137_GSM1635411_Ikaros_ChIP-Seq,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz SRR1917139_GSM1635413_Brg1_ChIP-Seq,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz SRR1917140_GSM1635414_Input,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz

Execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 255.
The full error message was:
Error executing process > 'TRIM_GALORE (null)'

Caused by:
  Process `TRIM_GALORE (null)` terminated with an error exit status (255)

Command executed:

  module load trim_galore
  module load fastqc
  trim_galore  --paired SRR1917139_GSM1635413_Brg1_ChIP-Seq,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz SRR1917140_GSM1635414_Input,_proB_cells_Mus_musculus_ChIP-Seq_1.fastq.gz