nhoffman / dada2-nf

A Nextflow pipeline for processing 16S rRNA sequences using dada2

An outright empty fastq causes an error in the plot_quality.R step #75

Open · dhoogest opened 1 year ago

dhoogest commented 1 year ago

Hit this today (and I'm surprised we haven't seen this problem before):

N E X T F L O W  ~  version 22.04.1
Pulling nhoffman/dada2-nf ...
 Already-up-to-date
WARN: It appears you have never run this project before -- Option `-resume` is ignored
Launching `https://github.com/nhoffman/dada2-nf` [elated_agnesi] DSL1 - revision: a91413a846ccf14dedb2e7d603cd5cc861b38a6f
executor >  local (96)
[80/3f537c] process > copy_filelist        [100%] 1 of 1 ✔
[7c/b80f8d] process > read_manifest (1)    [100%] 1 of 1 ✔
[c3/7ae148] process > plot_quality (16)    [ 59%] 13 of 22, failed: 1
[bf/1581b8] process > barcodecop_dual (10) [100%] 22 of 22 ✔
[6c/1688fa] process > cutadapt (21)        [100%] 22 of 22 ✔
[21/4de102] process > vsearch_split (12)   [ 57%] 12 of 21
[4e/59a89d] process > filter_and_trim (8)  [ 66%] 8 of 12
[-        ] process > learn_errors         -
[-        ] process > dada_dereplicate     -
[-        ] process > combined_overlaps    -
[-        ] process > cluster_svs          -
[-        ] process > combine_svs          -
[-        ] process > write_seqs           -
[-        ] process > join_counts          -
[fe/7b40f1] process > save_params          [100%] 1 of 1 ✔
Error executing process > 'plot_quality (16)'

Caused by:
  Process `plot_quality (16)` terminated with an error exit status (1)

Command executed:

  dada2_plot_quality.R input.1 input.2 --params dada_params.json -o 23R157-U074.png

Command exit status:
  1

Command output:
  (empty)

Command error:

  gzip: input.1: unexpected end of file
  Error in if (gzip_size(args$r1) == 0) { :
    missing value where TRUE/FALSE needed
  Calls: main
  In addition: Warning message:
  In system2("gunzip", c("-l", fname), stdout = TRUE) :
    running command ''gunzip' -l input.1' had status 1
  Execution halted

Work dir:
  /mnt/disk2/molmicro/projects/universal_NGS/TakaraEval/23N0208_SRSITS/work/c3/7ae1480a61e62fdc52e318f6357bcf

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

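Sample 23R157-U074 has completely empty fastq.gz files. It is presumably tripping the acknowledged fragile empty-file check in dada2_plot_quality.R (the `if (gzip_size(args$r1) == 0)` line in the error above). The files on disk: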

(clampi-env) dhoogest@gattaca:/molmicro/projects/universal_NGS/TakaraEval/23N0208_SRSITS$ ls -lat /mnt/molmicro-data/illumina_data/23R157/*U074*
-rwxr-x--- 1 root _SEC_MOLMICRO 20 Apr 30 06:03 /mnt/molmicro-data/illumina_data/23R157/23R157-U074_S32_L001_R2_001.fastq.gz
-rwxr-x--- 1 root _SEC_MOLMICRO 20 Apr 30 06:03 /mnt/molmicro-data/illumina_data/23R157/23R157-U074_S32_L001_R1_001.fastq.gz
-rwxr-x--- 1 root _SEC_MOLMICRO 20 Apr 30 05:57 /mnt/molmicro-data/illumina_data/23R157/23R157-U074_S32_L001_I2_001.fastq.gz
-rwxr-x--- 1 root _SEC_MOLMICRO 20 Apr 30 05:57 /mnt/molmicro-data/illumina_data/23R157/23R157-U074_S32_L001_I1_001.fastq.gz

@nhoffman any ideas at a glance on how to handle the error when gunzip is run on an empty file?
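Here is one possible approach. It is sketched in Python rather than R and is not taken from the pipeline.

The current check asks `gunzip -l` for the uncompressed size (the `system2("gunzip", c("-l", fname), stdout = TRUE)` call in the warning). When that command fails, `gzip_size()` evidently yields NA, and `if (NA == 0)` aborts with "missing value where TRUE/FALSE needed". An alternative is to try decompressing a single byte. If nothing decompresses, or the stream ends early, the file counts as empty. The helper name `gzip_is_empty` is hypothetical.

```python
import gzip
import os
import tempfile


def gzip_is_empty(path):
    """Return True if the gzip file at `path` decompresses to nothing.

    Hypothetical helper. Covers a zero-byte file, a well-formed gzip
    stream with no content, and a stream truncated before any data
    could be decoded (gzip's "unexpected end of file"). A file that is
    not gzip at all still raises gzip.BadGzipFile (an OSError subclass).
    """
    try:
        with gzip.open(path, 'rb') as fh:
            return fh.read(1) == b''
    except EOFError:
        # Stream ended before the end-of-stream marker: nothing usable.
        return True


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        cases = {
            'zero_bytes.fastq.gz': b'',
            'empty_stream.fastq.gz': gzip.compress(b''),
            'one_read.fastq.gz': gzip.compress(b'@r1\nACGT\n+\nIIII\n'),
        }
        for name, data in cases.items():
            path = os.path.join(tmp, name)
            with open(path, 'wb') as fh:
                fh.write(data)
            # Print each file's size next to the emptiness verdict.
            print(name, os.path.getsize(path), gzip_is_empty(path))
```

The `empty_stream` case is 20 bytes: a 10-byte header, a 2-byte empty deflate block and an 8-byte trailer. That is the same size as the 23R157-U074 files in the listing above, which suggests those files are valid gzip containers that simply hold no reads.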

dhoogest commented 1 year ago

Weirdly, I can't seem to reproduce this with the test/fastq/ data (which contains an empty read set) using:

dhoogest@gattaca:~/src/dada2-nf$ ./nextflow ./main.nf -params-file params.json

The error above does not occur; however, there does appear to be an error in the join_counts step:

N E X T F L O W  ~  version 22.04.3
Launching `./main.nf` [tiny_perlman] DSL1 - revision: 91de0426f7
executor >  local (54)
[69/db55cc] process > copy_filelist        [100%] 1 of 1 ✔
[0d/772e39] process > read_manifest (1)    [100%] 1 of 1 ✔
[03/293f84] process > plot_quality (2)     [100%] 10 of 10, cached: 2 ✔
[97/458525] process > barcodecop_dual (3)  [100%] 10 of 10, cached: 2 ✔
[fa/361c7b] process > no_cutadapt (10)     [100%] 10 of 10, cached: 2 ✔
[b2/bacac1] process > cm_split (6)         [100%] 8 of 8, cached: 2 ✔
[46/c992e8] process > filter_and_trim (8)  [100%] 8 of 8, cached: 2 ✔
[eb/b33534] process > learn_errors (2)     [100%] 2 of 2 ✔
[6b/02b51b] process > dada_dereplicate (8) [100%] 8 of 8 ✔
[4a/6cb707] process > combined_overlaps    [100%] 1 of 1 ✔
[bf/339a5c] process > write_seqs (3)       [100%] 3 of 3 ✔
[35/f7b5fa] process > join_counts (1)      [100%] 1 of 1, failed: 1 ✘
[ae/bd7b4f] process > save_params          [100%] 1 of 1 ✔
Error executing process > 'join_counts (1)'

Caused by:
  Process `join_counts (1)` terminated with an error exit status (1)

Command executed:

  xsv cat rows --output cutadapt.csv  cutadapt_*.csv
  xsv cat rows --no-headers --output split.csv split_*.csv
  xsv cat rows --no-headers --output bcop.csv bcop_*.csv
  xsv cat rows --output dada.csv dada_*.csv
  xsv cat rows --no-headers --output specimens.csv specimen_counts_*.csv
  counts.py --out counts.csv raw.csv -1 cutadapt.csv split.csv bcop.csv dada.csv specimens.csv

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/mnt/home/dhoogest/src/dada2-nf/bin/counts.py", line 142, in <module>
      sys.exit(main(sys.argv[1:]))
    File "/mnt/home/dhoogest/src/dada2-nf/bin/counts.py", line 42, in main
      rows.extend(process_rows(raw, yld, 'raw', 'count'))
    File "/mnt/home/dhoogest/src/dada2-nf/bin/counts.py", line 136, in process_rows
      'yield': int(r[count]) / yld[si],
  ZeroDivisionError: division by zero

Work dir:
  /mnt/home/dhoogest/src/dada2-nf/work/35/f7b5fa22451bc2c34269094468195a

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

/cc @crosenth

dhoogest commented 1 year ago

I'm not totally sure what's going on here, but I don't think we need to solve this before the next release. I haven't been able to reproduce it cleanly since the initial analysis error reported above.

dhoogest commented 1 year ago

That said, here's a small change that handles the divide-by-zero error above: https://github.com/nhoffman/dada2-nf/pull/76
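The contents of that PR aren't reproduced here. As a separate, hypothetical sketch, here is one way to guard the `int(r[count]) / yld[si]` expression from the traceback. When a specimen's denominator is zero, as it would be for an empty read set, the yield is left blank. The argument order mirrors the `process_rows(raw, yld, 'raw', 'count')` call in the traceback. The `sampleid` column and the output keys are assumptions, not the actual counts.py code.

```python
import csv
import io


def process_rows(rows, yld, label, count):
    """Hypothetical sketch modeled on the call seen in counts.py's traceback.

    rows  -- iterable of dicts; 'sampleid' is an ASSUMED column name
    yld   -- dict mapping sampleid -> denominator for the yield
    label -- name of the processing step (e.g. 'raw')
    count -- name of the column holding the read count
    """
    out = []
    for r in rows:
        si = r['sampleid']
        n = int(r[count])
        denom = yld.get(si, 0)
        out.append({
            'sampleid': si,
            'step': label,
            'count': n,
            # An empty specimen has a denominator of 0: leave the yield
            # blank (None) instead of raising ZeroDivisionError.
            'yield': n / denom if denom else None,
        })
    return out


if __name__ == '__main__':
    raw = [{'sampleid': 'ok', 'count': '150'},
           {'sampleid': 'empty', 'count': '0'}]
    yld = {'ok': 300, 'empty': 0}
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=['sampleid', 'step', 'count', 'yield'])
    writer.writeheader()
    writer.writerows(process_rows(raw, yld, 'raw', 'count'))
    print(buf.getvalue())
```

`csv.DictWriter` writes `None` as an empty field, so the empty specimen still gets a row in counts.csv, just with a blank yield. The alternative of reporting 0 would conflate "no reads at all" with "all reads filtered out".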