rhysnewell / aviary

A hybrid assembly and MAG recovery pipeline (and more!)
GNU General Public License v3.0

`data/short_reads.fastq.gz` is empty: test_short_read_recovery integration test fails on dev commit 51516da #170

Closed: AroneyS closed this issue 8 months ago

AroneyS commented 8 months ago

The QC short reads step produces an empty `data/short_reads.fastq.gz`. Log below:

pigz: skipping: data/short_reads.pre_qc.1.fastq is a symbolic link
Combining reads before quality control
Coassemble: False
Symlinking /mnt/hpccs01/home/aroneys/src/aviary/test/data/wgsim.1.fq.gz to data/short_reads.pre_qc.1.fastq
Gzipping data/short_reads.pre_qc.1.fastq
pigz: skipping: data/short_reads.pre_qc.2.fastq is a symbolic link
Combining reads before quality control
Coassemble: False
Symlinking /mnt/hpccs01/home/aroneys/src/aviary/test/data/wgsim.2.fq.gz to data/short_reads.pre_qc.2.fastq
Gzipping data/short_reads.pre_qc.2.fastq
ERROR: Failed to open file: data/short_reads.pre_qc.1.fastq.gz
Shell style : fastp --stdout -w 32 -q 15 -u 40 -l 15 --length_limit 0 -i data/short_reads.pre_qc.1.fastq.gz -I data/short_reads.pre_qc.2.fastq.gz | pigz -p 32 > data/short_reads.fastq.gz
fastp return: 255
pigz return: 0
Not performing reference filtering: []

rhysnewell commented 8 months ago

Looks like something is going awry with the symlinking, maybe? Strange that it complains about the forward reads but not the reverse reads.

wwood commented 8 months ago

Seems like the reverse reads were also skipped, @rhysnewell? I think we can just add `-f` to pigz to override the symlink check:

pigz: skipping: data/short_reads.pre_qc.2.fastq is a symbolic link

rhysnewell commented 8 months ago

Oh, I missed the message at the top of the error. It might be a bit more confusing than that: the pipeline symlinks the input file and drops the gzip extension even though the file is already gzipped. pigz then skips the symlink and doesn't compress anything, so the `.fastq.gz` file that fastp is looking for never gets created. I'll look into it and add a fix in https://github.com/rhysnewell/aviary/pull/168
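To illustrate the failure mode described above, here is a minimal Python sketch of one way to stage an input so that the path handed to fastp always exists and is gzipped. It is not aviary's actual code or the fix in the linked PR: `stage_reads`, `is_gzipped`, and `GZIP_MAGIC` are hypothetical names, and detecting compression by magic bytes rather than by file extension is an assumption made for the example.

```python
import gzip
import os
import shutil

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def stage_reads(src, dest_gz):
    """Ensure dest_gz exists and holds gzipped data from src.

    Hypothetical helper for illustration only, not part of aviary.
    """
    # Clear a stale file or dangling symlink from an earlier run.
    if os.path.lexists(dest_gz):
        os.remove(dest_gz)
    if is_gzipped(src):
        # Already compressed: link it under the .gz name and skip compression,
        # so no later step has to gzip a symlink.
        os.symlink(os.path.abspath(src), dest_gz)
    else:
        # Uncompressed: write a real gzipped file rather than a link.
        with open(src, "rb") as fin, gzip.open(dest_gz, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    return dest_gz
```

With a helper like this, a call such as `stage_reads("wgsim.1.fq.gz", "data/short_reads.pre_qc.1.fastq.gz")` would leave a symlink that already carries the `.gz` extension, so no separate compression step has to touch a symbolic link.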

rhysnewell commented 8 months ago

Easy enough fix: https://github.com/rhysnewell/aviary/pull/168/commits/cf828bf533bf4a535dcc77fe0d0452e34432ea83

qc_short_reads log:

File: retest/logs/qc_short_reads.log

Combining reads before quality control
Coassemble: False
Symlinking /home/rhys_newell/git/aviary/test/data/wgsim.1.fq.gz to data/short_reads.pre_qc.1.fastq.gz
Combining reads before quality control
Coassemble: False
Symlinking /home/rhys_newell/git/aviary/test/data/wgsim.2.fq.gz to data/short_reads.pre_qc.2.fastq.gz
Streaming uncompressed interleaved reads to STDOUT...
Enable interleaved output mode for paired-end input.

Read1 before filtering:
total reads: 81598
total bases: 12239700
Q20 bases: 0(0%)
Q30 bases: 0(0%)

Read2 before filtering:
total reads: 81598
total bases: 12239700
Q20 bases: 0(0%)
Q30 bases: 0(0%)

Read1 after filtering:
total reads: 81598
total bases: 12239697
Q20 bases: 0(0%)
Q30 bases: 0(0%)

Read2 after filtering:
total reads: 81598
total bases: 12239697
Q20 bases: 0(0%)
Q30 bases: 0(0%)

Filtering result:
reads passed filter: 163196
reads failed due to low quality: 0
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 2
bases trimmed due to adapters: 6

Duplication rate: 0%

Insert size peak (evaluated by paired-end reads): 164

JSON report: fastp.json
HTML report: fastp.html

fastp --stdout -w 8 -q 15 -u 40 -l 15 --length_limit 0 -i data/short_reads.pre_qc.1.fastq.gz -I data/short_reads.pre_qc.2.fastq.gz
fastp v0.23.4, time used: 2 seconds
Shell style : fastp --stdout -w 8 -q 15 -u 40 -l 15 --length_limit 0 -i data/short_reads.pre_qc.1.fastq.gz -I data/short_reads.pre_qc.2.fast>
fastp return: 0
pigz return: 0
Not performing reference filtering: []