nhoffman / dada2-nf

A Nextflow pipeline for processing 16S rRNA sequences using dada2

Handle empty reads pre error model #66

Closed dhoogest closed 1 year ago

dhoogest commented 1 year ago

@crosenth @nhoffman This is a hacky change: it introduces a small bash script that filters out any empty fastq files when preparing the list sent to dada2_learn_errors.R. After updating test-ITS/sample_information.csv so that batching follows amplicon type, running this branch with the test-ITS.json params successfully excludes barcode 49 when preparing the error model, resulting in a non-empty model.

There might be an easy way to do this with Nextflow's file(), but I didn't take the time to figure out how, or whether it is possible at all.
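
For readers who want a concrete picture of the filtering step, here is a minimal sketch. It is not the script from this branch: the function name `filter_nonempty_fastq` is hypothetical, and it assumes that "empty" means a gzipped fastq that decompresses to zero bytes. Such a file still carries a gzip header, so a plain size check on disk would not catch it.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the script added in this PR.
# Prints the names of the gzipped fastq files that contain at least
# one byte once decompressed; files with no content are skipped.
filter_nonempty_fastq() {
    local fq nbytes
    for fq in "$@"; do
        # Decompress, keep at most one byte, and count it.
        # tr strips the padding some wc implementations add.
        nbytes=$(gzip -cd -- "$fq" 2>/dev/null | head -c 1 | wc -c | tr -d '[:space:]')
        if [ "$nbytes" -gt 0 ]; then
            printf '%s\n' "$fq"
        fi
    done
}
```

Under those assumptions, a call such as `filter_nonempty_fastq R1_*.fastq.gz > R1.txt` would write one surviving file name per line, and that list could then be handed to the error-model step in place of the full set of inputs.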

dhoogest commented 1 year ago

To elaborate a bit on testing, I inspected the specific workdir associated with barcode 49 from the test data:

dhoogest@gattaca:~/src/dada2-nf$ ls -lat work/a5/f69c229190665908f65f91635b6882/
total 560
drwxrwx--- 4 dhoogest _SEC_MOLMICRO    152 Feb 13 11:35 ..
-rw-rw---- 1 dhoogest _SEC_MOLMICRO      1 Feb 13 11:34 .exitcode
drwxrwx--- 2 dhoogest _SEC_MOLMICRO   8192 Feb 13 11:34 .
-rw-rw---- 1 dhoogest _SEC_MOLMICRO    668 Feb 13 11:34 .command.log
-rw-rw---- 1 dhoogest _SEC_MOLMICRO    360 Feb 13 11:34 .command.out
-rw-rw---- 1 dhoogest _SEC_MOLMICRO    308 Feb 13 11:34 .command.err
-rw-r--r-- 1 dhoogest domain^users   67550 Feb 13 11:34 Rplots.pdf
-rw-r--r-- 1 dhoogest domain^users  388620 Feb 13 11:34 error_model_1_reverse.png
-rw-r--r-- 1 dhoogest domain^users   18307 Feb 13 11:34 error_model_1_reverse.rds
-rw-r--r-- 1 dhoogest domain^users      42 Feb 13 11:34 R2.txt
-rw-r--r-- 1 dhoogest domain^users      42 Feb 13 11:34 R1.txt
lrwxrwxrwx 1 dhoogest _SEC_MOLMICRO    100 Feb 13 11:34 R2_4.fastq.gz -> /mnt/home/dhoogest/src/dada2-nf/work/c6/c34f9881801af5dc2938bad0bfde1d/22R255-NGSITS58_R2_filt.fq.gz
lrwxrwxrwx 1 dhoogest _SEC_MOLMICRO    100 Feb 13 11:34 R2_3.fastq.gz -> /mnt/home/dhoogest/src/dada2-nf/work/11/ff620088e25d2c4a5aa05dc9aa1406/22R255-NGSITS60_R2_filt.fq.gz
lrwxrwxrwx 1 dhoogest _SEC_MOLMICRO    100 Feb 13 11:34 R2_2.fastq.gz -> /mnt/home/dhoogest/src/dada2-nf/work/cf/aa50a4aeed619868ff7748c57a9eba/22R255-NGSITS49_R2_filt.fq.gz
lrwxrwxrwx 1 dhoogest _SEC_MOLMICRO    100 Feb 13 11:34 R2_1.fastq.gz -> /mnt/home/dhoogest/src/dada2-nf/work/76/c428ef970c554bb25b59b0c91d521d/22R255-NGSITS59_R2_filt.fq.gz
lrwxrwxrwx 1 dhoogest _SEC_MOLMICRO    100 Feb 13 11:34 R1_4.fastq.gz -> /mnt/home/dhoogest/src/dada2-nf/work/c6/c34f9881801af5dc2938bad0bfde1d/22R255-NGSITS58_R1_filt.fq.gz
lrwxrwxrwx 1 dhoogest _SEC_MOLMICRO    100 Feb 13 11:34 R1_3.fastq.gz -> /mnt/home/dhoogest/src/dada2-nf/work/11/ff620088e25d2c4a5aa05dc9aa1406/22R255-NGSITS60_R1_filt.fq.gz
lrwxrwxrwx 1 dhoogest _SEC_MOLMICRO    100 Feb 13 11:34 R1_2.fastq.gz -> /mnt/home/dhoogest/src/dada2-nf/work/cf/aa50a4aeed619868ff7748c57a9eba/22R255-NGSITS49_R1_filt.fq.gz
-rw-rw---- 1 dhoogest _SEC_MOLMICRO      0 Feb 13 11:34 .command.begin
lrwxrwxrwx 1 dhoogest _SEC_MOLMICRO    100 Feb 13 11:34 R1_1.fastq.gz -> /mnt/home/dhoogest/src/dada2-nf/work/76/c428ef970c554bb25b59b0c91d521d/22R255-NGSITS59_R1_filt.fq.gz
-rw-rw---- 1 dhoogest _SEC_MOLMICRO    235 Feb 13 11:34 .command.sh
-rw-rw---- 1 dhoogest _SEC_MOLMICRO   4567 Feb 13 11:34 .command.run

The R1.txt and R2.txt files exclude the barcode 49 files (R1_2.fastq.gz and R2_2.fastq.gz), and the output files (png, rds) are non-empty.
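
A check along these lines could be repeated from inside the workdir with a small read counter. This is a sketch only: `count_reads` is a hypothetical name, and it assumes standard four-line fastq records.

```shell
#!/usr/bin/env bash
# Hypothetical check, not part of the pipeline.
# Prints each gzipped fastq file name and its read count, tab-separated,
# assuming four lines per fastq record.
count_reads() {
    local fq lines
    for fq in "$@"; do
        lines=$(gzip -cd -- "$fq" | wc -l | tr -d '[:space:]')
        printf '%s\t%d\n' "$fq" "$((lines / 4))"
    done
}
```

Running `count_reads R1_*.fastq.gz` followed by `cat R1.txt` would show whether every file with a zero count is missing from the list. The contents of R1.txt are not shown in this thread, but its size in the listing above (42 bytes) is consistent with three 13-character file names, each followed by a newline.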