immcantation / presto

pRESTO is part of the Immcantation analysis framework for Adaptive Immune Receptor Repertoire sequencing (AIRR-seq). pRESTO is a bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
https://presto.readthedocs.io
GNU Affero General Public License v3.0

Read quality = 0 stops FilterSeq #70

Closed (ssnn-airr closed 4 years ago)

ssnn-airr commented 4 years ago

Original report by Carolina Monzó (Bitbucket: [Carolina Monzó](https://bitbucket.org/Carolina Monzó)).


Hi,

I’m using FilterSeq on some very poor-quality .fastq files, and when a read has failed entirely (all N), FilterSeq crashes because the quality of the whole read is 0.

Example read:

@SN863:625:H5M7YBCX3:1:1101:1036:5108 2:N:0:CGATGTTTATCT

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

+

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Error:

$ python3.7 presto-0.5.13/bin/FilterSeq.py quality --inner -q 25 --failed --outdir ./data/fastq_trimmed/ -s ./data/fastq_raw/4256_A_run624_CGATGTTTGGGG_S4_L001_R2_001.fastq

START> FilterSeq

COMMAND> quality

FILE> 4256_A_run624_CGATGTTTGGGG_S4_L001_R2_001.fastq

INNER> True

MIN_QUAL> 25.0

NPROC> 12

PROGRESS> 11:47:42 | | 0% ( 0) 0.0 min

PID 92134> Error in sibling process detected. Cleaning up.

ERROR> Error processing sequence with ID: SN863:625:H5M7YBCX3:1:1101:1036:5108.

PID 92121> Error in sibling process detected. Cleaning up.

Process Process-8:

Traceback (most recent call last):

File "/Users/CMonzo/.conda/envs/MPI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap

self.run()

File "/Users/CMonzo/.conda/envs/MPI/lib/python3.7/multiprocessing/process.py", line 99, in run

self._target(*self._args, **self._kwargs)

File "/Users/CMonzo/.conda/envs/MPI/lib/python3.7/site-packages/presto/Multiprocessing.py", line 402, in processSeqQueue

result = process_func(data, **process_args)

File "/Users/CMonzo/.conda/envs/MPI/lib/python3.7/site-packages/presto/Sequence.py", line 1289, in filterQuality

q = sum(quals) / len(quals)

ZeroDivisionError: division by zero
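
The traceback ends at the mean-quality calculation, `sum(quals) / len(quals)`, which can only raise `ZeroDivisionError` when `quals` is empty. A minimal sketch of how that might happen, assuming (this is not confirmed in the thread) that `--inner` trims leading and trailing N characters before averaging, so an all-N read leaves no scores at all. The helper names `inner_quals` and `passes_quality` are hypothetical and are not pRESTO's API; the guard shown is one possible fix, not necessarily the one that was committed:

```python
def inner_quals(seq, quals, missing="N"):
    """Return the quality scores left after trimming leading and
    trailing missing characters from the sequence (assumed behavior)."""
    start = len(seq) - len(seq.lstrip(missing))
    end = len(seq.rstrip(missing))
    return quals[start:end]


def passes_quality(seq, quals, min_qual=25.0, inner=True):
    """Return True if the mean quality meets min_qual.

    A read with no scores left after trimming fails the filter
    instead of raising ZeroDivisionError.
    """
    scores = inner_quals(seq, quals) if inner else list(quals)
    if not scores:
        # Nothing left to average: treat the read as failed.
        return False
    return sum(scores) / len(scores) >= min_qual


# An all-N read with Phred 0 everywhere, like the one in the report.
all_n_seq = "N" * 10
all_n_quals = [0] * 10
all_n_result = passes_quality(all_n_seq, all_n_quals)
```

With this guard the failed read is routed to the failed set rather than stopping the whole run.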

ssnn-airr commented 4 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


Done in 315c82f.

ssnn-airr commented 4 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


Thanks for reporting this. We’ll take a look. This looks easy to fix.

Until we post a fix, I suspect you can get these files to run by first passing them through the missing subcommand of FilterSeq.py to remove every read that consists mostly (or entirely) of Ns.
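
If a pre-filtering pass outside pRESTO is more convenient, the same idea can be sketched in plain Python. This is only an illustration of the workaround, not what FilterSeq.py does internally; it assumes a well-formed FASTQ file with exactly four lines per record and no blank lines, and the function names `read_fastq` and `drop_all_missing` are made up for this example:

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ
    handle that uses exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def drop_all_missing(in_handle, out_handle, missing="N"):
    """Copy records to out_handle, skipping reads whose sequence is
    empty or made up entirely of the missing character.

    Returns a (kept, dropped) tuple of record counts.
    """
    kept = dropped = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        if seq.strip(missing) == "":
            dropped += 1
            continue
        out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
        kept += 1
    return kept, dropped


# Small in-memory demonstration with one failed and one usable read.
demo_in = io.StringIO("@read1\nNNNN\n+\n!!!!\n@read2\nNACG\n+\n!III\n")
demo_out = io.StringIO()
counts = drop_all_missing(demo_in, demo_out)
```

For real data, the two `io.StringIO` objects would be replaced by files opened with `open(path)` and `open(path, "w")`, and the filtered output would then be passed to FilterSeq.py quality.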