merenlab / illumina-utils

A library and collection of scripts to work with Illumina paired-end data (for CASAVA 1.7+ pipeline).
GNU General Public License v2.0

Help interpreting iu-filter-quality-minoche crash #17

Closed by jarrodscott 6 years ago

jarrodscott commented 6 years ago

I am hoping someone can help me interpret this error message. I am using iu-filter-quality-minoche for some NextSeq metagenomes (4xR1, 4xR2 per sample) that I ran through trimmomatic. For most of the metagenomes I can run iu-filter-quality-minoche just fine, but a few always fail at the same point in the processing. For this sample it is around 24,000 reads in. If I run iu-filter-quality-minoche on the raw data before trimmomatic, I have no issues. So it seems trimmomatic is doing something to a read that causes iu-filter-quality-minoche to crash. But I don't know how to interpret the error, and so I can't troubleshoot the problem.

(num pairs processed: 23,000)
(num pairs processed: 24,000)
Traceback (most recent call last):
  File "/miniconda3/bin/iu-filter-quality-minoche", line 313, in <module>
    sys.exit(main(config, args))
  File "/miniconda3/bin/iu-filter-quality-minoche", line 178, in main
    p1_passed_qual, p1_trim_to, p1_fate = IsHighQuality(s1, q1, p)
  File "/miniconda3/bin/iu-filter-quality-minoche", line 68, in IsHighQuality
    trim_to = None if len(sequence) == trim_to else trim_to
UnboundLocalError: local variable 'trim_to' referenced before assignment
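
For readers unfamiliar with this exception: the last line of the traceback says that `trim_to` was read on line 68 of the script before any code path had assigned it. The sketch below shows one way this class of error can arise. Both functions, the sliding window, and the thresholds are hypothetical illustrations and are not the illumina-utils implementation; the only thing taken from the traceback is the final `return` expression.

```python
def find_trim_point(sequence, qualities, min_q=20, window=3):
    """Hypothetical example, NOT the illumina-utils code."""
    # trim_to is assigned only inside the loop body.
    for i in range(len(qualities) - window + 1):
        trim_to = len(sequence)
        if min(qualities[i:i + window]) < min_q:
            trim_to = i
            break
    # If the read is shorter than the window, the loop never runs and
    # trim_to is still unbound here, which raises UnboundLocalError.
    return None if len(sequence) == trim_to else trim_to


def find_trim_point_safe(sequence, qualities, min_q=20, window=3):
    """Same hypothetical logic with a default assigned before the loop."""
    trim_to = len(sequence)  # default: keep the whole read
    for i in range(len(qualities) - window + 1):
        if min(qualities[i:i + window]) < min_q:
            trim_to = i
            break
    return None if len(sequence) == trim_to else trim_to


try:
    find_trim_point("G", [32])
except UnboundLocalError as err:
    print("crashed:", err)

print(find_trim_point_safe("G", [32]))
```

Under these assumptions, an unusually short read is the kind of input that would skip every assignment, which would be consistent with the crash appearing only after trimming.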

meren commented 6 years ago

Hey Jarrod, which version is this? I thought I fixed that bug a while ago.

jarrodscott commented 6 years ago

Hi Meren,

should be the latest one--I installed it on our server the other day. Unfortunately the server is in maintenance for two hours so I can't tell you the exact version right now...

jarrodscott commented 6 years ago

I did the standard install, pip install illumina-utils in conda.

meren commented 6 years ago

Oh. never mind. So it is new. I will look into this soon.

jarrodscott commented 6 years ago

Within the last two weeks. I will try a fresh install and see what I come up with. At least now I have a place to start :) thanks.

meren commented 6 years ago

Can you by any chance send me that file so I can make sure I am properly fixing it?

Looking at the code, I can tell there must be something definitely wrong with your file. Probably due to something beyond your control. But it shouldn't kill the process.

meren commented 6 years ago

You can simply do something like this:

head -n 100000 YOUR_R1 | tail -n 20000 > R1_TO_MEREN
head -n 100000 YOUR_R2 | tail -n 20000 > R2_TO_MEREN
gzip R1_TO_MEREN R2_TO_MEREN

And this should be largely enough since it will include the problem sequence and not too much of the rest of the file :)

meren commented 6 years ago

This may sound a bit silly, but I am not sure how useful it is to trim short reads of shotgun metagenomes, especially when they will go through an assembly step. Most modern assemblers know how to deal with sequences that offer some ridiculous k-mers that don't fit anywhere in the graph, and some trimmed sequences will not have the length to contribute to larger k's, while they will contribute to shorter ones. Removing any read that requires extensive trimming may be a better strategy after all, and minoche somewhat does that. This of course doesn't mean that the bug you ran into should not be fixed :p It is on me.

jarrodscott commented 6 years ago

FYI Illumina-utils v2.3

jarrodscott commented 6 years ago

files sent. let me know if you don't get it.

jarrodscott commented 6 years ago

This seems like the offending line in the trimmed file. It was in the *L001_R1* fastq:

@NS500709:29:HMTGYBGX5:1:11102:10264:19603 1:N:0:CGGAGCCT+CTCCTTAC
G
+
A

After removing this everything worked a-ok. This read was one read after the output was terminated.
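
The record above is a read that trimming reduced to a single base. A minimal sketch for locating such reads before running the quality filter, assuming plain 4-line FASTQ records (no wrapped sequence lines); the function name `short_reads` and the `min_len` threshold are hypothetical, and the thread only establishes that a 1-base read triggered the crash:

```python
import io


def short_reads(handle, min_len=2):
    """Yield (header, sequence) for FASTQ records shorter than min_len.

    Assumes strict 4-line records; a trailing partial record is ignored.
    """
    # zip over four references to the same iterator reads 4 lines at a time.
    for header, seq, _plus, _qual in zip(*[iter(handle)] * 4):
        seq = seq.rstrip("\n")
        if len(seq) < min_len:
            yield header.rstrip("\n"), seq


# Demo on an in-memory file; the second record is the one from this thread.
demo = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@NS500709:29:HMTGYBGX5:1:11102:10264:19603 1:N:0:CGGAGCCT+CTCCTTAC\n"
    "G\n+\nA\n"
)
hits = list(short_reads(demo))
for header, seq in hits:
    print(header, len(seq))
```

For a real file, pass an open text handle instead of `demo`, for example `gzip.open(path, "rt")` for gzipped FASTQ. With paired-end data, a read dropped from R1 needs its mate dropped from R2 as well so the two files stay in sync.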

meren commented 6 years ago

Excellent! Thank you very much, this will be enough information for me in fact :) I will submit a fix for this.