wdecoster / nanofilt

Filtering and trimming of long read sequencing data
GNU General Public License v3.0

Weird peak in histogram of read length #28

Closed: Kata-Pa closed this issue 5 years ago

Kata-Pa commented 5 years ago

Hi! I have a question that might not be related to NanoFilt. I have some Nanopore MinION long reads basecalled with Flappie. I ran NanoPlot on them to check the statistics and everything looked OK. Then I ran NanoFilt as follows: cat PoreChopped2_Flappie.fastq | NanoFilt -q 7 > NanoFilt_PoreChopped2_flappie.fastq and ran NanoPlot again to check whether the low-quality reads had been removed. Now there is a pretty high peak in the read length histogram. Do you have any idea whether this has something to do with the filtering I applied, or is it an issue with my dataset?

[Image: read length histogram, log-transformed]

wdecoster commented 5 years ago

Hi, that looks like the DNA CS (calibration strand) that is optionally spiked in during library prep. It's part of the lambda genome. I wrote NanoLyse (https://github.com/wdecoster/nanolyse) to remove it, so that's something you could try.
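To see at which read length the spike sits before and after filtering, a quick length tally can help. This is only a minimal sketch using standard awk/sort/head, not part of NanoFilt, NanoPlot, or NanoLyse. The function name top_length_bins and its bin-size and top-N arguments are made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```shell
# Hypothetical helper: list the most populated read-length bins in a FASTQ file.
# Usage: top_length_bins FILE [BIN_SIZE] [TOP_N]
# Assumes 4-line FASTQ records, so the sequence is line 2 of every record.
top_length_bins() {
    fastq="$1"
    bin="${2:-100}"   # bin width in bases (default 100)
    top="${3:-5}"     # number of bins to report (default 5)
    awk -v bin="$bin" '
        NR % 4 == 2 { counts[int(length($0) / bin) * bin]++ }
        END { for (b in counts) print counts[b], b }
    ' "$fastq" \
        | sort -k1,1nr -k2,2n \
        | head -n "$top"
}
```

Each output line is "read_count bin_start", with the largest bins first, so running top_length_bins NanoFilt_PoreChopped2_flappie.fastq should show whether one narrow length range dominates. If it does, and assuming NanoLyse reads FASTQ on stdin and writes to stdout the way NanoFilt does (check NanoLyse --help), it could be placed in the original pipeline ahead of NanoFilt, for example: cat PoreChopped2_Flappie.fastq | NanoLyse | NanoFilt -q 7 > filtered.fastq (the output filename here is just a placeholder).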

Cheers, Wouter

Kata-Pa commented 5 years ago

Ah, you're right, I remember reading about this before. I'll have a look at the tool, thank you!