rrwick / Porechop

adapter trimmer for Oxford Nanopore reads
GNU General Public License v3.0

Add hairpin adapters to detected sequences #1

Closed gringer closed 7 years ago

gringer commented 7 years ago
>Hairpin_1
CGTTCTGTTTATGTTTCTTGGACACTGATTGACACGGTTTAGTAGAAC
>Hairpin_2
CAAGAAACATAAACAGAACGT
>adapt_Y_top_1
GGCGTCTGCTTGGGTGTTTAACCTTTTTTTTTT
>adapt_Y_top_2
AATGTACTTCGTTCAGTTACGTATTGCT
>adapt_Y_bottom
GCAATACGTAACTGAACGAAGT

The nice thing about adding hairpin adapters is that you could then do hairpin detection and [possibly automatic] consensus calling all in base space.
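
To illustrate the idea, here is a minimal Python sketch of base-space hairpin detection. It assumes a 2D-style read laid out as template, then hairpin, then the reverse complement of the template, and it assumes Hairpin_1 above is the sequence that appears in the read. The names `split_at_hairpin` and `strand_agreement` are hypothetical, and exact substring search stands in for the error-tolerant alignment that real reads would need. Splitting at the hairpin and reverse-complementing the second half puts both strands in the same orientation, which is the starting point for consensus calling.

```python
# Hypothetical sketch: split a 2D-style read at the hairpin and orient both strands.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Assumption: Hairpin_1 from this issue is what appears between the two strands.
HAIRPIN = "CGTTCTGTTTATGTTTCTTGGACACTGATTGACACGGTTTAGTAGAAC"


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_at_hairpin(read: str, hairpin: str = HAIRPIN):
    """Return (template, reverse-complemented complement), or None if no hairpin.

    Exact search is a simplification; basecalled reads would need an
    alignment that tolerates errors.
    """
    pos = read.find(hairpin)
    if pos == -1:
        return None
    template = read[:pos]
    complement = read[pos + len(hairpin):]
    return template, revcomp(complement)


def strand_agreement(a: str, b: str) -> float:
    """Fraction of identical positions; assumes equal length and no indels."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

A real consensus step would first align the two strands (they rarely have equal length once indels are involved) rather than compare them position by position.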

rrwick commented 7 years ago

Yes, that's definitely a good idea!

The difficulty is that Porechop currently only uses the start/end of reads to see which adapter sets are present (to save time). But since hairpin sequences will probably only be in the middle of reads, I'll need to scan for those more thoroughly. It's not a big problem, but it will be a little bit more involved than just adding the hairpin to the list of known adapters.
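
A minimal sketch of that ends-only presence check, assuming each adapter set is a (start adapter, end adapter) pair and treating the Y-adapter sequences posted above as one hypothetical set. `END_SIZE`, `MIN_IDENTITY`, `best_match` and `present_sets` are illustrative names and values, not Porechop's; the matching here is a semi-global edit-distance alignment, which may differ from what Porechop actually uses. A hairpin that sits mid-read never shows up in the end regions, so this check never reports it, which is exactly the difficulty described above.

```python
END_SIZE = 100       # assumption: bases inspected at each end of a read
MIN_IDENTITY = 0.75  # assumption: minimum identity to count as a hit


def best_match(adapter: str, region: str) -> float:
    """Identity (1 - edits/len(adapter)) of the best placement of `adapter` in `region`.

    Semi-global ("fitting") edit distance: the whole adapter must align,
    but it may start and end anywhere inside `region`.
    """
    n = len(adapter)
    prev = list(range(n + 1))  # column for an empty region prefix
    best = n
    for ch in region:
        cur = [0]  # an alignment may start at any region position for free
        for i in range(1, n + 1):
            cur.append(min(
                prev[i - 1] + (adapter[i - 1] != ch),  # match or mismatch
                prev[i] + 1,                           # extra base in region
                cur[i - 1] + 1,                        # adapter base missing
            ))
        best = min(best, cur[n])
        prev = cur
    return 1.0 - best / n


def present_sets(reads, adapter_sets):
    """Names of adapter sets seen near the start or end of at least one read."""
    found = set()
    for read in reads:
        start, end = read[:END_SIZE], read[-END_SIZE:]
        for name, (start_adapter, end_adapter) in adapter_sets.items():
            if name in found:
                continue  # already confirmed, skip the extra alignments
            if (best_match(start_adapter, start) >= MIN_IDENTITY
                    or best_match(end_adapter, end) >= MIN_IDENTITY):
                found.add(name)
    return found


# Hypothetical adapter sets built from the sequences posted in this issue.
HP = "CGTTCTGTTTATGTTTCTTGGACACTGATTGACACGGTTTAGTAGAAC"
ADAPTER_SETS = {
    "Y": ("AATGTACTTCGTTCAGTTACGTATTGCT", "GCAATACGTAACTGAACGAAGT"),
    "hairpin": (HP, HP),
}
```

Only the two short end regions are aligned per read, so the cost stays small regardless of read length; the trade-off is that anything found only mid-read goes unnoticed.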

Also, I'm not that interested in 2D sequencing, so my personal bias on this front has made me ignore the hairpin for now...

gringer commented 7 years ago

You can get adapter sequences in the middle of chimeric reads as well (which my analysis suggests could happen at a frequency of up to 5%). I think scanning the entire sequence for the detected sequences would be a good idea.

rrwick commented 7 years ago

Yes, Porechop does look for adapter sequences in the middle of reads and split reads accordingly. But it only looks for ones that it found near the ends of reads. This is to save time: checking the ends of reads is fast, so it's pretty quick to determine which adapter sets are/aren't present. Then when we get to the middle-of-read adapter check (which is slower), we only need to look for the adapter sets we've found, not all adapter sets. The problem with the hairpin is that we probably won't find it near the ends, so it would need a special whole-read check to see if it's present.
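
The middle-of-read step can be sketched in the same style, assuming the names of the adapter sets found at read ends are already known (for example from an ends-only pass). Only adapters from those sets are searched, and each read is cut recursively at its best hit. `best_hit`, `split_read`, `chop_middles` and the 0.85 threshold are hypothetical; this is not Porechop's code. Each DP cell carries the start position of its alignment so the adapter span can be removed cleanly.

```python
MIN_IDENTITY = 0.85  # assumption: stricter than an end check, to limit false splits


def best_hit(adapter: str, text: str):
    """Return (identity, start, end) of the best fitting alignment of `adapter` in `text`.

    Each DP cell stores (edits, start), so the aligned span is text[start:end].
    """
    n = len(adapter)
    prev = [(i, 0) for i in range(n + 1)]
    best = (n, 0, 0)  # (edits, start, end)
    for j, ch in enumerate(text, start=1):
        cur = [(0, j)]  # an alignment may begin at text position j
        for i in range(1, n + 1):
            cost, start = prev[i - 1]
            diag = (cost + (adapter[i - 1] != ch), start)  # match or mismatch
            cost, start = prev[i]
            extra = (cost + 1, start)                      # extra base in text
            cost, start = cur[i - 1]
            missing = (cost + 1, start)                    # adapter base missing
            cur.append(min(diag, extra, missing))
        if cur[n][0] < best[0]:
            best = (cur[n][0], cur[n][1], j)
        prev = cur
    edits, start, end = best
    return 1.0 - edits / n, start, end


def split_read(read: str, adapters, min_identity: float = MIN_IDENTITY):
    """Cut `read` at its best adapter hit and recurse on both sides."""
    best = None
    for adapter in adapters:
        identity, start, end = best_hit(adapter, read)
        if identity >= min_identity and (best is None or identity > best[0]):
            best = (identity, start, end)
    if best is None:
        return [read] if read else []
    _, start, end = best
    return (split_read(read[:start], adapters, min_identity)
            + split_read(read[end:], adapters, min_identity))


def chop_middles(reads, adapter_sets, found):
    """Split reads using only adapters from sets already detected at read ends."""
    adapters = [seq for name in sorted(found) for seq in adapter_sets[name]]
    return [piece for read in reads for piece in split_read(read, adapters)]
```

Passing an empty `found` set returns every read unchanged, which is why an adapter that only ever occurs mid-read, such as the hairpin, would need its own whole-read presence check before this step.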

Again, it's not that big a deal and is very doable. But it has dropped down my priority list because I don't use 2D reads myself and it seems like Nanopore is moving away from them as well. Hairpin-free 1D^2 reads seem to be their new approach going forward.

rrwick commented 7 years ago

Since 2D kits are being discontinued shortly, I think I'm going to scrap the idea of adding hairpin detection to Porechop. So I'm going to close this issue now. Sorry!

I'm more interested in seeing how Porechop behaves with 1D^2 data. I don't anticipate any issues, but we'll see...