rrwick / Porechop

adapter trimmer for Oxford Nanopore reads
GNU General Public License v3.0

Rapid barcodes #9

Closed osilander closed 7 years ago

osilander commented 7 years ago

As far as I can tell the rapid barcodes differ slightly in their sequence from the native barcodes (although this post seems to contradict this slightly https://community.nanoporetech.com/posts/rapid-barcoding-analysi#comment_5049). Assuming they do differ (and they are available somewhere), can you add those in?

rrwick commented 7 years ago

Yes, I'd like to add those, but I too am confused about whether they are the same or different. What do you mean by 'as far as I can tell'? Have you tried both and seen a difference?

There's some more discussion about it here: https://github.com/rrwick/Porechop/issues/7

As I said in that issue, I'd really like some hands-on experience with PCR barcoded reads (I've never used them myself). Do you have some that you could share? A full read set wouldn't be necessary - just a modest subset. Or else do you know of any publicly available ones?

Ryan

osilander commented 7 years ago

Hi Ryan,

I only have rapid and native barcode read sets. Porechop works great on the native reads, reporting 100% matches for all barcodes in the initial "what barcodes are in your sample" stage of the Porechop pipeline. Strangely, when I run Porechop on rapid barcode samples, I only get ~70-75% matches at this stage. That's what I mean by "as far as I can tell".

Epi2Me works fine to demultiplex rapid barcodes (and native), but it doesn't have the options Porechop has and is, by comparison, completely opaque. The default options in Porechop also classify a lot more of the reads than Epi2Me does. I should say I haven't experimented with the Porechop options much.

Cheers, Olin

rrwick commented 7 years ago

Okay, that makes sense. What I'd really like to know is whether or not the PCR barcodes have any additional sequence on each end. For example, this page states that the NB01 barcode is AAGAAAGTTGTCGGTGTCTTTGTG, but the full ligated sequence has 7 additional bp on each end: GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCT. These are common to all native barcodes.

Presumably the PCR barcodes do not have these same 7 bp on each end, or else you'd get 100% matches, not 70% matches. But do they have some extra sequence on the start/end, common to all PCR barcodes but different from the native barcodes? That would be useful so Porechop could distinguish between the two.
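As a rough illustration of the reasoning above, the sketch below splits the NB01 ligated sequence into its 24 bp barcode and two 7 bp flanks, then computes the identity you would expect if only the barcode core matched. The identity measure used here (matching bases divided by full adapter length) and the helper name `split_flanks` are assumptions made for illustration, not necessarily how Porechop scores matches.

```python
# Sequences quoted in the comment above.
NB01_BARCODE = "AAGAAAGTTGTCGGTGTCTTTGTG"
NB01_LIGATED = "GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCT"


def split_flanks(full, core):
    """Return (left_flank, right_flank) around the first occurrence of core in full."""
    start = full.find(core)
    if start == -1:
        raise ValueError("core sequence not found in full sequence")
    return full[:start], full[start + len(core):]


left_flank, right_flank = split_flanks(NB01_LIGATED, NB01_BARCODE)

# Assumed (hypothetical) identity measure: matching bases / full adapter length.
core_only_identity = 100.0 * len(NB01_BARCODE) / len(NB01_LIGATED)

print(left_flank, right_flank)
print(f"{core_only_identity:.1f}% identity if only the barcode core matches")
```

Under this assumed measure, a core-only match gives roughly 63%, a little below the ~70-75% reported earlier in the thread. A few flank bases matching by chance could account for the difference, which would be consistent with the flanks differing between kits.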

Do you have a PCR barcoded read set you could share with me, for development purposes?

Thanks, Ryan

rrwick commented 7 years ago

Hi Olin,

I looked around on the web and found this set of PCR barcoded Nanopore reads. They're a bit older, but seem to match the expected sequences.

What I found surprised me a bit! They are the exact same sequences as the native barcodes, but reversed!

For example, NB01 (native barcoding):
start of read: AGGTTAACACAAAGACACCGACAACTTTCTTCAGCACC
end of read: GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCT

And in BC01 (PCR barcoding) it's opposite:
start of read: GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCT
end of read: AGGTTAACACAAAGACACCGACAACTTTCTTCAGCACC

If this is consistently true, then it simplifies things - native and PCR barcodes are easy to tell apart. I've added the PCR barcodes to Porechop, so can you give it a try on your reads? Just pull a fresh version of Porechop from GitHub. If it works on your reads as well as the older reads, I'll be more confident that I've got this all straight.
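The two quoted sequences are reverse complements of each other, so "reversed" here amounts to a swap of orientation between the start and end of the read. A minimal sketch of that check follows, together with a hypothetical `guess_kit` helper that tells the two apart by exact match at the read start. Exact matching is a simplifying assumption; real reads contain sequencing errors, so a practical tool would need error-tolerant matching.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Read-start sequences quoted above.
NB01_START = "AGGTTAACACAAAGACACCGACAACTTTCTTCAGCACC"  # native barcoding
BC01_START = "GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCT"  # PCR barcoding


def guess_kit(read_start):
    """Hypothetical helper: guess the kit type from an exact match at the read start."""
    if read_start.startswith(BC01_START):
        return "PCR"
    if read_start.startswith(NB01_START):
        return "native"
    return "unknown"


print(reverse_complement(NB01_START) == BC01_START)
print(guess_kit(BC01_START + "ACGTACGT"))
```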

Ryan

osilander commented 7 years ago

Thanks. I didn't emphasize that what I'm trying is the rapid barcode kit (RBK), which I think is different from both the native and the PCR barcodes (although this post seems to contradict this slightly https://community.nanoporetech.com/posts/rapid-barcoding-analysi#comment_5049). A FASTQ file of 4000 RBK reads is attached, along with the output of the updated Porechop on the full dataset. Note that approximately half of the reads match SQK-NSK007_Y.

1_out.fastq.zip porechop_Apr5_RBK_matches.txt

osilander commented 7 years ago

So the statement from ONT is that the native and rapid barcodes are the same https://community.nanoporetech.com/posts/rapid-barcoding-analysi#comment_5049 . I'm not sure why porechop doesn't find good matches when using the rapid barcode kit. Also - I forgot to add yesterday that the barcodes in the dataset I posted are only BC1 - BC6.

-Olin

rrwick commented 7 years ago

Hi Olin,

Sorry, I was indeed confusing Rapid with PCR barcodes. So I guess Nanopore has 3 barcode types? Native, rapid and PCR?

Thank you for the reads - very helpful! I have tentatively added support for rapid barcoding based on what I found. I say 'tentatively' because of two things:

I'll inquire with Nanopore about these points. In the meantime, give it a try and let me know how it goes!

Ryan

johnomics commented 7 years ago

Hi Ryan,

I'm also seeing TTTATCGTGAAACGCTTTCGCGTTTTCGTGC in our rapid runs and have just asked ONT about it. They say it is related to the transposase and will be present in all Rapid Kit library preps. It would be great if Porechop could trim these sequences as well.
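For illustration only, trimming such a sequence from the start of a read could be sketched as below. The helper `trim_start_adapter` and the 150-base search window are hypothetical choices, and the exact-match search is a simplification: adapters containing sequencing errors would be missed, so this is not a description of how Porechop does it.

```python
# Sequence reported above for Rapid Kit library preps.
RAPID_ADAPTER = "TTTATCGTGAAACGCTTTCGCGTTTTCGTGC"


def trim_start_adapter(read, adapter=RAPID_ADAPTER, search_window=150):
    """Trim through the first exact adapter hit in the read's first search_window bases.

    Hypothetical helper: exact matching only, so adapters containing
    sequencing errors are left untouched.
    """
    pos = read[:search_window].find(adapter)
    if pos == -1:
        return read  # no adapter found near the start; leave the read as-is
    return read[pos + len(adapter):]


example_read = "ACGT" + RAPID_ADAPTER + "GATTACA"
print(trim_start_adapter(example_read))
```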

rrwick commented 7 years ago

Hi Olin and John,

I've just made a new release of Porechop which contains the full Rapid Kit adapter. It should also be a bit better with barcode binning, regardless of the kit.

Give it a try! I'll close this issue now, but please let me know if anything's not working.

Ryan