rrwick / Porechop

adapter trimmer for Oxford Nanopore reads
GNU General Public License v3.0

Low barcode assignment % #25

Closed cmaggior closed 6 years ago

cmaggior commented 6 years ago

Hello,

I'm trying to run a fasta file produced from the SQK-RLB001 kit and am getting nearly half of my sequences in the 'none' file, even after lowering the barcode threshold to 20. Is this something other users have experienced? I know MinION data isn't exactly error-free. Have the barcodes in the known adapters been updated to the new kit releases?

My apologies if this is a silly question. Thanks for making a great open source program, Ryan!

rrwick commented 6 years ago

Hi Catherine,

It is often a challenge to get all of the actual adapter sequences from Oxford Nanopore, but since they seem to use the same barcode sequences in each kit, I suspect Porechop would be compatible with SQK-RLB001.

Half of the reads being unclassified doesn't sound too crazy to me. I've seen about 1/3 in the past. Another setting which might help is --barcode_diff. The default is 5, which means the best barcode identity must be at least 5 better than the second best, or else the read is too-close-to-call and is unclassified. You could try a very lenient --barcode_diff 1 to see if that helps.
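To make the rule above concrete, here is a minimal Python sketch of the binning decision as described in this thread. It is not Porechop's actual code: the function name `assign_barcode`, its parameters, and the example identities are hypothetical, and the default threshold of 75 is an assumption taken from Porechop's documented `--barcode_threshold` default.

```python
def assign_barcode(identities, threshold=75.0, min_diff=5.0):
    """Pick a barcode bin from per-barcode percent identities.

    identities: dict mapping barcode name -> percent identity (hypothetical input).
    threshold:  stands in for --barcode_threshold (assumed default 75).
    min_diff:   stands in for --barcode_diff (default 5).
    Returns the barcode name, or "none" if the read is left unclassified.
    """
    if not identities:
        return "none"

    # Rank barcodes from best to worst identity.
    ranked = sorted(identities.items(), key=lambda item: item[1], reverse=True)
    best_name, best_identity = ranked[0]
    second_identity = ranked[1][1] if len(ranked) > 1 else 0.0

    # The best match must be good enough in absolute terms...
    if best_identity < threshold:
        return "none"
    # ...and clearly better than the runner-up, otherwise it is too close to call.
    if best_identity - second_identity < min_diff:
        return "none"
    return best_name


# Hypothetical read: two barcodes score within 3 points of each other.
scores = {"BC01": 82.0, "BC02": 79.0}
print(assign_barcode(scores))                # default diff of 5
print(assign_barcode(scores, min_diff=1.0))  # lenient diff of 1
```

With the default difference of 5 the example read stays unclassified because the two barcodes are only 3 apart, while a difference of 1 lets it go into the BC01 bin. A more lenient setting classifies more reads at the cost of a higher chance of putting some in the wrong bin.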

Also as a sanity check - see what Albacore does. If you basecall the reads with Albacore's --barcoding option, how many does it fail to classify? In my experience Porechop's default settings are a bit more strict, so I'd expect Albacore to classify a bit more. But if they are in the same ballpark, then I can feel confident that Porechop's not doing anything terribly wrong.

Let me know how you go! Ryan

cmaggior commented 6 years ago

Hi Ryan,

Thanks so much for your response! I'll give lowering the --barcode_diff threshold a shot.

For what it's worth, basecalling with Albacore produced pretty much the same amount of reads classified, so I'm sure it's not Porechop that's the problem. I've had issues with MinION data in the past, but this is the first time I've used their barcoding kits, so I wasn't sure what was standard for users.

Thanks again, and I'll let you know how --barcode_diff <5 goes.

rrwick commented 6 years ago

Sounds good - thanks!