magicDGS / ReadTools

A Universal Toolkit for Handling Sequence Data from Different Sequencing Platforms
https://magicdgs.github.io/ReadTools/
MIT License

spurious assignment of read groups by barcodes with '+' delimiter #532

Closed AnnaMariaL closed 5 years ago

AnnaMariaL commented 5 years ago

Hi,

I obtained data where the delimiter between dual barcodes is + instead of -. I followed the recommendations in issue #512 & talked to Rupert. However, now the assignment of read groups seems spurious.

Reads with two mismatches are not filtered out, but reads with one mismatch are (-mm 1), and reads with two Ns are kept, while reads with one N are filtered out (-maxN 1).

I attached how I called readtools, the test.fq files and the outputs.

test-ReadTools.zip

Anna

magicDGS commented 5 years ago

Thanks for your feedback, but I need more information about it:

  1. What do you mean by reads with two mismatches? In one barcode or in both? Mismatches are counted per barcode, not per read. The same applies to the number of Ns.
  2. Your zip file contains information for -mm 0, not for -mm 1. I am not sure whether this is affecting your expectations.
  3. By default, Ns are counted as mismatches unless you specify --nNoMismatch.
  4. In addition, if one barcode uniquely identifies the read as belonging to a RG, the read is assigned immediately, independently of the other barcode.
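
To make points 1 and 3 concrete, here is a minimal Python sketch of per-barcode mismatch counting. It is only an illustration under stated assumptions, not the ReadTools implementation: the function names and the `n_no_mismatch` parameter are hypothetical, and the parameter merely mimics the effect described for --nNoMismatch.

```python
from typing import List


def count_mismatches(observed: str, expected: str, n_no_mismatch: bool = False) -> int:
    """Count the positions where an observed barcode differs from the expected one.

    Assumption: an N in the observed barcode counts as a mismatch unless
    n_no_mismatch is True (hypothetical stand-in for --nNoMismatch).
    """
    if len(observed) != len(expected):
        raise ValueError("barcodes must have the same length")
    mismatches = 0
    for obs, exp in zip(observed.upper(), expected.upper()):
        if obs == "N" and n_no_mismatch:
            continue  # Ns are ignored in this mode
        if obs != exp:
            mismatches += 1
    return mismatches


def mismatches_per_barcode(observed: str, expected: str,
                           delimiter: str = "+",
                           n_no_mismatch: bool = False) -> List[int]:
    """Split dual barcodes on the delimiter and count mismatches per barcode."""
    obs_parts = observed.split(delimiter)
    exp_parts = expected.split(delimiter)
    if len(obs_parts) != len(exp_parts):
        raise ValueError("observed and expected must contain the same number of barcodes")
    return [count_mismatches(o, e, n_no_mismatch) for o, e in zip(obs_parts, exp_parts)]


# One count per barcode, not one count per read.
print(mismatches_per_barcode("AAAAAAAN+AAAAAAAN", "AAAAAAAA+AAAAAAAA"))
print(mismatches_per_barcode("AAAAAAAN+AAAAAAAN", "AAAAAAAA+AAAAAAAA", n_no_mismatch=True))
```

The expected barcode AAAAAAAA+AAAAAAAA is an assumption made for the example; the actual barcode dictionary is in the attached zip file.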

As a result, I can say that the result in the statistics file is the expected one: 2 reads are filtered out due to 1 mismatch. The reason is the following for the ones that do not match (I haven't looked at the BAMs, see the note below for why, so these are my expectations):

  • Filtered out @E00603:250:HYYJYCCXY:1:1101:16376:1450 1:N:0:AAAAAAAN+AAAAAAAN contains 1 mismatch in each barcode, due to the N counted as mismatch.
  • Filtered out @E00603:250:HYYJYCCXY:1:1101:16559:1450 1:N:0:TTTTTTTC+TTTTTTTC for the same reason as above, but instead of an N mismatching, in this case it is a C.
  • Kept @E00603:250:HYYJYCCXY:1:1101:16924:1450 1:N:0:CCCCCCCC+CCCCCCGG and @E00603:250:HYYJYCCXY:1:1101:17777:1450 1:N:0:TTTTTTTT+TTTTTTNN, as both reads are uniquely identified by the first barcode.

I hope that this helps you to understand the result. Please, let me know if I am not taking into account something or if the issue isn't resolved by this.

NOTE: I have not checked the BAM output, but I think that it is not needed given the information above; next time, please create a SAM file so that it can be inspected without extra software.

AnnaMariaL commented 5 years ago

Hi Daniel,

thanks for your feedback. I will try to be more precise this time.

Thanks for your feedback, but I need more information about it:

  1. What do you mean by reads with two mismatches? In one barcode or in both? Mismatches are counted per barcode, not per read. The same applies to the number of Ns.

I meant two mismatches in one barcode. However, what I classified as spurious results is explained by your explanation in point 4.

  2. Your zip file contains information for -mm 0, not for -mm 1. I am not sure whether this is affecting your expectations.

Thanks for pointing that out.

  3. By default, Ns are counted as mismatches unless you specify --nNoMismatch.

Thanks for pointing that out.

  4. In addition, if one barcode uniquely identifies the read as belonging to a RG, the read is assigned immediately, independently of the other barcode.

That's the crucial part. This also explains my (not shown) observations that when I switch the errors to be present in barcode#1, the filtering result changes and is not symmetrical. Can you maybe provide me with a link where to find this information in the manual? I guess this information should go onto the wikipedia page of the institute.

As a result, I can say that the result in the statistics file is the expected one: 2 reads are filtered out due to 1 mismatch. The reason is the following for the ones that do not match (I haven't looked at the BAMs, see the note below for why, so these are my expectations):

  • Filtered out @E00603:250:HYYJYCCXY:1:1101:16376:1450 1:N:0:AAAAAAAN+AAAAAAAN contains 1 mismatch in each barcode, due to the N counted as mismatch.
  • Filtered out @E00603:250:HYYJYCCXY:1:1101:16559:1450 1:N:0:TTTTTTTC+TTTTTTTC for the same reason as above, but instead of an N mismatching, in this case it is a C.
  • Kept @E00603:250:HYYJYCCXY:1:1101:16924:1450 1:N:0:CCCCCCCC+CCCCCCGG and @E00603:250:HYYJYCCXY:1:1101:17777:1450 1:N:0:TTTTTTTT+TTTTTTNN, as both reads are uniquely identified by the first barcode.
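
The four outcomes above can be reproduced with a small toy model in Python. This is a sketch under assumptions, not the ReadTools algorithm: the sample names and the barcode dictionary are hypothetical (inferred from the reads listed above), the threshold corresponds to -mm 0, Ns count as mismatches, and the rule "a first barcode that matches exactly one sample decides the assignment" is a simplified reading of point 4.

```python
from typing import Dict, Optional, Tuple

# Hypothetical barcode dictionary; the real one is in the attached zip file.
SAMPLES: Dict[str, Tuple[str, str]] = {
    "sampleA": ("AAAAAAAA", "AAAAAAAA"),
    "sampleC": ("CCCCCCCC", "CCCCCCCC"),
    "sampleT": ("TTTTTTTT", "TTTTTTTT"),
}


def barcodes_from_header(header: str, delimiter: str = "+") -> Tuple[str, ...]:
    """Take the last colon-separated field of a FASTQ header and split it."""
    return tuple(header.rsplit(":", 1)[-1].split(delimiter))


def hamming(observed: str, expected: str) -> int:
    """Number of differing positions; an N counts as a mismatch here."""
    return sum(o != e for o, e in zip(observed, expected))


def assign(barcodes: Tuple[str, ...],
           samples: Dict[str, Tuple[str, str]] = SAMPLES,
           max_mismatches: int = 0) -> Optional[str]:
    """Return the sample name, or None if the read would be filtered out."""
    # Assumed shortcut: if the first barcode matches exactly one sample,
    # assign immediately without looking at the second barcode.
    first_hits = [name for name, expected in samples.items()
                  if hamming(barcodes[0], expected[0]) <= max_mismatches]
    if len(first_hits) == 1:
        return first_hits[0]
    # Otherwise every barcode has to match within the threshold.
    full_hits = [name for name, expected in samples.items()
                 if all(hamming(o, e) <= max_mismatches
                        for o, e in zip(barcodes, expected))]
    return full_hits[0] if len(full_hits) == 1 else None


headers = [
    "@E00603:250:HYYJYCCXY:1:1101:16376:1450 1:N:0:AAAAAAAN+AAAAAAAN",
    "@E00603:250:HYYJYCCXY:1:1101:16559:1450 1:N:0:TTTTTTTC+TTTTTTTC",
    "@E00603:250:HYYJYCCXY:1:1101:16924:1450 1:N:0:CCCCCCCC+CCCCCCGG",
    "@E00603:250:HYYJYCCXY:1:1101:17777:1450 1:N:0:TTTTTTTT+TTTTTTNN",
]
for header in headers:
    print(header.rsplit(":", 1)[-1], "->", assign(barcodes_from_header(header)))
```

In this model the first two reads return None (filtered out) and the last two are assigned, because their first barcode already matches exactly one sample. Moving the errors into the first barcode changes the outcome, which is consistent with the asymmetry described in this thread.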

I hope that this helps you to understand the result. Please, let me know if I am not taking into account something or if the issue isn't resolved by this.

Yes, it's helpful, especially points 3 and 4.

NOTE: I have not checked the BAM output, but I think that it is not needed given the information above; next time, please create a SAM file so that it can be inspected without extra software.

I will try to keep that in mind.

magicDGS commented 5 years ago

The warning after the description in the docs explains the direct assignment of reads whose barcode uniquely identifies the read group: https://magicdgs.github.io/ReadTools/AssignReadGroupByBarcode.html

magicDGS commented 5 years ago

Can I close the ticket, @AnnaMariaL? Thank you!

AnnaMariaL commented 5 years ago

I am out of office today, so I can't rerun things atm. But I guess so... Thanks for the feedback. Anna

Daniel Gómez-Sánchez notifications@github.com wrote on Tue, 28 May 2019, 17:58:

Can I close the ticket, @AnnaMariaL? Thank you!

AnnaMariaL commented 5 years ago

... If you could send me the link to the information about when only the first barcode is used, that would be great.

Anna Maria Langmüller annamaria.langmueller@gmail.com wrote on Tue, 28 May 2019, 18:01:

I am out of office today, so I can't rerun things atm. But I guess so... Thanks for the feedback. Anna

magicDGS commented 5 years ago

Closed as resolved (personal communication).