Closed: AnnaMariaL closed this issue 5 years ago
Thanks for your feedback, but I need more information about it:
- What do you mean by reads with two mismatches? In one barcode or in both? Mismatches are counted per barcode, not per read. The same is valid for the number of Ns.
- Your zip file contains information for `-mm 0`, not for `-mm 1`. I am not sure if this is affecting your expectations.
- Ns are counted as mismatches by default unless you specify `--nNoMismatch`.
- In addition to this, if a barcode uniquely identifies a read as belonging to a RG, the read is assigned immediately, independently of the other barcode.

As a result, I can say that the result based on the statistics file is the expected one: 2 reads are filtered out due to 1 mismatch. The reasons for the ones that do not match are the following (I haven't looked at the BAMs, see below why, so these are my expectations):
- Filtered out: `@E00603:250:HYYJYCCXY:1:1101:16376:1450 1:N:0:AAAAAAAN+AAAAAAAN` contains 1 mismatch in each barcode, due to the N counted as a mismatch.
- Filtered out: `@E00603:250:HYYJYCCXY:1:1101:16559:1450 1:N:0:TTTTTTTC+TTTTTTTC` for the same reason as above, but in this case the mismatch is a C instead of an N.
- Kept: `@E00603:250:HYYJYCCXY:1:1101:16924:1450 1:N:0:CCCCCCCC+CCCCCCGG` and `@E00603:250:HYYJYCCXY:1:1101:17777:1450 1:N:0:TTTTTTTT+TTTTTTNN`, as both are uniquely identified thanks to the first barcode.

I hope that this helps you to understand the result. Please let me know if I am not taking something into account or if this doesn't resolve the issue.
NOTE: I have not checked the BAM output, but I think it is not needed given the previous information; next time, please create a SAM so that it can be inspected without extra software.
Hi Daniel,
thanks for your feedback. I will try to be more precise this time.
> Thanks for your feedback, but I need more information about it:
> - What do you mean by reads with two mismatches? In one barcode or in both? Mismatches are counted per barcode, not per read. The same is valid for the number of Ns.

I meant two mismatches in one barcode. However, what I classified as spurious results is explained by your point 4.
> - Your zip file contains information for `-mm 0`, not for `-mm 1`. I am not sure if this is affecting your expectations.

Thanks for pointing that out.
> - Ns are counted as mismatches by default unless you specify `--nNoMismatch`.

Thanks for pointing that out.
> - In addition to this, if a barcode uniquely identifies a read as belonging to a RG, the read is assigned immediately, independently of the other barcode.

That's the crucial part. It also explains my (not shown) observation that when I switch the errors to barcode #1, the filtering result changes and is not symmetrical. Could you point me to where this information is in the manual? I guess this information should go onto the wikipedia page of the institute.
> As a result, I can say that the result based on the statistics file is the expected one: 2 reads are filtered out due to 1 mismatch. The reasons for the ones that do not match are the following (I haven't looked at the BAMs, see below why, so these are my expectations):
> - Filtered out: `@E00603:250:HYYJYCCXY:1:1101:16376:1450 1:N:0:AAAAAAAN+AAAAAAAN` contains 1 mismatch in each barcode, due to the N counted as a mismatch.
> - Filtered out: `@E00603:250:HYYJYCCXY:1:1101:16559:1450 1:N:0:TTTTTTTC+TTTTTTTC` for the same reason as above, but in this case the mismatch is a C instead of an N.
> - Kept: `@E00603:250:HYYJYCCXY:1:1101:16924:1450 1:N:0:CCCCCCCC+CCCCCCGG` and `@E00603:250:HYYJYCCXY:1:1101:17777:1450 1:N:0:TTTTTTTT+TTTTTTNN`, as both are uniquely identified thanks to the first barcode.
>
> I hope that this helps you to understand the result. Please let me know if I am not taking something into account or if this doesn't resolve the issue.

Yes, it's helpful, especially points 3 and 4.
> NOTE: I have not checked the BAM output, but I think it is not needed given the previous information; next time, please create a SAM so that it can be inspected without extra software.

I will try to keep that in mind.
The warning after the description in the docs covers the direct assignment of reads whose barcode is uniquely identified: https://magicdgs.github.io/ReadTools/AssignReadGroupByBarcode.html
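The rules discussed in this thread (mismatches counted per barcode, N counted as a mismatch unless `--nNoMismatch` is given, and immediate assignment when the first barcode alone identifies a read group) can be illustrated with a small Python sketch. This is not the ReadTools implementation: the function names, the `SAMPLES` barcode dictionary, and the assumption that every sample has a distinct first barcode are all hypothetical, chosen only so that the four test reads from this thread can be traced by hand with `-mm 0`.

```python
# Illustrative sketch only: NOT the ReadTools implementation.
# SAMPLES is a hypothetical barcode dictionary: read group -> (barcode 1, barcode 2).
SAMPLES = {
    "sampleA": ("AAAAAAAA", "AAAAAAAA"),
    "sampleC": ("CCCCCCCC", "CCCCCCCC"),
    "sampleT": ("TTTTTTTT", "TTTTTTTT"),
}


def barcodes_from_header(header, delimiter="+"):
    """Return the barcodes from a header ending in e.g. '1:N:0:AAAA+CCCC'."""
    index_field = header.rsplit(":", 1)[-1]
    return index_field.split(delimiter)


def count_mismatches(observed, expected, n_no_mismatch=False):
    """Count mismatching positions in ONE barcode (not per read).

    An N counts as a mismatch unless n_no_mismatch is True.
    """
    mismatches = 0
    for obs, exp in zip(observed, expected):
        if obs == exp:
            continue
        if obs == "N" and n_no_mismatch:
            continue
        mismatches += 1
    return mismatches


def assign_read_group(header, samples=SAMPLES, max_mismatches=0,
                      n_no_mismatch=False):
    """Return the read group name, or None if the read is filtered out."""
    first, second = barcodes_from_header(header)
    # Step 1: read groups whose first barcode is within the mismatch limit.
    hits = [
        name for name, (bc1, _) in samples.items()
        if count_mismatches(first, bc1, n_no_mismatch) <= max_mismatches
    ]
    # Step 2: a unique hit on the first barcode is assigned immediately,
    # without looking at the second barcode.
    if len(hits) == 1:
        return hits[0]
    # Step 3: otherwise the second barcode must resolve the ambiguity.
    hits = [
        name for name in hits
        if count_mismatches(second, samples[name][1], n_no_mismatch)
        <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    headers = [
        "@E00603:250:HYYJYCCXY:1:1101:16376:1450 1:N:0:AAAAAAAN+AAAAAAAN",
        "@E00603:250:HYYJYCCXY:1:1101:16559:1450 1:N:0:TTTTTTTC+TTTTTTTC",
        "@E00603:250:HYYJYCCXY:1:1101:16924:1450 1:N:0:CCCCCCCC+CCCCCCGG",
        "@E00603:250:HYYJYCCXY:1:1101:17777:1450 1:N:0:TTTTTTTT+TTTTTTNN",
    ]
    for header in headers:
        print(header.rsplit(":", 1)[-1], "->", assign_read_group(header))
```

Under these assumptions the first two reads return `None` (filtered out, because the first barcode already has one mismatch) and the last two are assigned from the first barcode alone, even though their second barcodes contain two mismatches or two Ns. That asymmetry is why moving the errors to barcode #1 changes the outcome.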
Can I close the ticket, @AnnaMariaL? Thank you!
I am out of the office today, so I can't rerun things at the moment. But I guess so... Thanks for the feedback.
Anna
On Tue, 28 May 2019, 17:58, Daniel Gómez-Sánchez notifications@github.com wrote:
> Can I close the ticket, @AnnaMariaL? Thank you!
... If you could send me the link to the information about when only the first barcode is used, that would be great.
On Tue, 28 May 2019, 18:01, Anna Maria Langmüller annamaria.langmueller@gmail.com wrote:
> I am out of office today so I can't rerun things atm. But I guess so... Thanks for the feedback Anna
Closed as resolved (personal communication).
Hi,
I obtained data where the delimiter between dual barcodes is + instead of -. I followed the recommendations in issue #512 & talked to Rupert. However, now the assignment of read groups seems spurious.
Reads with two mismatches are not filtered out, but reads with one mismatch are (-mm 1), and reads with two Ns are kept, while reads with one N are filtered out (-maxN 1).
I attached how I called readtools, the test.fq files and the outputs.
test-ReadTools.zip
Anna