rojinsafavi closed this issue 3 years ago
The default is to look for a read tag called "CB" containing the cell barcode information (see the docs). I don't see a CB tag in your BAM file; it looks like the barcode might be encoded in the read name. You can use the --barcode_regex argument to pass a regular expression that extracts the cell barcode from the read name.
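For anyone checking their own file, here is a minimal Python sketch of that check, run on text records as printed by samtools view. The helper name cell_barcode_tag and both sample records are hypothetical, made up for illustration; this is not how sinto itself reads the tag.

```python
def cell_barcode_tag(sam_line, tag="CB"):
    """Return the value of a string-typed tag such as CB:Z:<barcode>, or None."""
    prefix = tag + ":Z:"
    # Assumes tab-separated SAM text: 11 mandatory columns, then optional
    # TAG:TYPE:VALUE fields.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith(prefix):
            return field[len(prefix):]
    return None


# Hypothetical records: one carries a CB tag, the other does not.
with_tag = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:ACGTACGT-1"
without_tag = "read2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"

print(cell_barcode_tag(with_tag))     # ACGTACGT-1
print(cell_barcode_tag(without_tag))  # None
```

If every record comes back as None, the barcode has to come from somewhere else, such as the read name.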
Thanks Tim. Yes, you are correct; the barcode is here: "7001113:991:HVWNKBCX2:1:2110:2683:18182:87:93:72:36:CAGGTATGGC"
May I ask how I should pass that to --barcode_regex?
I think my barcodes are a bit weird. I modified the SAM file and made new barcodes, and I think it is fine now. I will let you know if I have any further questions. Thank you.
I have the same issue. I want to generate a fragment file from a paired-end aligned BAM file. My read names contain the barcode, so I used the regex "[^:]*". However, the output was an empty BED file. Below are the first records of the BAM file that I am using. Can you help me with this issue?
CTGAAGCTAGGCAGAATCTCTCCGGTACT:7001113:901:HGFWTBCX2:2:2207:15570:18734/2 147 chr1 10276 37 4M1D46M 0 0 GGGTTGGGGTTGGGGTTGGGGTTGGGGTTGGGGTTGGGGTTAGGGTAGGG DDDDDHIIIIHIIIIIIHIIIHHHIIIHDHIIHIIHII<FC1CGEH1CGH NM:i:1 MD:Z:4^A46
AGCGATAGCGTACTAGAAGGAGTAGGTTG:7001113:901:HGFWTBCX2:2:1114:16162:74492/1 99 chr1 10330 38 50M 0 0 GCCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCTAACCCTAACC DDDDBH@FHHCEHH11<CFHIIE1FGH@DECHHE1<D@F11<DFHE11DH NM:i:1 MD:Z:C49
CTATTAGGTAGCGCTCCCTAGAGTGGCTC:7001113:901:HGFWTBCX2:2:1205:21004:29651/2 163 chr1 10434 45 50M 0 0 CNCTAACCCCGAACCCTAACCCTAACCCTAACCCTCGCGGTACCCTCAGC 0#<<<D1C1<CCHFHHGHHHIIIIIIIIGHIHHH@C?HDD?FEHHFHHH? NM:i:2 MD:Z:1C8T39
AGACACCTACTCGCTAGTAAGGAGGTACT:7001113:901:HGFWTBCX2:2:1106:1952:11878/1 83 chr1 10492 44 50M 0 0 GTGGTACTCTGAAGGCGGAGCACAGTTCTCCTCAGGTCAGACCCGGGCGDDDDDIIGIHIFFHIIGIIHGHIHHHHHHHIIHHHHHGIGIHHIHIIIIC NM:i:0 MD:Z:50
CTATTAGGTAGCGCTCCCTAGAGTGGCTC:7001113:901:HGFWTBCX2:2:1205:21004:29651/1 83 chr1 10550 45 50M 0 0 GCACAGACCCGGAGAGCACCGCGAGGGCGGAGCTGCGTTGTCCTCTGCAC DDBDBHHH@FHHIHHHHIFHIIIIIIGIIIDEHHHIHHHIIIIIIIHIIH NM:i:0 MD:Z:50
AGCGATAGCGATCAGTAAGGAGTACAGGA:7001113:901:HGFWTBCX2:2:1112:6627:33341/2 163 chr1 17465 42 50M 0 0 GTCCCAGGCCTCCCGAGCCGAGCCACCCGTCACCCCCTGGCTCCTGGCCDDDDDIIIIIIIIIIHHIIIIIIIIIIIIHIIIIIIIIIIIHIIIIIIII NM:i:0 MD:Z:50
AGACACCTCTCTCTACGCGTAAGACCTAT:7001113:901:HGFWTBCX2:2:1107:15505:39683/1 99 chr1 17471 42 50M 0 0 GGCCTCCCGAGCCGAGCCACCCGTCACCCCCTGGCTCCTGGCCTATGTGC DDDDDIIIIGIIIIIIIIIIIIIIIIIIIIIIHHIIGIIIIIIIIIIIII NM:i:0 MD:Z:50
AGACACCTCTCTCTACGCGTAAGACCTAT:7001113:901:HGFWTBCX2:2:2112:12312:15153/2 147 chr1 17623 42 43M1D7M 0 0 CTTGGGGTCTTCCCAGCAACATCAGCTCTGTCAGCTCCTTGCTGCTCTTC DDADCIIIIIIIIIIIHHGHIIIIIIIIIIIIIIIIHIIIIIIIIIIIII NM:i:1 MD:Z:43^C7
AGACACCTCTCTCTACGCGTAAGACCTAT:7001113:901:HGFWTBCX2:2:1107:15505:39683/2 147 chr1 17624 42 50M 0 0 CTTGGGGGTCTTCCCAGCAACATCAGCTCTGTCAGCTCCTTGCTGCTCTT DDDDDIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIII NM:i:0 MD:Z:50
AGCGATAGCGATCAGTAAGGAGTACAGGA:7001113:901:HGFWTBCX2:2:1112:6627:33341/1 83 chr1 17746 42 50M 0 0 ATTAGGCCCAGCACCAAATATTCACCATCCCTTGGCCATCCTGGCCCTCDDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIH NM:i:0 MD:Z:50
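A quick way to test the pattern outside sinto is to run it over a couple of the read names above with Python's re module. This is only a sketch: it shows what "[^:]*" returns at the start of each name, and the use of match here is an assumption about how the pattern is applied, not something confirmed in this thread. It also prints whether the two mates of one pair share a read name, since the names above end in /1 and /2; whether that has anything to do with the empty output is not established here.

```python
import re

# Pattern from the post above: everything up to the first colon.
BARCODE_REGEX = re.compile(r"[^:]*")

# Both mates of one read pair, copied from the records above.
read_names = [
    "CTATTAGGTAGCGCTCCCTAGAGTGGCTC:7001113:901:HGFWTBCX2:2:1205:21004:29651/2",
    "CTATTAGGTAGCGCTCCCTAGAGTGGCTC:7001113:901:HGFWTBCX2:2:1205:21004:29651/1",
]

# match() anchors at the start of the string, so this is the leading field.
barcodes = [BARCODE_REGEX.match(name).group() for name in read_names]
print(barcodes)

# The two mates carry different names because of the /1 and /2 suffixes.
print(read_names[0] == read_names[1])  # False

# With the suffix stripped, the names are identical.
base_names = [re.sub(r"/[12]$", "", name) for name in read_names]
print(base_names[0] == base_names[1])  # True
```

So the pattern itself does pull out the 29-base barcode from these names; if the output is still empty, the cause is probably elsewhere.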
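Stumbled across this thread. For future reference, after playing around with the Python regex flavor I came up with this to extract your barcode:
"[0-9]+:+[A-Z]*"
Explanation: one or more digits, followed by one or more colons, followed by zero or more capital letters. Note that a match includes the digits and colon in front of the letters, and that the pattern can also match earlier in the read name, so test it against your own read names before relying on it.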
There are several web-based regex testers that let you plug in your test string to see what regex works.
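The same test can be run locally with Python's re module. The sketch below applies the pattern above to the read name posted earlier in this thread and, for comparison, an end-anchored alternative. The alternative pattern is my own suggestion, not something from the sinto docs, and whether sinto applies the pattern from the start of the name or anywhere in it is not stated in this thread, so treat the result as a check of the regex only.

```python
import re

# Read name posted earlier in the thread; the barcode is the last field.
read_name = "7001113:991:HVWNKBCX2:1:2110:2683:18182:87:93:72:36:CAGGTATGGC"

suggested = re.compile(r"[0-9]+:+[A-Z]*")

# The first match is the leading instrument field, because [A-Z]* may be empty.
print(suggested.search(read_name).group())  # 7001113:

# Even the last match still carries the preceding digits and colon.
print(suggested.findall(read_name)[-1])     # 36:CAGGTATGGC

# Alternative (an assumption, not from the docs): capital letters anchored
# to the end of the read name.
end_anchored = re.compile(r"[A-Z]+$")
print(end_anchored.search(read_name).group())  # CAGGTATGGC
```

The end-anchored pattern only works with search-style matching; with matching anchored at the start of the name it finds nothing, so check which behavior the tool uses.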
Hello,
I tried running sinto on non-10x ATAC-seq data, and the fragment file is empty. I was wondering if you could help me with that?
Here is the output of
samtools view merged.bam | head
and this is the code I ran: sinto fragments -b merged.bam -f fragment