timoast / sinto

Tools for single-cell data processing
https://timoast.github.io/sinto/
MIT License

no fragments produced #21

Closed rojinsafavi closed 3 years ago

rojinsafavi commented 3 years ago

Hello,

I tried running sinto on non-10X ATAC-seq data, and the fragment file is empty. Could you help me with that?

Here is the output of samtools view merged.bam | head:

(signac_env) -bash-4.2$ samtools view AdultCTX_DNA_merge.bam | head
7001113:989:HTKVHBCX2:2:1105:15600:5528:77:15:82:15:CGGTATTTGG  0   chr1    3000049 1   23M *   0   0   TCTTTGAAGGTCTGGTAGAACTC DDDDDIIIIIIIIIIIIIIIIII AS:i:0
7001113:991:HVWNKBCX2:1:2110:2683:18182:87:93:72:36:CAGGTATGGC  0   chr1    3000132 39  53M *   0   0   GACTATTGATGACTGCCTCTATTTCTTTAGGGGAAATGGGACTTTTAGTCCAT   DDDDCHIIHGFHIIIIIIIIIIHHHEHIIIIIIIDDGHHHDHHIIIIIIIIHI   AS:i:0
7001113:990:HTKL3BCX2:2:2111:14944:4116:87:93:72:36:CAGGTATGGC  0   chr1    3000134 38  52M *   0   0   CTATTGATGACTGCCTCTATTTCTTTAGGGGAAATGGGACTTTTAGTCCATG    DDDDDIIIIIIIIIIIIIIIIIIIGIIIIIIIDFHHHHHIIIIIIIIIHIII    AS:i:0
7001113:991:HVWNKBCX2:2:2105:19568:57493:53:23:77:35:CACTATTTTG 0   chr1    3000159 0   52M *   0   0   GGGGGGGCATGGGACTTTTAGTCCATGAATCTGATCCTGATTTAGCTTTGGT    DDDDDIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIHEHIIHGIIIIIH    AS:i:-22
7001113:993:HVWMKBCX2:2:1211:1360:3780:53:23:77:35:CACTATTTTG   0   chr1    3000353 37  53M *   0   0   GTTAATTATAGTACAGTCCCTATGCCCTCTAGTTAGTCTGGCTAAGGGTTTAT   DDDDDIIHIIIIIIIIIIIHIIIIIIIIIIIIHIGIHIIHHIIIHHIIIHIII   AS:i:0
7001113:991:HVWNKBCX2:2:2211:8406:75832:21:06:91:12:TCATCTTTGT  16  chr1    3000464 1   52M *   0   0   TCTTTTTGTTTCCACTTGGTTGATTTCAGCTCTGAGTTTGATTATTTCCTGC    IHIHIIIIHHFF@EHIHGCHF<1IHFEEHEIHIIHIIHHHFIIGIHIDDDDD    AS:i:0
7001113:989:HTKVHBCX2:2:1211:2292:73091:90:80:26:12:CTGTACGGCT  0   chr1    3000559 42  53M *   0   0   CTTCTAGATTTGCTGTCAGGCTGCTAGTGTATACTCTAGTTTCCTTTTGGAGG   DDCDDIIIIIIIIHIIIIIIIIIIIIIHIIIIIIIIHIIIIIIIIIIIIHIII   AS:i:0
7001113:991:HVWNKBCX2:1:2205:17457:98814:90:80:26:12:CTGTACGGCT 0   chr1    3000633 30  53M *   0   0   CTCTTAGGACTGCCTCATTGTGCCCCATATGTTTGGCTATGTTGTGGATTTAT   DDDDDIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIII   AS:i:0
7001113:989:HTKVHBCX2:1:2216:4512:6199:90:80:26:12:CTGTACGGCT   0   chr1    3000747 32  52M *   0   0   ATTAAGTAGAGTATTGTTCAGTTTCCAGGTGAATGTTGGCTTTCTATTATTT    DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIH    AS:i:0
7001113:989:HTKVHBCX2:2:2209:10226:10613:58:87:87:11:TCGAATTTGT 0   chr1    3000919 32  51M *   0   0   ATTTGGTACTGAGAAGAAGGTATATATCCTTTTGTCTTATGATAAAATGTT DDDDDIIIIIIIHIIIIIHIIIHIIIIIIIIIIIIIIGIIIIIIIIIIIII AS:i:0

And this is the command I ran: sinto fragments -b merged.bam -f fragment

timoast commented 3 years ago

The default is to look for a read tag called "CB" containing the cell barcode information (see the docs). I don't see a CB tag in your BAM file; it looks like the barcode might be encoded in the read name. You can use the --barcode_regex argument to pass a regular expression that extracts the cell barcode from the read name.
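As a rough illustration of the read-name approach, the sketch below tests a candidate pattern in Python before handing it to --barcode_regex. It assumes the barcode is the last colon-separated field of the read name, and it uses `re.search`; how sinto itself applies the pattern (search versus match semantics) is not stated in this thread, so treat this only as a way to check what a pattern captures.

```python
import re

# Read name copied from the samtools output earlier in this thread.
read_name = "7001113:991:HVWNKBCX2:1:2110:2683:18182:87:93:72:36:CAGGTATGGC"

# Assumption: the cell barcode is the final colon-separated field.
# "[^:]*$" reads as "a run of non-colon characters that reaches the end".
barcode_pattern = re.compile(r"[^:]*$")

match = barcode_pattern.search(read_name)
barcode = match.group() if match else None
print(barcode)  # CAGGTATGGC
```

If the barcode actually spans several fields of the read name, the pattern would need to be adjusted accordingly.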

rojinsafavi commented 3 years ago

Thanks Tim. Yes, you are correct; the barcode is here: "7001113:991:HVWNKBCX2:1:2110:2683:18182:87:93:72:36:CAGGTATGGC"

May I ask how I should pass that to --barcode_regex?

rojinsafavi commented 3 years ago

I think my barcodes are a bit weird. I modified the sam file and made new barcodes and I think it is fine now. I will let you know if I have any further questions. Thank you.

AyushSemwal commented 2 years ago

I have the same issue. I want to generate a fragment file from a paired-end aligned BAM file. The barcode is in my read names, so I used the regex "[^:]*". However, the output was an empty BED file. Below is the BAM file that I am using. Can you help me with this issue?

CTGAAGCTAGGCAGAATCTCTCCGGTACT:7001113:901:HGFWTBCX2:2:2207:15570:18734/2 147 chr1 10276 37 4M1D46M 0 0 GGGTTGGGGTTGGGGTTGGGGTTGGGGTTGGGGTTGGGGTTAGGGTAGGG DDDDDHIIIIHIIIIIIHIIIHHHIIIHDHIIHIIHII<FC1CGEH1CGH NM:i:1 MD:Z:4^A46
AGCGATAGCGTACTAGAAGGAGTAGGTTG:7001113:901:HGFWTBCX2:2:1114:16162:74492/1 99 chr1 10330 38 50M 0 0 GCCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCTAACCCTAACC DDDDBH@FHHCEHH11<CFHIIE1FGH@DECHHE1<D@F11<DFHE11DH NM:i:1 MD:Z:C49
CTATTAGGTAGCGCTCCCTAGAGTGGCTC:7001113:901:HGFWTBCX2:2:1205:21004:29651/2 163 chr1 10434 45 50M 0 0 CNCTAACCCCGAACCCTAACCCTAACCCTAACCCTCGCGGTACCCTCAGC 0#<<<D1C1<CCHFHHGHHHIIIIIIIIGHIHHH@C?HDD?FEHHFHHH? NM:i:2 MD:Z:1C8T39
AGACACCTACTCGCTAGTAAGGAGGTACT:7001113:901:HGFWTBCX2:2:1106:1952:11878/1 83 chr1 10492 44 50M 0 0 GTGGTACTCTGAAGGCGGAGCACAGTTCTCCTCAGGTCAGACCCGGGCGDDDDDIIGIHIFFHIIGIIHGHIHHHHHHHIIHHHHHGIGIHHIHIIIIC NM:i:0 MD:Z:50
CTATTAGGTAGCGCTCCCTAGAGTGGCTC:7001113:901:HGFWTBCX2:2:1205:21004:29651/1 83 chr1 10550 45 50M 0 0 GCACAGACCCGGAGAGCACCGCGAGGGCGGAGCTGCGTTGTCCTCTGCAC DDBDBHHH@FHHIHHHHIFHIIIIIIGIIIDEHHHIHHHIIIIIIIHIIH NM:i:0 MD:Z:50
AGCGATAGCGATCAGTAAGGAGTACAGGA:7001113:901:HGFWTBCX2:2:1112:6627:33341/2 163 chr1 17465 42 50M 0 0 GTCCCAGGCCTCCCGAGCCGAGCCACCCGTCACCCCCTGGCTCCTGGCCDDDDDIIIIIIIIIIHHIIIIIIIIIIIIHIIIIIIIIIIIHIIIIIIII NM:i:0 MD:Z:50
AGACACCTCTCTCTACGCGTAAGACCTAT:7001113:901:HGFWTBCX2:2:1107:15505:39683/1 99 chr1 17471 42 50M 0 0 GGCCTCCCGAGCCGAGCCACCCGTCACCCCCTGGCTCCTGGCCTATGTGC DDDDDIIIIGIIIIIIIIIIIIIIIIIIIIIIHHIIGIIIIIIIIIIIII NM:i:0 MD:Z:50
AGACACCTCTCTCTACGCGTAAGACCTAT:7001113:901:HGFWTBCX2:2:2112:12312:15153/2 147 chr1 17623 42 43M1D7M 0 0 CTTGGGGTCTTCCCAGCAACATCAGCTCTGTCAGCTCCTTGCTGCTCTTC DDADCIIIIIIIIIIIHHGHIIIIIIIIIIIIIIIIHIIIIIIIIIIIII NM:i:1 MD:Z:43^C7
AGACACCTCTCTCTACGCGTAAGACCTAT:7001113:901:HGFWTBCX2:2:1107:15505:39683/2 147 chr1 17624 42 50M 0 0 CTTGGGGGTCTTCCCAGCAACATCAGCTCTGTCAGCTCCTTGCTGCTCTT DDDDDIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIII NM:i:0 MD:Z:50
AGCGATAGCGATCAGTAAGGAGTACAGGA:7001113:901:HGFWTBCX2:2:1112:6627:33341/1 83 chr1 17746 42 50M 0 0 ATTAGGCCCAGCACCAAATATTCACCATCCCTTGGCCATCCTGGCCCTCDDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIH NM:i:0 MD:Z:50
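One way to narrow this down is to check what "[^:]*" captures on one of these read names outside of sinto. The sketch below is only a sanity check of the pattern in Python; it says nothing about how sinto pairs reads or why the output is empty. Both `re.match` and `re.search` are tried because the thread does not state which one sinto uses.

```python
import re

# First read name from the BAM excerpt above.
read_name = "CTGAAGCTAGGCAGAATCTCTCCGGTACT:7001113:901:HGFWTBCX2:2:2207:15570:18734/2"

pattern = re.compile(r"[^:]*")

# The barcode sits at the start of the name, so anchoring at position 0
# (match) and scanning for the first hit (search) give the same result.
from_match = pattern.match(read_name).group()
from_search = pattern.search(read_name).group()

print(from_match)   # CTGAAGCTAGGCAGAATCTCTCCGGTACT
print(from_search)  # CTGAAGCTAGGCAGAATCTCTCCGGTACT
```

On this read name the pattern returns the leading field under either semantics, so the check does not point to the regular expression as the source of the problem.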

Tripfantasy commented 9 months ago

rojinsafavi wrote: "I think my barcodes are a bit weird. I modified the sam file and made new barcodes and I think it is fine now. I will let you know if I have any further questions. Thank you."

I stumbled across this thread. For future reference, after experimenting with Python-flavored regex, I came up with this to extract your barcode:

"[0-9]+:+[A-Z]*"

Explanation: one or more digits, followed by one or more colons, followed by zero or more capital letters.

There are several web-based regex testers that let you plug in your test string to see what regex works.
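The same kind of check can be done locally with Python's `re` module. The sketch below runs the pattern above against one of the original read names using `re.search`, then compares it with two variants anchored to the end of the name with `$`. The anchored variants are illustrative additions, not something proposed earlier in this thread, and they assume the barcode is the trailing run of capital letters.

```python
import re

# Read name copied from the first samtools output in this thread.
read_name = "7001113:991:HVWNKBCX2:1:2110:2683:18182:87:93:72:36:CAGGTATGGC"

# Pattern as posted: search stops at the first digits-plus-colon run,
# and "[A-Z]*" is allowed to match nothing.
unanchored = re.search(r"[0-9]+:+[A-Z]*", read_name).group()

# Hypothetical variant: "$" forces the match to end at the end of the name.
anchored = re.search(r"[0-9]+:+[A-Z]*$", read_name).group()

# Hypothetical variant: only the trailing capital letters.
letters_only = re.search(r"[A-Z]+$", read_name).group()

print(unanchored)    # 7001113:
print(anchored)      # 36:CAGGTATGGC
print(letters_only)  # CAGGTATGGC
```

With search semantics, the unanchored pattern picks up the start of the read name and not the trailing barcode, so it is worth confirming what a pattern returns on a real read name before using it.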