tseemann / samclip

Filter SAM file for soft and hard clipped alignments
GNU General Public License v3.0

clipping incomplete #9

Closed thomasvangurp closed 4 years ago

thomasvangurp commented 5 years ago

This is a very useful tool. Unfortunately, it does not perform as I would expect:

[samclip] CHROM=C564730593|kraken:taxid|9541|NM_001287300.1|Macaca:1-1263 POS=413..1607 CIGAR=339S30M2I4M1I6M1I22M1I3M2D13M1D6M2D3M4D35M1I30M698S HL=0 SL=339 SR=698 HR=0 max=10)
[samclip] ^^^ KEPT
ed828835-5dc5-47ad-864f-ed1032142b93    16  C564730593|kraken:taxid|9541|NM_001287300.1|Macaca  413 0   339S30M2I4M1I6M1I22M1I3M2D13M1D6M2D3M4D35M1I30M698S CCACTTCAAATCTCCTCTCCCACCGCACCTGTTCCCGCGTGTAGGAACATACCTCTACCATTTCAACTGGAGCCCCACTGTGGTGATTTTCCTTTGGTAGGGTTCCTGGATGGCGAACCGTACGGACCCAACGGCGAGGAACATAAGGGGAACAAATCCACAAAACCTGGTGGGAGAAGATTTTGGAGGACGAAGATTTACCAAGCCACCGTGCTACTGGAAGGAAGCAATGTTTGGCCTGACCGGCTTGGAGACGCTTGTCAGACAAGGCCATGGAGCTAGATCACCTCGATAGCACCTCACGGCAACCGGAAGCCCACCCGTTCATGTGCCGTCACTAAGATGCTTCAGATTCAACCCGATAAAGGTGCCTAAGTTGTTGAATTTTATCAAAAATGAAGATTTACAAATGTACGAATTCTTGAGTATTAAGACTTACAGGCACCGATATTGATGTTTACCGGTACCTGGGAGCCTTTGTACAATGACTACCGAAAGCTCAGGCGCATTAGCTGATGACGGGTTCGGACGCGACATATGGACAAAACGAGGTTACACGAACTTCAATGTAAAGCACCTATTCTTTGGTGATGTTTGTTTCTGCCAATTTTACAAAAAAGAGGCTTTACCATTGAAGCAATTGGTGCCTTGAGCCGAGAAAAAGCACCTGAAGATGATTTTTGAAGAGGAGGAAGAAAGCAAAGGGGACGACGATGATCAGATAATGGCCGGTTGATGATCGTAAGAAAAAGGATTTACCATCGTGGTGGCCTAAAAGCCCGGTAAGAGAAAGAGAGGGACAGGAAACGCGACGGCAGCCATCGATACAGGGATCGTGATTATGATAAAGACTAAGGATCGAGATTATGACAGGGGAACGTGGAAGAGGGCGCATGTGAATAGAGACAGGGAGTAGGTGTTGAGACATGTGGATAGGGATAGGGGACCGGATATCGATTGAGAGAAGACAAAGTGAATATGGACGGGAAAGGGACCGAGATATTGTGGAAGTGTAGAGATCGCAGGAGATAGGCGGGACAGGGACCGTGGTGAAGGCGTAGGTCCCACTCAAGGGAGCCCAAGTAGAAGTAGGAGTAGCTAGAGATCGCAAGGATCGTGATGTGGAAGAAACGTCAAAAAGAGCATGCTCGGGGCAGCATCAGCCCCAAGAAGAAGCAAAAAAAAAAAAAAAAAA 
"#'%%')797-.;95.-172%0'''#5/4+++'&),%)'&%$#,488=<966/(%&,*-3/.-+.(+$1477*0)$((%++..&)*&(*'.++++1$.648:23*+%36079869:2<%6891<BA811)'%&&%$&%%377=89023/450/7?<1;2967=<:978::4:8;@5/&&88<@C;5;9<>86:86;=.*+..454645+&#%$%)0,)690*.6+)#(541*(/05084'=<43%$#.-(+$26(&&)//-(#1571&--10:;88:;1/277&-4/*#$*$%&./7+(#&74&$*=9@+=?JK@1/9:%$&(,150''*)&%-/2&-)8=:B8;:83//:;..08::5*676/*$%$&"##)(((+/-2A89'+)''-599=?=<864%#2,-.30>93%$%-)-73)4'6601%#/(%&&)/*),7/=<<4<2'',-46888('&/46;2==:<6/#$$%(/0//=A<0<;==:<=&&1''25.-,0++(,,/#&5-0752+?>:>?8132333783(&,:=:<9/,'95#((()%/74416,$(+-/.,&&$*%$##&%#"$0)%$$%$)%&&)(+16'%$&&%&++(+$')+#''01/(,127;;64(-,),-31*699--,##+-.374211'*,6;746+4/555(%$'+$$*,+#-04833.#''%,992.:3+,50.1,)$#&%$%*+.1(#+.(243;676<><>;94822(')34=(0?8)('38:;330'+418/)*)#)$$$#&'%45+&%)*)&%)(%$##$$%,.4.&&+*804>91&(36@A?8;43$%%(2433;$%5>%&4:77:>+%%##+''%$<?>>AC<<0*-&*76/21++9))(&58(97<@AE?*-)348668378+#*0*(&-+$##$).*23148-&$2**'&&&%)$*#$#%$.71-0./6616898;6'/463',%$'-3<?;1@@A@<>5700''492.7/9;7.58861++46:227/0<-3',(&*+.&$$&&&&*'.#%"&&(443540'''-*'.&'(14/(1330012.((')$$%*18;<7=(,.5119<6.9:<?E@:96@?:29*86##)$$#-.,0<'$#$*',9:3.''-=--.24/&&6<<57:5&'''&&1#&611.''11220327:=>;;;::*02114221,'*;:;<I@ECGCIFTDJC@ NM:i:40 ms:i:88 AS:i:88 nn:i:0  tp:A:P  cm:i:4  s1:i:43 s2:i:43 dv:f:0.1136
[samclip] Total SAM records 34, removed 0, allowed 34, passed 34

The read above has CIGAR=339S30M2I4M1I6M1I22M1I3M2D13M1D6M2D3M4D35M1I30M698S, which samclip reports as HL=0 SL=339 SR=698 HR=0 with max=10. So why is this read not filtered out? Both soft clips are clearly longer than the maximum of 10. In comparable situations samclip does filter such reads out:

[samclip] CHROM=C354792960|kraken:taxid|10090|JN957461.1|Mus:1-38030 POS=54..1269 CIGAR=4S5M1D14M1D17M4D2M1D9M2D7M1D3M1D9M2D2M1D3M1D21M1D6M1D4M1D9M1D15M1D8M1D2M1D17M1I7M4D10M1D15M2D33M1D3M2D10M2D8M1D5M1I17M1D11M1I10M1D11M1D2M1I13M1D7M1D3M1D9M881S HL=0 SL=4 SR=881 HR=0 max=10)
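The HL/SL/SR/HR fields in these log lines can be reproduced from the CIGAR string alone, which helps when checking a read by hand. The following is a minimal Python sketch, not samclip's own Perl code; the function name `clip_lengths` is hypothetical, and it assumes that H and S operations appear only at the ends of the CIGAR, as the SAM specification requires.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def clip_lengths(cigar):
    """Return (HL, SL, SR, HR): hard/soft clip lengths at the left and right ends."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    hl = sl = sr = hr = 0

    # Left end: an optional hard clip, then an optional soft clip.
    i = 0
    if i < len(ops) and ops[i][1] == "H":
        hl = ops[i][0]
        i += 1
    if i < len(ops) and ops[i][1] == "S":
        sl = ops[i][0]
        i += 1

    # Right end: the same, scanning inwards, without re-using left-end operations.
    j = len(ops) - 1
    if j >= i and ops[j][1] == "H":
        hr = ops[j][0]
        j -= 1
    if j >= i and ops[j][1] == "S":
        sr = ops[j][0]
        j -= 1

    return hl, sl, sr, hr

cigar = "339S30M2I4M1I6M1I22M1I3M2D13M1D6M2D3M4D35M1I30M698S"
print(clip_lengths(cigar))  # (0, 339, 698, 0)
```

For the first read this gives the same values as the log (HL=0 SL=339 SR=698 HR=0), so the clip lengths themselves are parsed as expected; the question is only why they do not trigger removal.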

tseemann commented 5 years ago

Could it be that this read is at the end of a contig?

   # if either end is clipped more than --max allowed, then remove it
   # unless it is at a contig end
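The numbers in the log are consistent with this explanation. The kept read is reported as POS=413..1607 on a contig logged as 1-1263, so its reported end lies beyond the contig end, while the removed read (POS=54..1269 on 1-38030) sits well inside its contig. The span 413..1607 covers 1195 positions, which equals the full read length including the soft-clipped bases. The sketch below is a guess at a rule that would reproduce both outcomes, not samclip's actual Perl code: the names `read_length` and `is_removed` are hypothetical, and it assumes that a read is exempt whenever it starts at position 1 or its reported end reaches the contig length.

```python
import re

def read_length(cigar):
    """Read bases described by a CIGAR: M, I, S, = and X consume the read."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MIS=X")

def is_removed(start, end, contig_len, left_clip, right_clip, max_clip=10):
    """Assumed rule: drop over-clipped reads unless they touch a contig end."""
    at_contig_end = start <= 1 or end >= contig_len
    if at_contig_end:
        return False
    return left_clip > max_clip or right_clip > max_clip

cigar = "339S30M2I4M1I6M1I22M1I3M2D13M1D6M2D3M4D35M1I30M698S"
start = 413
end = start + read_length(cigar) - 1   # assumption: end is start + read length - 1

print(end)                                      # 1607, as in the log
print(is_removed(start, end, 1263, 339, 698))   # False: treated as at a contig end
print(is_removed(54, 1269, 38030, 4, 881))      # True: right clip 881 > 10
```

Under this assumption the 698 soft-clipped bases push the computed end past the contig length, so the read counts as being at a contig end even though its aligned portion may not reach it.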