schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

CIGAR string starting with non-matching base #260

Open baozg opened 2 months ago

baozg commented 2 months ago

Hi, @mnshgl0110

I found that SyRI cannot process alignments whose CIGAR string does not start with a clipping operation (S or H). I used AnchorWave to align A. thaliana genomes, so the alignments may be more complete than those from minimap2. Another reason could be that our genomes now have complete telomeres (TTTAGGG), which can usually be aligned to some extent. Do you think SyRI should handle such alignments?

Chr1    0       Chr1    1       60      736I3633=103D1X21=1X23=2X10=1D2=1X2=1X3=1D2=1X7=1X2=2X16=1X18=1X13=1X3=1X3=167I1=1X22=1X46=1739D2=1427I798=4D238=1X135=1X1206=20D99=8I220=1X1040=1I370>
Chr2    0       Chr2    1       60      2225D1X24=1X23=1X6=1I29=1X45=1D4=1X28=1X61=1D80=2I137=1D68=1I66=3I24=2I1X93=1I7=1X74=1X70=1X164=1I5=1X17=1X28=1X132=1X176=1X30=1D1X17=2D8=1X131=1I19=3>
Chr3    0       Chr3    1       60      434I2902=1X6=1X6=1X6=1X4=1X6=1X6=1X27=1X4=1D2=1X8=1D10=1D2=1X4=2785I3=1X2=1X3=1X6=1X58=1X15=9I2=1X38=8I54=1X14=44I6=1X1=1X1=1X2=1X3=1X4=22I11=1X3=1X30>
Chr3    16      Chr3    6068133 60      21048042H509=1X1075=4D166=1X93=20I264=2X545=2D35=1X173=1X324=1X264=1X116=1X87=1X15=1I261=1I60=1X417=1X282=19D198=1X174=1X350=1X44=1X562=1X1054=1X33=1X>
Chr3    0       Chr3    6079915 60      6071903H1849=1X49=1X352=1X800=1D887=1X483=1X69=1X301=1X546=1X1105=1X47=1X568=1X164=5I33=1X13=1X56=1X245=1X468=1X247=1X2138=1X297=1X32=1X699=1X9=1X136=>
Chr4    0       Chr4    1       60      42D5=4538I137=1X3=1X10=2X113=1X345=1X926=1I506=1X443=1X337=1X6=1X169=1D34=1X5=1I49=1X66=1I182=1I13=1X30=1I25=1I316=1X521=1X156=1I17=1I28=1I48=1I96=1I2>
Chr4    16      Chr4    5497633 60      20531110H131=1X92=1X8=1X11=1X29=1X44=1X8=1X17=1X75=1X48=1X199=1X29=1I7=1X20=1X94=1X15=1X69=2X1=1X80=1X22=1X42=1X15=1X7=1X19=1X40=1X79=1X102=1X84=1X144>
Chr4    0       Chr4    6660232 60      3252768H352=1X181=1X128=1X896=2X4=1X32=1X32=1X26=1X14=1X19=1X1=1X1=1X6=1X19=1X1=2X1=1D129=1X26=1X4=1X9=1I2=1X15=1X2=1X1=1X3=1X3=2X5=1X7=2I162=1X14=1X1>
Chr5    0       Chr5    1       60      1X44=2I1X16=1D31=1X27=1D22=3I26=791I10=2I63=1D24=1I27=1D12=3I11=3I14=1I12=1X5=1D10=1D17=1X26=3I4=1D10=1X5=1D14=1X33=1D21=1X20=1X6=1I14=1I7=1X1=1I1X18=>
Chr5    16      Chr5    15656839        60      13863514H186=1X15=1X132=1X14=1X32=1X23=2X25=1D1X10=1D2X28=1X4=1X254=1X156=1X19=1X98=1X11=1X14=1X3=1X4=1X1=4D1X4=1X1=9D4=1X1=3X9=4I2=2D6=1X1=22>
Chr5    0       Chr5    16243355        60      18494892H65=1X81=1X11=1X11=1X10=1X16=1X34=1X110=2I35=1X16=1X47=1X16=2X20=1I9=1D4=1X3=1X6=1X6=15I5=1X3=3X33=1X65=1X13=1X3=1D10=1X37=1X36=1X110=>

mnshgl0110 commented 1 month ago

The CIGAR strings imply that the alignments have large indels at their ends (736I for the first alignment, 2225D for the second). This is quite unexpected. Please recheck that the alignments are correct.
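
One way to screen alignments for this pattern before running SyRI is sketched below in Python. It is only an illustration and not part of SyRI or AnchorWave: the function names `parse_cigar` and `terminal_indels` and the `min_len` threshold of 100 are assumptions made for this example. The CIGAR strings used are short prefixes of the ones posted above, since the full strings are truncated in the thread.

```python
import re

# Standard SAM CIGAR operations: a run length followed by one operation character.
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples; raise ValueError if malformed."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def terminal_indels(cigar, min_len=100):
    """Report insertions/deletions of at least min_len at either end of the
    alignment, ignoring soft/hard clipping (S/H). min_len is a hypothetical
    threshold chosen for this example."""
    core = [(n, op) for n, op in parse_cigar(cigar) if op not in "SH"]
    found = []
    if core and core[0][1] in "ID" and core[0][0] >= min_len:
        found.append(("start", core[0][0], core[0][1]))
    if len(core) > 1 and core[-1][1] in "ID" and core[-1][0] >= min_len:
        found.append(("end", core[-1][0], core[-1][1]))
    return found


# Prefixes of the CIGAR strings from the report above (the originals are truncated).
examples = {
    "Chr1": "736I3633=103D1X21=",
    "Chr2": "2225D1X24=1X23=",
    "Chr3 (reverse)": "21048042H509=1X1075=4D166=",
}
for name, cigar in examples.items():
    print(name, terminal_indels(cigar))
```

On these prefixes, the Chr1 and Chr2 alignments are flagged for the leading 736I and 2225D, while the hard-clipped Chr3 alignment starts with a match and is not flagged. Because only prefixes are available here, the "end" check says nothing about the real ends of the posted alignments.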