lh3 / minimap2

A versatile pairwise aligner for genomic and spliced nucleotide sequences
https://lh3.github.io/minimap2

Weird self alignment #10

Open gt1 opened 7 years ago

gt1 commented 7 years ago

Hi,

I see the following somewhat weird data (SAM format) coming out of minimap2:

L0/46/0_12879   256 L0/46/0_12879   1   0   8963M116I28M116D203M110I9M1I15M1D22M110D3412M   *   0   0   *   *   tp:A:S  cm:i:15 s1:i:126    NM:i:454    ms:i:24372  AS:i:24744  nn:i:0
L0/46/0_12879   256 L0/46/0_12879   1   0   8963M116D28M116I203M110D9M1D15M1I22M110I3412M   *   0   0   *   *   tp:A:S  cm:i:15 s1:i:127    NM:i:454    ms:i:24372  AS:i:24744  nn:i:0

This is weird for two reasons:

  1. This describes an alignment of a read against itself running from end to end, but it is clearly not the optimal alignment between the two regions specified.
  2. There are two versions.
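
Both observations can be checked mechanically from the two CIGAR strings above. The following is a minimal sketch, not part of minimap2; the helper names `parse_cigar`, `consumed` and `swap_indels` are made up for illustration. It sums the bases each CIGAR consumes on the query and on the reference, counts the inserted and deleted bases, and tests whether the second record is the first with insertions and deletions exchanged.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def consumed(ops):
    """Return (query bases, reference bases) consumed by the operations."""
    query = sum(n for n, op in ops if op in "MIS=X")
    ref = sum(n for n, op in ops if op in "MDN=X")
    return query, ref


def swap_indels(ops):
    """Exchange insertions and deletions, i.e. swap the roles of query and reference."""
    swap = {"I": "D", "D": "I"}
    return [(n, swap.get(op, op)) for n, op in ops]


first = parse_cigar("8963M116I28M116D203M110I9M1I15M1D22M110D3412M")
second = parse_cigar("8963M116D28M116I203M110D9M1D15M1I22M110I3412M")

indel_bases = sum(n for n, op in first if op in "ID")

print(consumed(first))               # (12879, 12879)
print(indel_bases)                   # 454
print(swap_indels(first) == second)  # True
```

The alignment spans all 12879 bases on both sides, and the 454 inserted and deleted bases equal the reported `NM:i:454`, so every `M` block is an exact match and the whole edit distance comes from indels that an identity alignment would not need. The second record is the same alignment with query and reference exchanged.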

minimap2 was run as

minimap2/minimap2 -ax ava-pb dup.fasta dup.fasta

where dup.fasta contains a single read (synthetic E. coli, though I think this should not matter).

The single read in dup.fasta is

>L0/46/0_12879
AGTCCTGCGGAAAGCGCCAGGGCGAGGATTGCTGTGGCAGGTTTACGTAATTGCATATCCAACTCCTTTATCTCTCTGCG
TTAAGAACGCACTGGAATACCCGTTGTGAGTGTTTTGTGTTGTTACGTCTGCAACTTTATTGTGCAGTGTGTGCCTGTTA
GGGAAGGTGCGAATAAGCTGGGGAAATTCTTCTCGGCTGACTCAGTCATTTCATTTCTTCATGTTTGAGCGATTTTTTCT
CCCGTAAATGCCTTGAATCAGCCTATTTAGACCGTTTCTTCGCCATTTAAGGCGTTATCCCCAGTTTTTAGTGAGATCTC
TCCCACTGACGTATCATTTGGTCCGCCCGAAGACAGGTTGGGCCAGCGTGAATAACATCGCCAGTTGGTTATCGTTTTTC
AGCAACCCCTTCGGTATCTGGCTTTCACGTAAGCCGAACTGTCGCTTGATGATGCGAAATGGGTGCTCCACCCCTGGCCC
GGATGCTGGGCTTTCATGTATTCGATGTTGATGGCCGTTTTGTTCTTGCGTGGAATGCTGTTTCAAGGTACTACCTTGCC
GGGGCCGCTCGGCGATCAGCCAGTCCAATATCCACCTCGGCCAGCTCCTCGCGCTGTGGCGCCCCTTGGTAGCCGGCATC
GGCTTAGACAAATTGCTCCTCTCCATGCAGCCAGATTACCCAGCTGATTGAAGGTCATGCTCGTTGGCCGCGAGTGGTGA
CCAGGCTGTGGGTCAGGCCACTCTTGGCATCGACACCAATGTGGGCCTTCATGCCAAAGTGCCACTGATTGCCTTTCTTG
GTCTGATGCATCTCCGGACTCGCGTTGCTGCTCTTTGTTCTTGGTCGAGCTGGGTGCCTCAATGATGGTGGCATCGACCA
AGGTGCCTTGAGTCATCATGACGCCTGCCTTCGGACCAGCCAGTCGATTGATGGTCTTGAACAATTGGCGGGCCAGTTGG
ATGCTGCTCCAGCAGGTGGCGGAAATTCATGATGGTGGTGCGGTCCGGCAAGGCGCTATCCAGGGATAACCGGGCAAACA
GACGCATGGAGGCGATTTCGTACAGAGCATCTTCCATCGCGCCATCGCTCAGGTTGTATCCAATGCTGCATGCAGTGAAT
GCGTAGCATGGTTTCCAGCGGAAAAGGTCGCCGGTCACATTACCAGCCTTGGGGTAAAACGGCTCGCTGACTTCCACCAT
GTGTTTTGCCATGGCAGAATCTGCTCCATGCGGGACAAGAAAATCTCTTTTCTGGTCTGAACGGCGCTTACTGCTGAATT
CACTGTCGGCGAAGGTAAGTTGATGACTCATGATGAACCCTGTTACTATGGCTCCAGATGACAAACATGATCTCATATCA
GGGACTTGTTCGCACCTTCCTTAGGTAACATTTAGTTTGGCTAAATGTAAAGATATTGCTGTTTTATTGTTTGTTTTTGC
GAGATGCGCCGCACCATTCCGAAGCAAAATTCTTAAAATGCACTCTTTTAGTGCTACCGCTGGATTACTGTGGTGCAACT
AGGTTGTACTGATGCTGTTTCAGGGTTGCCTTGTATAACAAAGCAATAGATCGTGCCAAAGTTGGATAGGAAATATGTTA
TCCGGATAATGCACTGATGCCGCATCCGGTGAGCGTGGCCGAAATATGGGATGTATTCCGGCACGATAAGAAGGGATTAT
TTACGTCGCTGACGGCAGACTCATCAACACAGCAGCAAAACCAAAACAATGCCGTCAGCACCCACAGTCGGACCAGTTGC
CGAGTACGTGCGTGATGGTGTGAGTTACCGGTGGTCGGCGTACGTTAGTGGTTAACACCTCGCGGGTGAACTGCGGGATC
ATCGCCTGAATTTCTCACCCTGCGGGCCAATCACCGCCGTAATGCCGTTGTTGGTGCTGCGCAACAGTGGGCAGCGCCAG
CTCCAGCGCACGCATTCGCGCCATCTGGAAGTGTTGCCATGGACCAATAGGTTTACCAAACCACGCATCCGTTGGAGATA
GTCAGCAGATAGTCGGTATCCGGGCGGAAGTTATCGCGCACTTGCTCGCCGAGAATGATCTCGTAGCAAATAGCCGCAGT
AAGCTCAATACCATTTGCCGACAGCGGCGGCTGGATATATGGCCCACGGCTGAACGACGACATCGGCAGATCAAAGAACG
GATGCTAACGGACGCAGAATCGACTCAGCGGGACAAACTCGCCAAACGGCACCAGATGGTTTTTGTTATAGCGATCGGCT
GAGTTCGTAGCTGTACGGGCGCACCTTTACCCAGCGTGATGATGGTCGTTGTAGGTATCGTAGCGGTTCTGCTTATTGAT
GACGCGCCTCGACAATCCCGGTTACCAGCGAGCTACCTTTATCACGCAACTCACCGTCCAGTTGCTTTGAGGAACGGTTG
CTGGTTAATTTCCAGATGCGGTTGATCGCCGACTCCGGCCAGATAATCAACGATGATTTGGCTCCATCAGCGGTGCCGTT
GACGTTGTAGTAAATCTTCAGCGTATTAAGAAGCTGGCCTTCGTCCCATTTCAGCGATTGCGGAATATCGCCCTGAACCA
TCGAAACCTGAATGGTTTTCTCCGGTTGTGGGGTAAACCCACTGGATGAACGTCAGACGGGAAGGGAGACGGCAAACATG
CACGACGGCCACCACCAGCTGGACGCCAGTTGCGTTTGACCAACGCCAGTGCCAGCAGGCCACTAACCATCATCAGCAGG
AAGTTAATGGCTTCCACGCCCATTATCGGTGCCAGCCCTTTTAACGGGACCATCAATCTGGCTATAGTCCGAACTTGTAA
CCACGGGTGAAGCCGGTCAAGTACCCAACCGCGCACGAAACTCGGTCACTTGCCAGAGGGCAGGGGGCGGCAATCGCTAC
GCGCAGCCAGGTGGTTTTCGGCCACAGACGCGACAGCACGCCAGCAAACAGTCCGGTATACAGCGACAAATACGCCGCCA
GCTGCACCACCAGGAAGATGTTAACCGGGCCAGACATTCCGCCAAAGGTCGCGATGCTGACATAGACCCAGTTAATACCG
CTGCCAAAGAGGCCAAATCCCCAGCAAAAGGCCAATAGCGGCAGACTGGAGTGGACGGCGGTTAAAGGTCAACGCCTGGC
AAGCCCCATCAGCGAAATAATCCGCCGCAGGCCAGACGTTCGTAAGGAGAGAAGGCCCATGCGCTTCCGCAGGCACCGAA
TAATAACGCCAGCAGCAGGCGAATGCGCTGGCGTTTCAATTACATGAGGCAAAAGCCATGTAAGTATATCTATCCAGTTT
CGGTTTATTCATCCAGCTTCGGCTGGGGTGAGTATCCGGGATTTTGACATGAACCTGAATAATACGCCGACTGTCGGCCG
TCGCCACTTTGAACTGGATAACCGTCGATGTCGATAGTTTCGCCACGCGCCGGAAAGATCCCAAATGCCTGCATCACCAG
ACCACCGATAGTCGTCGACTTCTTCATCGCTAAAGTGGGTGCCGAACGCTTCGTTGAAGTCTTCAATGGAAGCCCAGTGC
GCGTACGGTCCAGGTATGACGACTCAGCTGACGGAAGTCGATATCATCTTCTTCGTCATACTCGTCTTCTAATACTCACG
CAACAATCAGTTCCAGGATGTCTTCAATGGTCACCAGACCGGAAACCCCATCGAATTCGTCAATAACGTATCGCCATGTG
GTAACAGCTGAGAGCGAAACTCTTTCAGCATCCGGCTACGCGCTTACTTTCAGGAAGCGACAACCGCCTGACGTAACACT
TTGTCCATGCTGAAGGCTTCAGCATCGCTGCGCATAAACGGCAGCAAGTTCGTTTCGCCATCAGAATCCCTATCAATGTG
ATCTTTGTCTTCGCTAATCATCCGGGAAGACGTAGAGTGGGCGGACTCGTATAGATGACATCAAGACATTCGGTCCACGC
GTCTGGTTGCGTTTCAGGGTAATCATGCCTGGGAGCGGGGGATCATGATGTCGCGAGACGCGTTGGTCTGCGATGTCCAT
CACCCCTCTCGAGCATATCGCGCGTATCTTCGTCGATATAGGTCGTTCTGCCCGGAATCACGGATCAGCGCCAGCGTTCG
TCACGGTTTTTCGGTTCCACCGTTGGATAAAGTTGGCTGCAGTAACAGGGAGAAAAATCCCCTTTCTTGTTGCTTATCGT
GTCACTACTGTGTGAATTGTCGTCGCTCATGGCGTGTATGGGTTCTCATGTTAGTTAATCAAAACGCCGTCGTTAATCAC
CAACGGCGGGGACGTCTGCCAGTCAAATGCCTGGCAATGTATTCTTTCTCGGCAATGTACGGATCCTCATAGCCCAGAGC
AAGCATAATCTCTGTTTCGAGGGCTTCCATTTCTGTCTGCTTCGTCATCTTCGATGTGATCGCTAACCTAACAAATGCAG
ACTGCCGTGCACCACCATATGCGCCCAGTCGCGCCTCCAGTGGTTTGCCTTGCGTCCTGAGCTTCCTTCTCAACCACTGT
ACGGCAGATAACCAGATCGCCCAGTAGCGACATATTCCAGTGCCAGGCGGCACTTCAAACGGGAAGGAGAGCACGTTGGT
CGGCTTATCCTTACCGCGATAGGTCAGATTCAGACTGTGGCTTTCGGCGGTATCGACCACGCTGAATCGTCACTTCCGAT
TCTCCTGAAACTGCGGGATCACCGCATTCAGCCATGTCTGAAACTGGCTCTCTTCGCGGTAACCCGGAATTATCTTCACA
TGCCAGTGCTAAATCGAGGATCACCTGACTCATTTTTGTTCCTCTGTTCTTCGCGCTTGCTTCTGCTGCCAGCGCCGCTT
TTCGTTTTTGTCTCGGCTTCTTCCCATGGCTTCATAGGCGTTAACGATACTGCGCCACCACAGGGTGACGAACCACGTCT
TCGCTGTGGAAGAAGTTAAAGACTGATCTCTTCGAACATCGGCCAGCACTTCGATGGCGTGACGTAAGCCTGATTTAGTA
TTACGCGGCCAGGTCCGATCTGTGTGACGTCGCCGGTGGATAACCGCTTTTGAGTTAAAACCGATACGGGTCAGGAACAT
CTTCATCTGTTCGATGGTGGTGTTGCTGGCTCTCATCGAGAATGATTAAACGCGTCGTTCAGCGTACGACCACGCATATA
GGCCATGCGGTGCGACTTCAACTAACGTTGCCGCTCAATCAGTTTCTCGACTTTCTCAAAGCCCAGCATTTCAAACAGCG
CGTCGTACAGCGGGCGCAGATACGGGTCTACTTTCTGGCTTAAATCGCCAGGCGAGGAAGCCCAGTTTTCACCGGCTTCT
ACTGCCGGACGAGTCAGCAGAATACGGCGAATTTCCTCGACGCTCCAGGGCATCAACTGCCGCAGCCACTGCCAGGTAGG
TTTTACCCGTACCCGCCGGGCCAACGCCGAAGGTAATGTCATGGGTCGAGAATATTGGCGATGTACTGCGCCTAGGTTTG
CGTGCGCGGCTTAAAATTACGCCGCGTTTGGTTTTGATATTGACCGCTTTGCCGTACTCCGGCACGCTCTCCGCGCTGTC
TGCTCCAGGACCACGCGCTTCTGTAATTCGACAACGGTAGGCAATCTCGTTCCGGTTCGATATCCTGAATCTGACCCGCG
CATCGGGGTCAGTGATCGACATACAGGCTACGCCAGAATGTCTGACCGCAGCGGGACGCAAATCGGACGGCCGTGTCAGT
TTAAAGTGGTTATCGCGGCGATTGATCTCGATGCCGAGACGGCGTTCGAGCTGCTTGATGTTGTCATCAAACGGGCCGCA
CAGGCTCAACAGACGCGCATTGTCTGCTGGCTCCAGGGTTGATTTCGCGAGTGTCTATGTTCAAACCGTCCTCTTATCTG
TATGCCGCCGGAAGCTGAACATTCACCGGCCTATAAGGAAATTATTCACGCCACAGGAAAAAGGCGCAAGCGATTGCAAT
ATAAGATGGGGATAAAGAGAGAAAAAACAAGGCCCGACCGGAACGGCAGGCCTGAGAATTACGGCTGATAATAACCCACG
CCAAAGGTCGTTTTCTTTGACGGGTACGGGCAATCACTGATTCCCGGTGTTTCTGCCACGCGCCAGACCCATTTCATCTT
CAGTACGCACCACTTTACCGCGCAGAGAGTTCGGGTAGACGTCGGTAATTTCTACATCGACGAATTTACCGATCATATCC
GGCGTGCCTTCGAAGTTGACCACGCGGTTATTTTCCGTACGCCCGGAAAGCTCGCATGATGCTCTTCACGCGATGTACCT
TCCTACCAGATATACGCGGGTGGTGCCGAGCATCCGGCGGCTCCACGCCATCGCTTGCTGATTAATTGCGCTCTTGCATG
AATATACAGACGCTGCTTCTTCTCTTCTTCCGGAACATCATCAACCATATCGGCGGCCTGGATGTACCCGGACGTGCAGA
TAAGATAAAGTGTAGCTCATGTCGAAATTGACGTCGGCAATCAGCTTCATCGTTTTTCTCGAAGTCTTCGGTGGTTTCGC
CATGGGAAGCCAACGTATGAAATCAGAACTGATCTGAATATCTGGACGCGCCGCACGCAGTTTACGGATGATCGCTTTGT
ACTCCAGCGCCGTAATGGGTACGGCCCATCAGGTTCAGAATGCAGATCGGAACCGCTCTGTACCGGCAGATGCAGGAAGC
TCACCAGCTCCGGCGTGTCGCGATACACTTCGATGATATCGCTCGGTGAATTCGATACGGATGGCTCGGTGGTAAAGCGA
ATACGATCGATCCCGTCGATCGCAGCAACCAGACGCAGCAGATCGGCAAACGATCCGGTGGTGCCGGTCGTAGTTTTCAC
CACGCCAGGCGTTCACGTTCTGACTCGTAGCAGGTTGACTTCACGCACGCCCTGAGCCGCAAGCTGGTGCATATCTCAAA
CAGAATATCGTCGGACGGACGGCTGACCTCTTCACCACGGGTGTGAAGGCACCACGCAGTAGGTGCAATATTTATTGCAG
CCTTCCATGATGGAGACAAACGCGGTCGGCCCTTCGGCGCGCGGTTCCGGTAGGACGGTCAAACTATCTCGATTTCCGGG
AAGCTGATATCTACAACCGGGCTGCGGTCGCCACGCACGGAGTTGATCATCTGCCGGCAGACGGTGCAGCGTGTTGCGGC
CCAAAAATAATATCGACATCAGTGGGCGCGCTGGCGAATGTGCTCGCCTTCTTGCGATGCCACGCAGCCACCGACGCCGA
TAATCAGGTCTGGATTCTTCTCTTTTAACAGTTTCCAGCGACCTCAACTGATGGAAGACTTTTTCCTGAGCCTTCTCGCG
GGTTGAGCAGGTGATTCAGCAGCAGCACATCCGCTTTCTGTCCGCCTACGTCGGTCAGTTGATAGCCGTGGGTGGCATCC
ACAGATCGGCCATCTTCGAATGAATCGTACTCGTTCATCTGACAGCCCCAGGTTTTAATATGGAGTTTTTTGGGTCATCG
ACTTGCTCTTGCGAAATAGTAGCCAGGAATGCAGGGCGTCATAGTGTAATGCTTTGCTGACCGTTGTGACCAGTATGAGC
GTTATCAGCCCTTAGGGGTAAAAATCCTGTAAACTTAAAGCAGTATTGCTAACAGGATGATTGACCATGACAAATCAACC
AACGGAAATTGCCATTGTCGGCGGAGGAATGGTCGGCGGCGCACTGGCGCTGGGGCTGGCACAGCACGGATTTGCGGTAA
CGGTGATCGGAGCACGCAGAACCAGCGCCGTTTGTCGCTGATAGCCAACGGACGTCGGATCTCGGCGATCAGCGCGGCTT
CGGTATACATTGCTTAAAGGGTTAGGGTCTGGGATGCAGTACAGGCTATGCGTTGCCATCCTTACCGCAGACTGGAAACG
TGGGGAGTGGGAAACGGCGCATGTGGTGTTTGACGCCGCTTGAACTTAAGCTACCGCTGCTTGGCTATATGGTGGAAAAC
ACTGTCCTGCAACAGGCGTTGTGGCAGGCGCTGGAAGCCGCATCCGAAAGTAACGTTATCGTCGTGCCAGGCTCGCTGAT
TGCGCTGCATCGCCATGATGATCTTCAGGAGCTGGAGACTGAAAGGCGGGAAGTGATTTCGCGCGAAGCTGGTGATTGGT
ACCGACGGCGCAAATTCGCAGGTGCGGCAGATGGCGGGAATTGGCGTTCATGACATGGCAGTATGGCGCAGTCGTGCATG
GTTGATTAGCGTCCAGTGCGAGAACGATCCCGGCGACAGCACCTGGCAGCAATTTACTCCGGACGGGACCGCGTGCGTTT
CTGCCGTTGTTTGATAACTGGCATACGCTTAGGGATTGGGTATGTGACTCTGCCCGGCGTCGTGTATGTCGCCAGTTGCA
GAATATGAGTATGCGCACAGCTCCAGAGCGGAAATCGCGAAGCATTTCCCGTCGCGTTCTGGGTTACGTTACACCGCTTG
TCCGCTGGTGCGTTTCCGCTGACGCGTCGCCATGCGTAGCAGTACAGTGCAGCAGGGCTTGCGCTGGTGGGCGATGCCGC
GCATACCATCCATCCGCTGGCGGGGCAGGGAGTGAATCTTGGTTATTCGTGATGTCGATGCCCTGACTTGATGTTCTGGT
CAACGCCCGCAGCTACGGCGAAGCGTGGGCCAGTTATCACTGTCCTGCAAGCGGTACCAGATGCGGCGCATGGCGGATAA
CTTCATTATGCAAAGCGGTATGGATCTGTTTATGCACGGATTCAGCAATAATCTGCCACCACTGCGTTTTATGCGTAATC
TACGGGTTAATGGCGGCGGAGCGTGCTGGCGTGTTGAAACGTCAGGCGCTGAAATATGCGTTAAGGGTTGTAGCCTTACA
ACATTGCCGGGATGACGTGCCTAACCGTAGGTCGGATAAGACGCGGCAGCGTCGCATCCGACATTGAAGGATAAGACGTG
TCAACGATCGCATTCGACATTGAATGAACGCAGAAAAGCAAAAAGGCTCGCCAGAAGCGAGCTTTTTTAATGTGGCTGGG
GTACGAGGATTCGAACCTCGGAATGCCGGAATCAGAATCCCGTGCCTTACCGCTTGGCGATACCCCAACTGGGTGCACTT
AACTAAGGTAAGCGTCTTGACATAAATTGGCTGGGGTAGCGAGGATTCGAACCTCGGAATGCCGGAATCAGAATCCGGTG
CCTTACACGCTTGGCGATACCCCAACAAATTGGTTTTGAATTTGCCGAACATATTCGATACATTCAGAATTTGGTGGCTA
CGACGGGATTCGAACCTGTGACCCCATCATTATGAGTGATGTGCATCTAACCAACTGACGCTATCGTAGCCAGATTGTTT
CTTCGATGGCTGGGGTACCTGGATTCGAACCAGGGAATGCCGGTATCAAAAACCGGGTGCCTTACCGCTTGGCGATACCC
CAATAACCGGGTCGGTGAACCGCTTACTCGAAGAAGATGGCTGGGGTACCTGGATTCGAACAGGGAATGCCCGGTATCAA
AAACCGGTGCCTTACCGCTTGGCGATACCCCATCCGGTACAACGCTTTCGTGGTGAATGGTGCGGAGAGGCGAGACTTGG
AACTCGCACACCTTGCGGGCGCCAGAACCTAAATCTGGTGCGTCTACTCAATTTCGCCACTCCCGCAAAAAAAAGATGTG
TGGCTACGACGGGATTCGAACTGTGCACGCCCACCATTATGAGTGATGTGCTCTAACCAACTGAGCTACGTAGCCATCTT
TTTTTTCGCGATACCTTATCGGCGTTGCGGGGGCGCGATTATGCGTCGTAGAGCCTTAGCAGTCGTCAACCGTCTTTTTC
AAGGAAAATTGCTCGAAAGTGACTGTTTGGTTAGGTTGGAACAGCGTGGCGCTATATTCGTCAATTATTGTTTACTTTGT
GTTTGTTTCCAACCCTACAGCCCATTCTTTTGTCATACAGGATGAAATTCGGAATTTAACAATAGTGGTGGTGAAATTAA
TCTATGAAATACTGGCCTACAGTGGATGAGTTGTCAAACAGTGATGTGGCAAACCCGGAACATTTCCTTACTGCATATCC
AGAATCAACAAGCTACCTCAATAACTGTAAACAGCCCCGGATTTCACCGGGGCTGTTTCGCATTTCTTACTTATACGCCG
ACTGAGTGAACCACCAACCGCGCGACCAGACGGATCGTCCATTTTCTTGAACGCTTTCATCCCATTCGACTCGCTTTAGC
GGTAAGAACAAGCGACGGAAGCGGACGCCCGGCACGCACTCAGCGGCGCTCGGAAGCGGGAATAGTCTTCAAAGATCTCC
CGATACAAGTACGCTTCTTTAGAGGTTGGCGGTGTTGTACGGGAAAGCGGAAGCGCGGCAGTTTCGCAGTTGCTGATCAG
AAACCTTGCTGCGCAGCCACTTCTTTCAGGGTGTCGATCCATACTGTAACCAGACGCCATCGGGAGAACTGCTCTTTCTG
CCGCCAGGCCAGCTTGCAGGCAGATACGCTTCAAAACATTCACGCAGGATGTGTTTTTCCATTTTGCCGTTACCGCACAT
TTTATCCTGTGGGTTAATACGCATCAGCCACATCAAGGAATTTGTTTGTCGAGGAACGGAACGCGTGCTTCCACGCACCG
CAGGCTGACTCGCTTTGTTGGCACGCGCGCAGTCATACATATGCAAGGGCCAGCAGTTTACGCACCGTCCTCCTCATGCA
GTTCTTTGGCATTCGGGGCTTTGTGGAAGTAAAGATAACCGCCGAACACTTTCATCAGCAACCTTCACCGGACAGCACCA
TTTTAATGCCCATCGCCTTTGATCTTACGCGACATTAAATACATCGGTGTTGAAGCGCGAATAGTGGTCACATCATAAGG
TTTCGATAGTGGTAAATCACGTACGCGGATGGCATCCAGACCTTCCTGTACAGTGAAGTGAATTTCGTGATGCACCGTGC
CCAGATGGTTTGCCACTTCCTGGGCTGCTTTCAGATCCGGTGAACCCGGCAGACCTACCAGCAAAGGTAGTGTAACTGCC
GGCCACCAGGGCTTCAGAGCGTTCCTGATCTTGCCACGCGACGGGCGCGTATTTCTTGGTGATAGCGGAAATAATTGAGG
AATCCAGACCACCAGAAAGCAGCACACGCGTAAGGCACACAGACATCAGATGGCCTTTTAACTGAATCTTCCAGTGCCGA
AGCACTCGTTTTTGTCAGGTCACGTTATCTTTCACCGCTATCGTAGTCGAACCAAGTCGGCGATGATAGTAAGAACGGAT
TTCGCCGTCCTGCGCTCCACAAATAGCTCCCCGCCGGGAACTCTTTAATCGTGCGGCAAACTGGCACCAGCGCTTTCATT
TCTGAGGCCACATACAGCTGACCGTGTTCGTCATACCCCATACACAGTGGGATGATCCCCAGATGCGTCGCGACCAATCA
GGTAGGCATCTTTTTCGCTCGTCGTACAGTGCAAAGGCAAACATGCCCTGCAAGTCGTCGAGAAATTCCGGCCCTTTCTT
CCTGATACAGCGCGAGGATCACTTCACAGTCAGACCCGGTCTGGAACTGGTAACGAATCGCCATATTCGGCGCGCGAATG
CCTGGTGGTTGTAGATTTCACCGTTTACCTGCCAGTACGTGGGTTTTTTGTTAGGTTGTATGAGAGGTTGCGCCCCCGCG
TTAACGTCAACAATTGACAACCGTTCGTGGGCGAGAATGGCGTTATCGCTGGCATAAATACCGTGACCAGTCCGGGCCAC
GATGACGCATGCAGGCGTGACAGCTCGAGGGGCTTTCTTACGCAGCTAAGTGCGTCTGTTTTGATATCAGAATAGCGCCA
AAAATTGAACACATAACCTTCTCCGTTAACCTGGTATTTGTTGCTTGTTGTGTTTGCTTGTTTAAAAAAATGCCGCAAAG
CAGCACTGTGCGCAGTCCGATTTGGATGGGTGAAAAAATAAAGAAAAAGTAATTGGATAGACTCTTGTGGATTTGGTGCA
TAAAAAGGTCTGGTGTGAGGATATATTTATTGATTGAATCGATAATTTTTAGCGGGTTTTATTGAATGTTATATTTTACT
TGGGGGCCAAATTTGCTGACAAAGTGCGAGTTTGTTCATGCCGGAATGCGGCGTGAACGCCTTATCCGGCCACAAAAGGC
ATGAAAATTCAATATATTAGCAGGAGCTGCGTAGGCCGTGATAAGCGAGCGCCATCAGGCAGTTTGGCGTTTAGTCATCA
GAGCCAACCACGTCCGCAGACGTGGTTGCTATTCGAAACGTCGATTTCAGCGACTGACCGGGTAAATCCAGCTGGGGCGA
AAAGGCATACCTGTCGATATCGTCGAGCGACGAAACACCAGAATGCACCAGAATCGTCTCCAGACCTGCCTGGAAGCCGG
CCAGAATACGGTACGCAGGTTATCGCCGACAATCACCGTTTCTATCCGAATGCGCCTGCATTATGGTTTAATGCTGCGCG
GATGATCCACGGGCTGGGCTTACAAACATAAGAACGGTTCTGCGCCCGGAAGATTTTCTCAATCCCTGCACAACAACGCG
CGCACAAGCGGGATAAAAACCGCGCGCCGTGGGTGATCCGGATTGGTGGCGATAAAACGTGCACCGTTAGCGACGAAATA
GGCTGCTTTATGCAATCATGTCCCAGTTGTAGGAACCGCGTTTCGCCCAACAATCACGAAAATCAAGGGTTCACATCGGT
AATAGTGAAACCGAGCTTTGTACAGTTCATGAATCAGTGCGCCTTCGCCCACCACATACGCTTTTCTTGCCTTCCTGGCG
ACTGGAGGAATCGTGCAGTCGCCATCGGCAGAGGTATAAAACACGCTGTGCAGGTACATCGACACCTGCGGTGGCAAAGC
GGTTCGCCACGATCTTGCCCAGTCTGCGAAGGATAGTTGGTCTAGCGAACAGCAGCGGCAGGCCTTTATCCATAATCCC

Best, German

lh3 commented 7 years ago

Generating CIGAR for read overlapping is not recommended because 1) generating CIGAR for every overlap is very slow and 2) CIGAR is usually not needed for read overlapping. SAM is also the wrong format for read overlapping; no read overlapper outputs SAM.

In your example, minimap and minimap2 ignore anchors at the same position when the read name is the same, so you shouldn't see a perfect alignment. However, with CIGAR on, alignment extension may still produce a nearly perfect alignment from different seeds. This is a problem with minimap2, which I will try to fix at some point. Thanks.
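
The anchor filtering described above can be pictured with a small sketch. This is not minimap2's code (minimap2 is written in C); the `Anchor` type, its field names and `drop_self_anchors` are hypothetical and only illustrate the rule as stated: an anchor is dropped when the read name and the position are the same on both sides.

```python
from typing import List, NamedTuple


class Anchor(NamedTuple):
    """A seed match between a query position and a target position (hypothetical type)."""
    query_name: str
    query_pos: int
    target_name: str
    target_pos: int


def drop_self_anchors(anchors: List[Anchor]) -> List[Anchor]:
    """Drop anchors that match a read to itself at the identical position."""
    return [
        a for a in anchors
        if not (a.query_name == a.target_name and a.query_pos == a.target_pos)
    ]


anchors = [
    Anchor("L0/46/0_12879", 100, "L0/46/0_12879", 100),  # on the diagonal: dropped
    Anchor("L0/46/0_12879", 100, "L0/46/0_12879", 216),  # same read, shifted: kept
    Anchor("L0/46/0_12879", 100, "other_read", 100),     # different read: kept
]

kept = drop_self_anchors(anchors)
print(len(kept))  # 2
```

Under this rule only exact-diagonal seeds disappear. Seeds that pair a position with a nearby, shifted copy of itself survive, which fits the explanation that extension from different seeds can still yield a nearly perfect self alignment. The positions used here are illustrative and not taken from the actual run.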