When using your method on yeast data (sacCer2), with the UNK tag on the TE input sequence files, I find that the annotated non-reference locations don't exactly line up with what is shown in the genome browser.
For example (these are taken from several different examples of the ....confident_nonref.txt file):
TY3 TTAA not.given chrXII:687946..687949 - 62 124 GGTATAAAAAATGATTTCGCCCAGGATCGAACTGGGGACGTTCTGCGTGTTAAGCAGATGCCATAACCGACTAGACCACGAAACCTATTATTTATGTTTA ATCCACGTTACACCACCAATTATACAAAA
ATAAAAAAGAATGACGGCAAAAAAGCAAAAAGTGACATTAATCATATTTGTTTATGTCTTATTCTACTGGT
TY3 GTTTT not.given chrXVI:435972..435976 + 57 94 TGGAAAAAAAGAAAGCTCGCACTCAGGATCGAACTAAGGACCAACAGATTTGCAATCTGCTGCGCTACCACTGCGCCATACGAGCTTTTGTTGTATAGTT TTAAGGCAATAGAATAATAACACTTATCC
ACGCAACAGTAATGTGGTTAAATAATAACTTCCTGTCAGGACTTGTGGTTGATTGGTGAAAATCAAATATC
TY3 TTT not.given chrXII:687946..687948 + 3 6 AGGTATAAAAAATGATTTCGCCCAGGATCGAACTGGGGACGTTCTGCGTGTTAAGCAGATGCCATAACCGACTAGACCACGAAACCTATTATTTATGTTT AATCCACGTTACACCACCAATTATACAAA
AATAAAAAAGAATGACGGCAAAAAAGCAAAAAGTGACATTAATCATATTTGTTTATGTCTTATTCTACTGG
TY3 TT not.given chrVII:931042..931043 + 1 12 AAAAGAAAAAAATGCGCAAGCCCGGAATCGAACCGGGGGCCCAACGATGGCAACGTTGGATTTTACCACTAAACCACTTGCGCTTACTGATTATATCTTA ATGGTTGAAGAGATCTGAATAAGTTTTAT
CTTCCTCTGGGCAAAGAATACTATACATTGCAAGAGTACCCATTAGAAGGGAATTAATTTATGCACTCTAA
For the first sequence the TSD is given as TTAA at chrXII:687946..687949 whereas the actual location of that sequence seems to be chrXII:687946..687950.
For the second sequence the TSD is given as GTTTT at chrXVI:435972..435976 whereas the actual location of that sequence seems to be chrXVI:435973..435978.
For the third sequence the TSD is given as TTT at chrXII:687946..687948 whereas the actual location of that sequence seems to be chrXII:687945..687948.
For the fourth sequence the TSD is given as TT at chrVII:931042..931043 whereas the actual location of that sequence seems to be chrVII:931040..931042.
Strangely, these small offsets do not seem to vary consistently with the TSD length. They also seem to be independent of the orientation of the insertion.
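To make the four discrepancies easier to compare side by side, here is a small Python sketch that tabulates the shift between each reported interval and the one seen in the genome browser. It only uses the coordinates quoted above. The names `CASES`, `parse_location`, and `compare` are made up for illustration, and the length calculation assumes the `start..end` coordinates are 1-based and inclusive, which I have not confirmed for this output format.

```python
import re

# (TSD, location reported in the output file, location seen in the genome browser)
# Values are copied from the four examples above.
CASES = [
    ("TTAA", "chrXII:687946..687949", "chrXII:687946..687950"),
    ("GTTTT", "chrXVI:435972..435976", "chrXVI:435973..435978"),
    ("TTT", "chrXII:687946..687948", "chrXII:687945..687948"),
    ("TT", "chrVII:931042..931043", "chrVII:931040..931042"),
]


def parse_location(loc):
    """Split 'chrXII:687946..687949' into (chromosome, start, end)."""
    match = re.fullmatch(r"(\w+):(\d+)\.\.(\d+)", loc)
    if match is None:
        raise ValueError(f"unrecognised location: {loc!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def compare(tsd, reported, observed):
    """Return start/end shifts and interval lengths for one example."""
    rep_chrom, rep_start, rep_end = parse_location(reported)
    obs_chrom, obs_start, obs_end = parse_location(observed)
    if rep_chrom != obs_chrom:
        raise ValueError("reported and observed chromosomes differ")
    return {
        "tsd": tsd,
        "tsd_len": len(tsd),
        # Positive shift: the browser coordinate is larger than the reported one.
        "start_shift": obs_start - rep_start,
        "end_shift": obs_end - rep_end,
        # Assumption: coordinates are 1-based and inclusive.
        "reported_len": rep_end - rep_start + 1,
        "observed_len": obs_end - obs_start + 1,
    }


if __name__ == "__main__":
    for case in CASES:
        print(compare(*case))
```

Under the inclusive-coordinate assumption, the reported interval length equals the TSD length in all four examples, while the start and end shifts differ from case to case.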