srobb1 / RelocaTE

Find the locations of TEs using the TSD in unassembled short reads by comparing to a closely related reference genome assembly

Non-reference annotation problems #4

Closed nelson42 closed 9 years ago

nelson42 commented 9 years ago

When using your method on yeast data (sacCer2), with the UNK tag on the TE input sequence files, I find that the annotated non-reference locations don't exactly line up with what is shown in the genome browser.

For example (these are taken from several different examples of the ....confident_nonref.txt file):

TY3 TTAA not.given chrXII:687946..687949 - 62 124 GGTATAAAAAATGATTTCGCCCAGGATCGAACTGGGGACGTTCTGCGTGTTAAGCAGATGCCATAACCGACTAGACCACGAAACCTATTATTTATGTTTA ATCCACGTTACACCACCAATTATACAAAA ATAAAAAAGAATGACGGCAAAAAAGCAAAAAGTGACATTAATCATATTTGTTTATGTCTTATTCTACTGGT

TY3 GTTTT not.given chrXVI:435972..435976 + 57 94 TGGAAAAAAAGAAAGCTCGCACTCAGGATCGAACTAAGGACCAACAGATTTGCAATCTGCTGCGCTACCACTGCGCCATACGAGCTTTTGTTGTATAGTT TTAAGGCAATAGAATAATAACACTTATCC ACGCAACAGTAATGTGGTTAAATAATAACTTCCTGTCAGGACTTGTGGTTGATTGGTGAAAATCAAATATC

TY3 TTT not.given chrXII:687946..687948 + 3 6 AGGTATAAAAAATGATTTCGCCCAGGATCGAACTGGGGACGTTCTGCGTGTTAAGCAGATGCCATAACCGACTAGACCACGAAACCTATTATTTATGTTT AATCCACGTTACACCACCAATTATACAAA AATAAAAAAGAATGACGGCAAAAAAGCAAAAAGTGACATTAATCATATTTGTTTATGTCTTATTCTACTGG

TY3 TT not.given chrVII:931042..931043 + 1 12 AAAAGAAAAAAATGCGCAAGCCCGGAATCGAACCGGGGGCCCAACGATGGCAACGTTGGATTTTACCACTAAACCACTTGCGCTTACTGATTATATCTTA ATGGTTGAAGAGATCTGAATAAGTTTTAT CTTCCTCTGGGCAAAGAATACTATACATTGCAAGAGTACCCATTAGAAGGGAATTAATTTATGCACTCTAA

For the first sequence, the TSD is given as TTAA at chrXII:687946..687949, whereas the actual location of that sequence seems to be chrXII:687946..687950.

For the second sequence, the TSD is given as GTTTT at chrXVI:435972..435976, whereas the actual location of that sequence seems to be chrXVI:435973..435978.

For the third sequence, the TSD is given as TTT at chrXII:687946..687948, whereas the actual location of that sequence seems to be chrXII:687945..687948.

For the fourth sequence, the TSD is given as TT at chrVII:931042..931043, whereas the actual location of that sequence seems to be chrVII:931040..931042.

Strangely, these slight differences do not seem to vary consistently with TSD length. They also seem to be independent of the orientation of the insertion.
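To make the discrepancies easier to compare side by side, here is a minimal sketch that tabulates the start and end offsets for the four examples above. It is not part of RelocaTE; the names `parse_location`, `coordinate_offsets`, and `span_length` are hypothetical helpers introduced for illustration, and the span calculation assumes both coordinate sets are 1-based and inclusive.

```python
import re

# (TSD, strand, location reported by RelocaTE, location seen in the genome browser)
# Values are copied from the four examples in this comment.
EXAMPLES = [
    ("TTAA", "-", "chrXII:687946..687949", "chrXII:687946..687950"),
    ("GTTTT", "+", "chrXVI:435972..435976", "chrXVI:435973..435978"),
    ("TTT", "+", "chrXII:687946..687948", "chrXII:687945..687948"),
    ("TT", "+", "chrVII:931042..931043", "chrVII:931040..931042"),
]

# Matches locations written as chrom:start..end
LOCATION = re.compile(r"^(\w+):(\d+)\.\.(\d+)$")


def parse_location(location):
    """Split 'chrom:start..end' into (chrom, start, end)."""
    match = LOCATION.match(location)
    if match is None:
        raise ValueError(f"unrecognised location: {location!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def span_length(location):
    """Number of bases covered, assuming 1-based inclusive coordinates."""
    _, start, end = parse_location(location)
    return end - start + 1


def coordinate_offsets(reported, observed):
    """Return (start offset, end offset) as observed minus reported."""
    chrom_r, start_r, end_r = parse_location(reported)
    chrom_o, start_o, end_o = parse_location(observed)
    if chrom_r != chrom_o:
        raise ValueError("locations are on different chromosomes")
    return start_o - start_r, end_o - end_r


for tsd, strand, reported, observed in EXAMPLES:
    d_start, d_end = coordinate_offsets(reported, observed)
    print(
        f"{tsd:<6}{strand:<3}len={len(tsd)} "
        f"reported_span={span_length(reported)} "
        f"observed_span={span_length(observed)} "
        f"start{d_start:+d} end{d_end:+d}"
    )
```

Under those assumptions, the offsets come out as (0, +1), (+1, +2), (-1, 0), and (-2, -1). Each reported span equals the TSD length, while each observed span is one base longer, so the shift is not a single constant offset.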

srobb1 commented 9 years ago

Hi Nelson, I will look into this.

Thanks for your comment, Sofia

srobb1 commented 9 years ago

OK, fixed it! Check out RelocaTE-1-0-5. Thanks for letting me know about this error.

Thanks Again, Sofia