zhangrengang / TEsorter

TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes
https://doi.org/10.1093/hr/uhac017
GNU General Public License v3.0

Do TEsorter results only contain the positive strand? #37

Closed: LeeYEAH2 closed this issue 2 years ago

LeeYEAH2 commented 2 years ago

Hi there! @zhangrengang I checked my TEsorter results and found that the .dom.gff3 file only contains LTRs on the positive strand. My input file is the .LTRlib.fa from LTR_retriever, and I am quite sure it contains negative-strand LTRs. Some high-scoring negative-strand LTRs should certainly have been identified. Did I miss an argument that causes this problem? Yours sincerely.

zhangrengang commented 2 years ago

Hi, there are some lines on the negative strand:

Os0494#DNAauto/CACTA    TEsorter        CDS     7694    8968    630.4   -       1       ID=Os0494#DNAauto/CACTA|Class_II/Subclass_1/TIR/EnSpm_CACTA:CACTA-TPase;Name=EnSpm_CACTA-TPase;gene=TPase;clade=EnSpm_CACTA;evalue=1.9e-191;coverage=98.8;probability=0.98
Os0503_INT#LTR/Gypsy    TEsorter        CDS     578     946     137.2   +       1       ID=Os0503_INT#LTR/Gypsy|Class_I/LTR/Ty3_gypsy/non-chromovirus/OTA/Tat/Ogre:Ty3-GAG;Name=Ogre-GAG;gene=GAG;clade=Ogre;evalue=3.2e-42;coverage=100.0;probability=0.98
Os0554#DNAauto/MULE     TEsorter        CDS     5689    6225    139.7   +       0       ID=Os0554#DNAauto/MULE|Class_II/Subclass_1/TIR/MuDR_Mutator:MuDR-TPase;Name=MuDR_Mutator-TPase;gene=TPase;clade=MuDR_Mutator;evalue=1.1e-42;coverage=98.8;probability=0.94
Os0576#DNAauto/PILE     TEsorter        CDS     3255    3674    147.7   -       2       ID=Os0576#DNAauto/PILE|Class_II/Subclass_1/TIR/PIF_Harbinger:Harbinger-TPase;Name=PIF_Harbinger-TPase;gene=TPase;clade=PIF_Harbinger;evalue=3.9e-45;coverage=100.0;probability=0.96

Could you send me some of your negative-strand LTRs so I can reproduce the issue?
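A quick way to check this on your own output is to count the features per strand in the .dom.gff3 file; in GFF3, column 7 holds the strand. The following is a minimal Python sketch; `strand_counts` is a hypothetical helper, not part of TEsorter, and the whitespace fallback exists only because pasted lines (like the ones above) often lose their tabs.

```python
from collections import Counter


def strand_counts(gff_lines):
    """Count GFF3 features by strand (column 7)."""
    counts = Counter()
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and ## directives
        fields = line.split("\t")
        if len(fields) < 9:
            # Pasted text often loses its tabs; the attributes shown above
            # contain no spaces, so whitespace splitting still gives 9 columns.
            fields = line.split()
        counts[fields[6]] += 1
    return counts
```

Passing an open .dom.gff3 file handle works as well, since iterating over a file yields its lines.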

LeeYEAH2 commented 2 years ago

Last night I double-checked the result file and found that the negative-strand LTR does exist, but it was wrongly annotated on the positive strand. Below is how it is annotated by LTR_retriever and TEsorter.

LTR_retriever:
_m29_p_5:1265061..1273765 pass motif:TGCA TSD:GCAC 1265057..1265060 1273766..1273769 IN:1265484..1273342 0.9976 - unknown NA 91069

retriever_ltrlib.fa (used as TEsorter input):

>_m29_p_5:1265061..1265483_LTR#LTR/unknown
tgtAAGAGACTGTTAGCATGTAAGAGACTGTTAGCATGAGAACTGCTATCTTCAGGTCAGGGTTAACAGCCCCTAGCCTGAAACAACATAATTATAAAATTCCAGGATGTGGGCAGTTGCAGCATGGCGACTAGAAGTCCTGTAGCTTATCTTATACTCCTGGTCTCCTTGTCCTGTAGTAAATCTTATACTCTTTGAAACTATGTTAGTCCTGTAAGTTTATCTTGTACTCTCAGAAGCTATGGAAGGccttgaaaatgctatataaccccttggctttaagtgtttggggtccttgttaaaaacccgctgcgtcgggcagagacggggaccccagccggctggtaataaacctcgctgtgtgacttgcattattgtgcgggttctctgtctgtctgagggggacaattccggatcttaaca
>_m29_p_5:1265484..1273342_INT#LTR/unknown
tttgggGGCTCGTCCGGGATGCCCCCCCTCAGGGGAACAGACAACTCCCGATACCGAGGTTTACATAGGGAACCGCGGAAGGAGGTTTGGCCACCTCCAATAGGGGAGGACTGAAACAGGTCCTCGACCGGAGAGACTGAAACAGGTCTCTGCCCGGTCCCAAACAGGGGAGGACTGAAACAGGTCCTCGACCGGAGAGACTGAAACAGGTCTCTGCCCGGTCCCAAACAGGGGAGGACTGAAACAGGTCCTCGACCGGAGAGACTGAAACAGGTCTctgcccggtcccagttgagaattctcacgggaaggaaagtgcgggttacgaccctggaggcttcgttacttgaagtcgtgggagacgtccccgacgagaaggtggcccagcgacgtccacagtggacgccgtttggacgggccctaggtaaggatcttgagcgtggtgagactggtgtgattgaaacggcgcctgtggatgtggtgtggttgaccggccggtgggtgttgaagagtcctgtgcgaagttgtgtatttgctggcactgtgtcttttgtcctttcttttgccttttcttggtttctctttgtgacaattatgggacaggttcaggtaacacctaagaccctgctcctgaaccactttcctgaaatccgcgccaaggctcgtaatcatggtgtggaagtgaagaaaggtaagtttgatacattctgctctgcagaatggcctacttttaatgtgggctggcccccccagggaactttttccctagacattattaagaaggtccgagatattattaatcggcgccatccggaccaatatccctatattttgatgtggcaagccttagtagagagtcctccctcctggcttaagccctttatccctgacaagccagaagatccccccctcccccttaaagtcctgaccgtttcgggaccctcccgccagtcggcagtgcccacggccggcccgaaacccccggagaagccgacccagggacccatccttcaggaggggtcagacatatacccctccctgatagacctagatctggaagagaccccccctccttacgctccggtggcgccgctccagccgcggcgcgcgcccagcccgatggcgccgctcccgcctgaggctgccgctccttccgagctgtctcactctaccgcctccccaatggctccccctcccccgggttcccctgccccagctcaagggccagcgaggggattgaggcctcgcagacgccgggaagagaccccagaggaggaaccgtcctcctccacctccgctggggcgccgattctccccgtgcgagcactaggaggaactggtccagatggggagcgagcataccagtactggcctttttctagcagtgatctgtacaactggaaggctcaaaaccctcctttttctgaggacccgaaaggcctaactgacctgttcgagtctgtcatgcacacacacagtcccacttgggatg
attgccagcagcttcttaagaccttattcaccaccgaggagcgcgagcgaatcctcaccgaggccagaaaaaatgtccccggcgacaacggaagacccacgaccttgccgaacctgatagatgagcgctttcccctgaatagaccggattgggactttgggaacgcagaaggtagggagcgtctccgagtctaccgccagactcttatggcaggtctccgagcggcggcacgccgccccaccaatttggccaaggtaaaagctataatgcaaggggataatgaaagcccggccgtgtttttagaacgcctctatgatgcttacagacagtacaccccgttggaccccctggcagaggaaaaccagtcggctgtaattatgtcctttataaaccaggctgccccagatattaggaagaaattgtacaaacaggagggactgggagaaatgtctatccgggatttaatgaaagtagcggagagagtcttcaacactcgagagactcccgaagaaagggaggatagaattagaaaagaaaatcaggaattacagaaacgaatcaggaaggaagacagagagcatcagagtagggagaacaggaggcagcagagggagatggccaagatcttgttggcaggcgtgcaaagcacagtcagggtgggaccgagtccggccggaccagcccgaccgtggagaccgcggccccgactggataggggacagtgtgcaaactgcaaagagtatggacattggaagagggagtgccccaagcgccagggccaaacagggcaagacgcacgggtcctgctggcggggatggagagtgactaggggagacgggactcggatcccctccccgagtcttgggtaactgcgtatgtggaggggaagccagtaggattcctggtagacacaggagcccagtactcagttttgaataagcccacagagcccttatctcagaaaaccagtttggtgcaaggggcaactgggtccaaggcttatcggtggactagtaggcgccaagtagacttaggtcgccaccaagtgacccactccttcctagttatccctgaatgccctgcccccttactggggcgcgatctcctgactaagatcagggctcagatccattttgagccggatggcattaagctattggatggccaaggacagcccctccacattttgaccctgtctcttgtggatgaacatcgcctgttcgccctgcaggacaacccctacaaccctccctctacagaatggccccgtgatatggattattggcttaaaacataccctcaggcgtgggcggaaatagcgggtgtgggccgggcggcccgccgagcaccagtagtggtggaacttaaagcctcagcccagcctatccggatccgccagtaccccatgtctgcagaggcgcggaaagggattgccccgcacattaaccgtttactggaagctggaatactgaaaccttgccattctgcctggaacaccccacttctccccgttaagaaaccggggggaaaagattataggccagtccaggacttgagggaagtaaataagagggttgaagacatccatcccacggtccccaacccttataccttactaagtcacttgcccccttcacatgtctggtatactaccttagacctaaaggatgcgttttttagcatagccctggcacccagcagccaacacatttttgccttcgaatggaatgatggcaatacgggaacccccgggcagctgacctggactagactaccgcaaggcttcaaaaactctccaactctgtttaatgaagccctaaatcaggatttggactcgtttcgccagagccataattcagttacgctcctgcagtacgtagatgacttgcttctggcagccccctccgaagctgaatgccgacaggccactggagacctcctccaggagctggggcagttgggctatcgggccagtgcaaagaaggctcaaatatgcagg
caaacagtcacctacctggggtataaactaaaagaaggagccagatggctgacagaggccatgaaagagactattcttagacttccagtcccgacctcagcacgagaggtccgtgagtttttagggacgacaggctactgccggctgtggattttggggtatgctgaaatagcaaaacctctgtatgaggcaaccaaggataaggtcccttgggcctgggggtcagaccaacagaaggcctacgatgaactcaaggtcgctctcctaagagccccggctctggcattgccagaccccctgaagcccttcactctctttgttgatgagaggaggggaatagcgaaaggggtgctaatgcagcgtctggggccctggaaacgcccggttgcctatttatccaagaagctagatccagttgcagcaggatggcccccgtgcttaaggatcattgcggcagtagccctaatggtgaaggatgctgataaactcacttttgggcaacatctgaaggtagtaaccccccatgcgatcgagggggtcctgaaatatcctcctggtaggtggatgactaatgcccgactaacacattaccaaggactcttgctagatgcaccccggatcatcttcgctgaacccaccgctctgaatccagccaccctgctgccgaccccggatctgagagctcccctgcatgactgccaagagatcatggcagaagtcacccaggtgcgccccgacctccaggacaccgcactacccaacagtgagttggtatggtacactgatggaagcagcttcgttatagatggtgtgcggagggcaggcgcagcggtggtagaccaagggggaaacatcatttggaatgcctcgctttccccggggacatcagcacagaaggccgaactgatcgcgctggcggaggcgctggaacgggccgaagggagacgagtgactgtctacaccgatagccgctacgcctttggcactgtccatgtgcatggcgctatctaccgggaaagaggctttgttacagcggaaggaaagactctgcgcaatcttcctgaggtacgaagactgctgatggctgtgcaaatgccccgggcagtcgcagttgtccacatccctgggcaccagtctgcccagaccccggaagctgaaggaaaccggcgagcggatgaagccgccaaggcagtggcagtagcttcatcagctttagcactcaccctgcccacacctgagctccctcgcctgcccccgcgacctgactacactccagaagacctgcgatggatccagaaccaccactgcccggaatctgatcagcaggggtggcatcgggatacagaaggaagattgatactgccggcacagctaggactgtttcttctctccaacctgcatcaagccacccacttaggaaaaaagaagttgctgacaattctcgagtccgcccgcctccggtttccccgacaagcagctcagattcaagagattgtagatcagtgcattgggtgccaggctatgagacccagtaggaaaggaccccaacatacaggtacgagggtacggggaagagcgccgggacggagttgggaagtggattttactgaggtaaagcctgggaggtatgggtataagtacttgctagtaatggttgacacattttcgggctgggtggaagccttccccacgaaacgggagactgcccaagtggttgctaaggcattactagaagaaattattcccagatatggggttcctgaggttttaggctccgataacggcccagctttcatcagtaacgtcctacagggactagcccgagcgatagggatcaattggaagttacattgtgaatataatccccagagctcagggcaggtagagagaatgaatcggactctaaaggagaccttgtccaaactagccatcgagactggcggggactgggtgaccctcttaccctatgccatcttccgggtccggaactcaccatatgt
acatggtttgacacctttcgaaattctgtatggggcaccaccccccattattgttcgtactctaccagatcatgaccccaatgtggccccaagttatctggccagtttaaaggccctacaaggggtccaacatgagatatggcccctagtgagttccctgtatgaaattaaggacgccccgaacccggaacatggcatcgttccaggggattgggtatgggtaaggagacacaggtcccggacactggaggaaagatggaaaggtccttatgtggttattctggttacccccactgccttaaaggttgacggcattgggccttgggtccatcactctcacgtgcgccgagccagccagctggagaagacgcaagctaaggagtggatcgtacggcgacaccctgataaccctctgaagctgcagctctcccgacctcggggaagcgtcaagccccctgcctcagctaacgatggaatggctgctagccctaactctgctcaacatctgggagaagagccacgcggggatcaactcacaccaaccccataagctaacatggaccctaacagatggacagacccaaacaacccttaatagcaccacacatactgcccccatcaatacgtggtggccagacttgtttttcgacctgcgtgacattttcggcactaaacgtggacggcagtatgactactcagtcaggtccaagcgggcggtaattgacacctctcaaggacatagtgcacaagggttttgggcctgcccagggaacctaagaaacaattagaaaacctgtggcggcccagaccgctactattgtggtagttggagttgtgtcacctcctatgatgggccccgacagtgggacgttgggaacagggatctagttaaattctcctttagggaCCCCCACAACCGGGTGCCCCAGGTACGTGTCCAGTTTAACCAAGACGTGGCGCGAAGAGAGCGTGGTTGGTTATCAGGATTAACTTGGGGGTTCCAATTAGATATAGGCCGTTGGGCATGGATAGGTCCCCACCCCGGCGGTCTCCTAACTATTCGACTATCGGTGGAAATGATCAGCACTCAGGTGGGTCCAAATAAGGTGCTGGCCCCTCTCGTCCCTACCAAGAACCCAGGTATATCAAGGGATAAAAACACCGCAGGAGGGACTGCGGGAACCCAGCCAAAGACCTCTGTAACTCCTTTGACGCCTGCTACGCAGccaaccaaagactcattgcggaaactggtgcgcactgtatacgagacccttaatgccaccagtcctaacctcacaacctcctgttggctgtgctatgatataaagcccccgttctatgaggcaataggacttaatgccacttacaacgcctctaacggtaagaacccttctcagtgttcatggggaaatcgtaaaattggcttaaccatgcaactagtgagcggcaatgggacctgtctagggaaggtgccccaggctaaacaaagtttatgtgcctccatagacagctcccctagttggaaaagtgacactaagtggttaatccccagaactgatggatggtggatatgttcaaagactggcctcaccccgtgcttatctacctcggtctttaatgccgccaatgaattctgtgtcctagtaacagtgctgccccgcatcctctatcaccctgaggagagtatgtattcgcattgggatagtgacacaagtatgagaagtaaaagagagcccatcaccgcactaaccattgccaccctgttcagtttgggaatagccggagctgggaccggcatagcttccctggctactcaacaatcaggaatgacctccctaagggcggccatagatgaggacatagaaaggttggaaacctcgattagtcatttagagaagtcgctcacctctctatccgaggtagtactccagaatagaagaggacttgatttgttgttcctccaacaagggggactgtgtgccgcgctgggggaggaa
tgttgtttttatgctgaccatactggtgtggtaaaagaatccatggcaaaggtgagagaagggttagctaagagaaaacgggaaagggaagctcaggagaactggtttgaggcttggtttaacagatctccttggctcaccaccttagtatctaccttagtgggcccaattatcttgcttgtgcttattctaaccttgggcccttgcatattaaacaagcttattaattttgtaaaagatcgtgttaatactgtccagctcatggtcctaagacaacagtatgagacagtgcccacccgtgaggacctctacggctggcccgtacatgagcaagattcctcattatgaacaacacacaggggggaaa
>_m29_p_5:1273343..1273765_LTR#LTR/unknown
TGTAAGAGACTGTTAGCATGTAAGAGACTGTTAGCATGAGAACTGCTATCTTCAGGTCAGGGTTAACAGCCCCTAGCCTGAAACAACATAATTATAAAATTCCAGGATGTGGGCAGTTGCAGCATGGCGACTAGAAGTCCTGTAGCTTATCTTATACTCCTGGTCTCCTTGTCCTGTAGTAAATCTTATACTCTTTGAAACTATGTAAGTCCTGTAAGTTTATCTTGTACTCTCAGAAGCTATGGAAGGccttgaaaatgctatataaccccttggctttaagtgtttggggtccttgttaaaaacccgctgcgtcgggcagagacggggaccccagccggctggtaataaacctcgctgtgtgacttgcattattgtgcgggttctctgtctgtctgagggggacaattccggatcttaaca

TEsorter CSV output:
_m29_p_5:1265484..1273342_INT#LTR/unknown LTR Retroviridae gammaretroviridae yes + GAG|gammaretroviridae AP|gammaretroviridae RT|gammaretroviridae RNaseH|gammaretroviridae INT|gammaretroviridae ENV|gammaretroviridae VAP|badnavirus

TEsorter GFF3 output:
_m29_p_5        TEsorter        CDS     1266111 1267739 768.7   +       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|GAG_gammaretroviridae;gene=GAG;clade=gammaretroviridae;evalue=1.8e-232;coverage=99.8;probability=0.84
_m29_p_5        TEsorter        CDS     1267881 1268105 91.7    +       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|AP_gammaretroviridae;gene=AP;clade=gammaretroviridae;evalue=3.5e-28;coverage=100.0;probability=0.88
_m29_p_5        TEsorter        CDS     1268430 1269140 571.0   +       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|RT_gammaretroviridae;gene=RT;clade=gammaretroviridae;evalue=3.6e-173;coverage=100.0;probability=1.0
_m29_p_5        TEsorter        CDS     1269741 1270163 261.6   +       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|RNaseH_gammaretroviridae;gene=RNaseH;clade=gammaretroviridae;evalue=1.2e-79;coverage=98.6;probability=0.98
_m29_p_5        TEsorter        CDS     1270374 1271333 580.7   +       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|INT_gammaretroviridae;gene=INT;clade=gammaretroviridae;evalue=4.5e-176;coverage=99.4;probability=0.96
_m29_p_5        TEsorter        CDS     1272217 1273209 483.8   +       1       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|ENV_gammaretroviridae;gene=ENV;clade=gammaretroviridae;evalue=1.2e-146;coverage=99.0;probability=0.92
_m29_p_5        TEsorter        CDS     1272826 1272915 9.4     +       1       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|VAP_badnavirus;gene=VAP;clade=badnavirus;evalue=0.0066;coverage=18.0;probability=0.96
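The LTR_retriever record above places the element on the minus strand (the `-` column), while every TEsorter hit on the library sequence is on `+`. The following minimal Python sketch parses such a record and flags the disagreement. It assumes the whitespace-separated pass-list column order LTR_loc, Category, Motif, TSD, 5'_TSD, 3'_TSD, Internal, Identity, Strand, SuperFamily, TE_type, Insertion_Time, which matches the 12 fields of the record shown; `parse_pass_record` and `strand_mismatch` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PassRecord:
    name: str          # LTR_loc, e.g. "_m29_p_5:1265061..1273765"
    internal: tuple    # (start, end) of the internal region
    identity: float    # identity between the two LTRs
    strand: str        # "+" or "-"
    superfamily: str


def parse_pass_record(line: str) -> PassRecord:
    """Parse one whitespace-separated LTR_retriever pass-list record.

    Assumed column order: LTR_loc, Category, Motif, TSD, 5'_TSD, 3'_TSD,
    Internal, Identity, Strand, SuperFamily, TE_type, Insertion_Time.
    """
    f = line.split()
    if len(f) < 12:
        raise ValueError(f"expected 12 columns, got {len(f)}")
    start, end = f[6].split(":", 1)[1].split("..")  # "IN:1265484..1273342"
    return PassRecord(f[0], (int(start), int(end)), float(f[7]), f[8], f[9])


def strand_mismatch(record: PassRecord, tesorter_strand: str) -> bool:
    """True when LTR_retriever and TEsorter report different orientations."""
    return record.strand != tesorter_strand
```

As the rest of this thread shows, a mismatch like this one can simply mean that LTR_retriever wrote the library sequence already reverse-complemented, so TEsorter's `+` is relative to the library sequence as written rather than to the genome.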

zhangrengang commented 2 years ago

Hi, I have checked the element:

_m29_p_5        TEsorter        CDS     1267878 1268072 0.28    +       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Ty3_gypsy:Ty3-PROT;Name=Ty3_gypsy-PROT;gene=PROT;clade=Ty3_gypsy;coverage=100.0;evalue=7.8e-06;probability=0.86
_m29_p_5        TEsorter        CDS     1268367 1268999 1.24    +       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Retrovirus:Retrovirus-RT;Name=Retrovirus-RT;gene=RT;clade=Retrovirus;coverage=100.0;evalue=4.3e-79;probability=0.98
_m29_p_5        TEsorter        CDS     1269750 1270151 0.9     +       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Retrovirus:Retrovirus-RH;Name=Retrovirus-RH;gene=RH;clade=Retrovirus;coverage=100.0;evalue=6.6e-35;probability=0.99
_m29_p_5        TEsorter        CDS     1270374 1271246 1.05    +       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Retrovirus:Retrovirus-INT;Name=Retrovirus-INT;gene=INT;clade=Retrovirus;coverage=100.0;evalue=1.4e-78;probability=0.96

And its initial hits to the protein domains (*.domtbl) are also on the positive strand:

#                                                                                                                                                  --- full sequence --- -------------- this domain -------------   hmm coord   ali coord   env coord
# target name                                                     accession   tlen query name                                    accession   qlen   E-value  score  bias   #  of  c-Evalue  i-Evalue  score  bias  from    to  from    to  from    to  acc description of target
#                                             ------------------- ---------- -----                          -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
Class_I/LTR/Retrovirus:Retrovirus-RT                              -            209 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 -           2619   2.4e-79  259.4   0.0   1   1   1.3e-79   4.3e-79  258.5   0.0     1   209   962  1172   962  1172 0.98 -
Class_I/LTR/Retrovirus:Retrovirus-INT                             -            245 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 -           2619     8e-79  258.5   0.0   1   1   4.1e-79   1.4e-78  257.8   0.0     1   245  1631  1921  1631  1921 0.96 -
Class_I/LTR/Ty3_gypsy:Ty3-INT                                     -            222 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 -           2619   7.3e-49  160.2   0.0   1   1   3.6e-49   1.2e-48  159.5   0.0     1   217  1631  1845  1631  1849 0.94 -
Class_I/LTR/Ty3_gypsy/non-chromovirus/non-chromo-outgroup:Ty3-INT -            300 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 -           2619   5.4e-47  154.4   0.0   1   1   8.2e-47   2.7e-46  152.1   0.0     1   300  1631  1921  1631  1921 0.91 -
Class_I/LTR/Ty3_gypsy/chromovirus/chromo-outgroup:Ty3-INT         -            305 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 -           2619   2.7e-37  122.3   0.1   1   1   2.9e-37   9.6e-37  120.5   0.0     5   287  1632  1909  1629  1921 0.80 -
Class_I/LTR/Retrovirus:Retrovirus-RH                              -            126 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 -           2619     4e-35  114.6   0.0   1   1     2e-35   6.6e-35  113.9   0.0     1   126  1423  1556  1423  1556 0.99 -
Class_I/LTR/Ty3_gypsy/non-chromovirus/OTA/Athila:Ty3-INT          -            313 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 -           2619   7.6e-35  114.5   0.0   1   1   3.6e-35   1.2e-34  113.8   0.0     7   217  1634  1843  1629  1850 0.93 -
....
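The *.domtbl file is HMMER `--domtblout` output: 22 whitespace-separated columns followed by a free-text description. In this thread the query names end in `|aa1` for the forward translation and in `|rev_aa1` after reverse-complementing; treating that suffix as a strand marker is an inference from these examples, not documented TEsorter behavior. A minimal Python parser under that assumption (`parse_domtbl` is a hypothetical helper):

```python
def parse_domtbl(lines):
    """Yield one dict per domain hit from HMMER --domtblout text."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        f = line.split()
        if len(f) < 22:
            continue  # not a complete domtblout row
        query = f[3]
        # Assumed from this thread: query names end in |aa<N> or |rev_aa<N>.
        frame = query.rsplit("|", 1)[-1]
        yield {
            "domain": f[0],                   # target name (HMM profile)
            "query": query,
            "frame": frame,
            "reverse": frame.startswith("rev_"),
            "i_evalue": float(f[12]),         # independent E-value of this domain
            "score": float(f[13]),            # domain bit score
            "ali": (int(f[17]), int(f[18])),  # alignment coords on the query (aa)
        }
```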

Then I made the reverse complement of the sequence:

>_m29_p_5:1265484..1273342_INT#LTR/unknown
tttcccccctgtgtgttgttcataatgaggaatcttgctcatgtacgggccagccgtagaggtcctcacgggtgggcactgtctcatactgttgtcttaggaccatgagctggacagtattaacacgatcttttacaaaattaataagcttgtttaatatgcaagggcccaaggttagaataagcacaagcaagataattgggcccactaaggtagatactaaggtggtgagccaaggagatctgttaaaccaagcctcaaaccagttctcctgagcttccctttcccgttttctcttagctaacccttctctcacctttgccatggattcttttaccacaccagtatggtcagcataaaaacaacattcctcccccagcgcggcacacagtcccccttgttggaggaacaacaaatcaagtcctcttctattctggagtactacctcggatagagaggtgagcgacttctctaaatgactaatcgaggtttccaacctttctatgtcctcatctatggccgcccttagggaggtcattcctgattgttgagtagccagggaagctatgccggtcccagctccggctattcccaaactgaacagggtggcaatggttagtgcggtgatgggctctcttttacttctcatacttgtgtcactatcccaatgcgaatacatactctcctcagggtgatagaggatgcggggcagcactgttactaggacacagaattcattggcggcattaaagaccgaggtagataagcacggggtgaggccagtctttgaacatatccaccatccatcagttctggggattaaccacttagtgtcacttttccaactaggggagctgtctatggaggcacataaactttgtttagcctggggcaccttccctagacaggtcccattgccgctcactagttgcatggttaagccaattttacgatttccccatgaacactgagaagggttcttaccgttagaggcgttgtaagtggcattaagtcctattgcctcatagaacgggggctttatatcatagcacagccaacaggaggttgtgaggttaggactggtggcattaagggtctcgtatacagtgcgcaccagtttccgcaatgagtctttggttggCTGCGTAGCAGGCGTCAAAGGAGTTACAGAGGTCTTTGGCTGGGTTCCCGCAGTCCCTCCTGCGGTGTTTTTATCCCTTGATATACCTGGGTTCTTGGTAGGGACGAGAGGGGCCAGCACCTTATTTGGACCCACCTGAGTGCTGATCATTTCCACCGATAGTCGAATAGTTAGGAGACCGCCGGGGTGGGGACCTATCCATGCCCAACGGCCTATATCTAATTGGAACCCCCAAGTTAATCCTGATAACCAACCACGCTCTCTTCGCGCCACGTCTTGGTTAAACTGGACACGTACCTGGGGCACCCGGTTGTGGGGGtccctaaaggagaatttaactagatccctgttcccaacgtcccactgtcggggcccatcataggaggtgacacaactccaactaccacaatagtagcggtctgggccgccacaggttttctaattgtttcttaggttccctgggcaggcccaaaacccttgtgcactatgtccttgagaggtgtcaattaccgcccgcttggacctgactgagtagtcatactgccgtccacgtttagtgccgaaaatgtcacgcaggtcgaaaaacaagtctggccaccacgtattgatgggggcagtatgtgtggtgctattaagggttgtttgggtctgtccatctgttagggtccatgttagcttatggggttggtgtgagttgatccccgcgtggctcttctcccagatgttgagcagagttagggctagcagccattccatcgttagctgaggcagggggcttgacgcttccccgaggtcgggagagctgcagcttcagagggttatcagggtgtcgccgtacgatccactccttagcttgcgt
cttctccagctggctggctcggcgcacgtgagagtgatggacccaaggcccaatgccgtcaacctttaaggcagtgggggtaaccagaataaccacataaggacctttccatctttcctccagtgtccgggacctgtgtctccttacccatacccaatcccctggaacgatgccatgttccgggttcggggcgtccttaatttcatacagggaactcactaggggccatatctcatgttggaccccttgtagggcctttaaactggccagataacttggggccacattggggtcatgatctggtagagtacgaacaataatggggggtggtgccccatacagaatttcgaaaggtgtcaaaccatgtacatatggtgagttccggacccggaagatggcatagggtaagagggtcacccagtccccgccagtctcgatggctagtttggacaaggtctcctttagagtccgattcattctctctacctgccctgagctctggggattatattcacaatgtaacttccaattgatccctatcgctcgggctagtccctgtaggacgttactgatgaaagctgggccgttatcggagcctaaaacctcaggaaccccatatctgggaataatttcttctagtaatgccttagcaaccacttgggcagtctcccgtttcgtggggaaggcttccacccagcccgaaaatgtgtcaaccattactagcaagtacttatacccatacctcccaggctttacctcagtaaaatccacttcccaactccgtcccggcgctcttccccgtaccctcgtacctgtatgttggggtcctttcctactgggtctcatagcctggcacccaatgcactgatctacaatctcttgaatctgagctgcttgtcggggaaaccggaggcgggcggactcgagaattgtcagcaacttcttttttcctaagtgggtggcttgatgcaggttggagagaagaaacagtcctagctgtgccggcagtatcaatcttccttctgtatcccgatgccacccctgctgatcagattccgggcagtggtggttctggatccatcgcaggtcttctggagtgtagtcaggtcgcgggggcaggcgagggagctcaggtgtgggcagggtgagtgctaaagctgatgaagctactgccactgccttggcggcttcatccgctcgccggtttccttcagcttccggggtctgggcagactggtgcccagggatgtggacaactgcgactgcccggggcatttgcacagccatcagcagtcttcgtacctcaggaagattgcgcagagtctttccttccgctgtaacaaagcctctttcccggtagatagcgccatgcacatggacagtgccaaaggcgtagcggctatcggtgtagacagtcactcgtctcccttcggcccgttccagcgcctccgccagcgcgatcagttcggccttctgtgctgatgtccccggggaaagcgaggcattccaaatgatgtttcccccttggtctaccaccgctgcgcctgccctccgcacaccatctataacgaagctgcttccatcagtgtaccataccaactcactgttgggtagtgcggtgtcctggaggtcggggcgcacctgggtgacttctgccatgatctcttggcagtcatgcaggggagctctcagatccggggtcggcagcagggtggctggattcagagcggtgggttcagcgaagatgatccggggtgcatctagcaagagtccttggtaatgtgttagtcgggcattagtcatccacctaccaggaggatatttcaggaccccctcgatcgcatggggggttactaccttcagatgttgcccaaaagtgagtttatcagcatccttcaccattagggctactgccgcaatgatccttaagcacgggggccatcctgctgcaactggatctagcttcttggataaataggcaaccgggcgtttccagggcc
ccagacgctgcattagcacccctttcgctattcccctcctctcatcaacaaagagagtgaagggcttcagggggtctggcaatgccagagccggggctcttaggagagcgaccttgagttcatcgtaggccttctgttggtctgacccccaggcccaagggaccttatccttggttgcctcatacagaggttttgctatttcagcataccccaaaatccacagccggcagtagcctgtcgtccctaaaaactcacggacctctcgtgctgaggtcgggactggaagtctaagaatagtctctttcatggcctctgtcagccatctggctccttcttttagtttataccccaggtaggtgactgtttgcctgcatatttgagccttctttgcactggcccgatagcccaactgccccagctcctggaggaggtctccagtggcctgtcggcattcagcttcggagggggctgccagaagcaagtcatctacgtactgcaggagcgtaactgaattatggctctggcgaaacgagtccaaatcctgatttagggcttcattaaacagagttggagagtttttgaagccttgcggtagtctagtccaggtcagctgcccgggggttcccgtattgccatcattccattcgaaggcaaaaatgtgttggctgctgggtgccagggctatgctaaaaaacgcatcctttaggtctaaggtagtataccagacatgtgaagggggcaagtgacttagtaaggtataagggttggggaccgtgggatggatgtcttcaaccctcttatttacttccctcaagtcctggactggcctataatcttttccccccggtttcttaacggggagaagtggggtgttccaggcagaatggcaaggtttcagtattccagcttccagtaaacggttaatgtgcggggcaatccctttccgcgcctctgcagacatggggtactggcggatccggataggctgggctgaggctttaagttccaccactactggtgctcggcgggccgcccggcccacacccgctatttccgcccacgcctgagggtatgttttaagccaataatccatatcacggggccattctgtagagggagggttgtaggggttgtcctgcagggcgaacaggcgatgttcatccacaagagacagggtcaaaatgtggaggggctgtccttggccatccaatagcttaatgccatccggctcaaaatggatctgagccctgatcttagtcaggagatcgcgccccagtaagggggcagggcattcagggataactaggaaggagtgggtcacttggtggcgacctaagtctacttggcgcctactagtccaccgataagccttggacccagttgccccttgcaccaaactggttttctgagataagggctctgtgggcttattcaaaactgagtactgggctcctgtgtctaccaggaatcctactggcttcccctccacatacgcagttacccaagactcggggaggggatccgagtcccgtctcccctagtcactctccatccccgccagcaggacccgtgcgtcttgccctgtttggccctggcgcttggggcactccctcttccaatgtccatactctttgcagtttgcacactgtcccctatccagtcggggccgcggtctccacggtcgggctggtccggccggactcggtcccaccctgactgtgctttgcacgcctgccaacaagatcttggccatctccctctgctgcctcctgttctccctactctgatgctctctgtcttccttcctgattcgtttctgtaattcctgattttcttttctaattctatcctccctttcttcgggagtctctcgagtgttgaagactctctccgctactttcattaaatcccggatagacatttctcccagtccctcctgtttgtacaatttcttcctaatatctggggcagcctggtttataaaggacataattacagcc
gactggttttcctctgccagggggtccaacggggtgtactgtctgtaagcatcatagaggcgttctaaaaacacggccgggctttcattatccccttgcattatagcttttaccttggccaaattggtggggcggcgtgccgccgctcggagacctgccataagagtctggcggtagactcggagacgctccctaccttctgcgttcccaaagtcccaatccggtctattcaggggaaagcgctcatctatcaggttcggcaaggtcgtgggtcttccgttgtcgccggggacattttttctggcctcggtgaggattcgctcgcgctcctcggtggtgaataaggtcttaagaagctgctggcaatcatcccaagtgggactgtgtgtgtgcatgacagactcgaacaggtcagttaggcctttcgggtcctcagaaaaaggagggttttgagccttccagttgtacagatcactgctagaaaaaggccagtactggtatgctcgctccccatctggaccagttcctcctagtgctcgcacggggagaatcggcgccccagcggaggtggaggaggacggttcctcctctggggtctcttcccggcgtctgcgaggcctcaatcccctcgctggcccttgagctggggcaggggaacccgggggagggggagccattggggaggcggtagagtgagacagctcggaaggagcggcagcctcaggcgggagcggcgccatcgggctgggcgcgcgccgcggctggagcggcgccaccggagcgtaaggagggggggtctcttccagatctaggtctatcagggaggggtatatgtctgacccctcctgaaggatgggtccctgggtcggcttctccgggggtttcgggccggccgtgggcactgccgactggcgggagggtcccgaaacggtcaggactttaagggggagggggggatcttctggcttgtcagggataaagggcttaagccaggagggaggactctctactaaggcttgccacatcaaaatatagggatattggtccggatggcgccgattaataatatctcggaccttcttaataatgtctagggaaaaagttccctgggggggccagcccacattaaaagtaggccattctgcagagcagaatgtatcaaacttacctttcttcacttccacaccatgattacgagccttggcgcggatttcaggaaagtggttcaggagcagggtcttaggtgttacctgaacctgtcccataattgtcacaaagagaaaccaagaaaaggcaaaagaaaggacaaaagacacagtgccagcaaatacacaacttcgcacaggactcttcaacacccaccggccggtcaaccacaccacatccacaggcgccgtttcaatcacaccagtctcaccacgctcaagatccttacctagggcccgtccaaacggcgtccactgtggacgtcgctgggccaccttctcgtcggggacgtctcccacgacttcaagtaacgaagcctccagggtcgtaacccgcactttccttcccgtgagaattctcaactgggaccgggcagAGACCTGTTTCAGTCTCTCCGGTCGAGGACCTGTTTCAGTCCTCCCCTGTTTGGGACCGGGCAGAGACCTGTTTCAGTCTCTCCGGTCGAGGACCTGTTTCAGTCCTCCCCTGTTTGGGACCGGGCAGAGACCTGTTTCAGTCTCTCCGGTCGAGGACCTGTTTCAGTCCTCCCCTATTGGAGGTGGCCAAACCTCCTTCCGCGGTTCCCTATGTAAACCTCGGTATCGGGAGTTGTCTGTTCCCCTGAGGGGGGGCATCCCGGACGAGCCcccaaa

Now the hits are on the negative strand:

_m29_p_5        TEsorter        CDS     1267580 1268452 1.05    -       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Retrovirus:Retrovirus-INT;Name=Retrovirus-INT;gene=INT;clade=Retrovirus;coverage=100.0;evalue=1.4e-78;probability=0.96
_m29_p_5        TEsorter        CDS     1268675 1269076 0.9     -       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Retrovirus:Retrovirus-RH;Name=Retrovirus-RH;gene=RH;clade=Retrovirus;coverage=100.0;evalue=6.6e-35;probability=0.99
_m29_p_5        TEsorter        CDS     1269827 1270459 1.24    -       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Retrovirus:Retrovirus-RT;Name=Retrovirus-RT;gene=RT;clade=Retrovirus;coverage=100.0;evalue=4.3e-79;probability=0.98
_m29_p_5        TEsorter        CDS     1270754 1270948 0.28    -       0       ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Ty3_gypsy:Ty3-PROT;Name=Ty3_gypsy-PROT;gene=PROT;clade=Ty3_gypsy;coverage=100.0;evalue=7.8e-06;probability=0.86
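Comparing the two GFF3 excerpts, each domain's coordinates on `-` are the mirror image of its coordinates on `+` within the INT region 1265484..1273342: new_start = (1265484 + 1273342) - old_end and new_end = (1265484 + 1273342) - old_start. For example, RT moves from 1268367..1268999 to 1269827..1270459. This suggests TEsorter places hits in genome coordinates by offsetting from the start given in the sequence header, which would explain why a sequence that LTR_retriever had already reversed comes out on `+`. A small Python sketch of the mapping (`mirror` is a hypothetical helper):

```python
def mirror(start: int, end: int, region_start: int, region_end: int) -> tuple:
    """Map a 1-based, inclusive feature to the opposite strand of a region.

    A base at position p inside [region_start, region_end] lands at
    region_start + region_end - p once the region is reverse-complemented.
    """
    total = region_start + region_end
    return total - end, total - start


if __name__ == "__main__":
    INT = (1265484, 1273342)
    print(mirror(1268367, 1268999, *INT))  # RT: (1269827, 1270459)
```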

And the *.domtbl hits, which now map to rev_aa:

#                                                                                                     --- full sequence --- -------------- this domain -------------   hmm coord   ali coord   env coord
# target name        accession   tlen query name                                    accession   qlen   E-value  score  bias   #  of  c-Evalue  i-Evalue  score  bias  from    to  from    to  from    to  acc description of target
#------------------- ---------- -----                          -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
Class_I/LTR/Retrovirus:Retrovirus-RT                              -            209 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 -           2619   2.4e-79  259.4   0.0   1   1   1.3e-79   4.3e-79  258.5   0.0     1   209   962  1172   962  1172 0.98 -
Class_I/LTR/Retrovirus:Retrovirus-INT                             -            245 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 -           2619     8e-79  258.5   0.0   1   1   4.1e-79   1.4e-78  257.8   0.0     1   245  1631  1921  1631  1921 0.96 -
Class_I/LTR/Ty3_gypsy:Ty3-INT                                     -            222 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 -           2619   7.3e-49  160.2   0.0   1   1   3.6e-49   1.2e-48  159.5   0.0     1   217  1631  1845  1631  1849 0.94 -
Class_I/LTR/Ty3_gypsy/non-chromovirus/non-chromo-outgroup:Ty3-INT -            300 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 -           2619   5.4e-47  154.4   0.0   1   1   8.2e-47   2.7e-46  152.1   0.0     1   300  1631  1921  1631  1921 0.91 -
Class_I/LTR/Ty3_gypsy/chromovirus/chromo-outgroup:Ty3-INT         -            305 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 -           2619   2.7e-37  122.3   0.1   1   1   2.9e-37   9.6e-37  120.5   0.0     5   287  1632  1909  1629  1921 0.80 -
Class_I/LTR/Retrovirus:Retrovirus-RH                              -            126 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 -           2619     4e-35  114.6   0.0   1   1     2e-35   6.6e-35  113.9   0.0     1   126  1423  1556  1423  1556 0.99 -
Class_I/LTR/Ty3_gypsy/non-chromovirus/OTA/Athila:Ty3-INT          -            313 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 -           2619   7.6e-35  114.5   0.0   1   1   3.6e-35   1.2e-34  113.8   0.0     7   217  1634  1843  1629  1850 0.93 -

So I think the sequence you provided is indeed on the positive strand. Could you check whether LTR_retriever has reversed it?

zhangrengang commented 2 years ago

Alternatively, you can try extracting the LTR sequences from the LTR_retriever results with our method described at https://github.com/zhangrengang/TEsorter#extracting-te-sequences-from-genome-for-tesorter.

LeeYEAH2 commented 2 years ago

Wow, I followed your advice and yes! LTR_retriever did reverse this sequence. That's quite confusing; this problem had been bothering me for a long time. I'll try your method to see whether it improves things. Thanks a lot.