LeeYEAH2 closed this issue 2 years ago.
Hi, here are some lines on the negative strand:
Os0494#DNAauto/CACTA TEsorter CDS 7694 8968 630.4 - 1 ID=Os0494#DNAauto/CACTA|Class_II/Subclass_1/TIR/EnSpm_CACTA:CACTA-TPase;Name=EnSpm_CACTA-TPase;gene=TPase;clade=EnSpm_CACTA;evalue=1.9e-191;coverage=98.8;probability=0.98
Os0503_INT#LTR/Gypsy TEsorter CDS 578 946 137.2 + 1 ID=Os0503_INT#LTR/Gypsy|Class_I/LTR/Ty3_gypsy/non-chromovirus/OTA/Tat/Ogre:Ty3-GAG;Name=Ogre-GAG;gene=GAG;clade=Ogre;evalue=3.2e-42;coverage=100.0;probability=0.98
Os0554#DNAauto/MULE TEsorter CDS 5689 6225 139.7 + 0 ID=Os0554#DNAauto/MULE|Class_II/Subclass_1/TIR/MuDR_Mutator:MuDR-TPase;Name=MuDR_Mutator-TPase;gene=TPase;clade=MuDR_Mutator;evalue=1.1e-42;coverage=98.8;probability=0.94
Os0576#DNAauto/PILE TEsorter CDS 3255 3674 147.7 - 2 ID=Os0576#DNAauto/PILE|Class_II/Subclass_1/TIR/PIF_Harbinger:Harbinger-TPase;Name=PIF_Harbinger-TPase;gene=TPase;clade=PIF_Harbinger;evalue=3.9e-45;coverage=100.0;probability=0.96
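Each of these lines is a standard GFF3 record. Column 7 holds the strand, and column 9 packs TEsorter's annotation as `key=value` pairs. Below is a small Python sketch for pulling those fields out. It assumes tab-separated columns, which were lost when the output was pasted here, and `parse_attributes` is a hypothetical helper:

```python
from typing import Dict

def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs: Dict[str, str] = {}
    for item in column9.strip().split(";"):
        if item:  # skip empty pieces, e.g. from a trailing ';'
            key, _, value = item.partition("=")
            attrs[key] = value
    return attrs

# First line above, rebuilt with the tab separators real GFF3 uses
fields = ["Os0494#DNAauto/CACTA", "TEsorter", "CDS", "7694", "8968", "630.4", "-", "1",
          "ID=Os0494#DNAauto/CACTA|Class_II/Subclass_1/TIR/EnSpm_CACTA:CACTA-TPase;"
          "Name=EnSpm_CACTA-TPase;gene=TPase;clade=EnSpm_CACTA;evalue=1.9e-191;"
          "coverage=98.8;probability=0.98"]
gff_line = "\t".join(fields)
cols = gff_line.split("\t")
attrs = parse_attributes(cols[8])
print(cols[6], attrs["gene"], attrs["clade"], float(attrs["evalue"]))
# - TPase EnSpm_CACTA 1.9e-191
```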
Could you provide some of your negative-strand LTRs so I can reproduce the issue?
Last night I double-checked the result file and found that the negative-strand LTR does exist, but it was wrongly annotated on the positive strand. Below is how it is annotated by LTR_retriever and TEsorter.
LTR_retriever:
_m29_p_5:1265061..1273765 pass motif:TGCA TSD:GCAC 1265057..1265060 1273766..1273769 IN:1265484..1273342 0.9976 - unknown NA 91069
retriever_ltrlib.fa (TEsorter input):
_m29_p_5:1265061..1265483_LTR#LTR/unknown tgtAAGAGACTGTTAGCATGTAAGAGACTGTTAGCATGAGAACTGCTATCTTCAGGTCAGGGTTAACAGCCCCTAGCCTGAAACAACATAATTATAAAATTCCAGGATGTGGGCAGTTGCAGCATGGCGACTAGAAGTCCTGTAGCTTATCTTATACTCCTGGTCTCCTTGTCCTGTAGTAAATCTTATACTCTTTGAAACTATGTTAGTCCTGTAAGTTTATCTTGTACTCTCAGAAGCTATGGAAGGccttgaaaatgctatataaccccttggctttaagtgtttggggtccttgttaaaaacccgctgcgtcgggcagagacggggaccccagccggctggtaataaacctcgctgtgtgacttgcattattgtgcgggttctctgtctgtctgagggggacaattccggatcttaaca _m29_p_5:1265484..1273342_INT#LTR/unknown tttgggGGCTCGTCCGGGATGCCCCCCCTCAGGGGAACAGACAACTCCCGATACCGAGGTTTACATAGGGAACCGCGGAAGGAGGTTTGGCCACCTCCAATAGGGGAGGACTGAAACAGGTCCTCGACCGGAGAGACTGAAACAGGTCTCTGCCCGGTCCCAAACAGGGGAGGACTGAAACAGGTCCTCGACCGGAGAGACTGAAACAGGTCTCTGCCCGGTCCCAAACAGGGGAGGACTGAAACAGGTCCTCGACCGGAGAGACTGAAACAGGTCTctgcccggtcccagttgagaattctcacgggaaggaaagtgcgggttacgaccctggaggcttcgttacttgaagtcgtgggagacgtccccgacgagaaggtggcccagcgacgtccacagtggacgccgtttggacgggccctaggtaaggatcttgagcgtggtgagactggtgtgattgaaacggcgcctgtggatgtggtgtggttgaccggccggtgggtgttgaagagtcctgtgcgaagttgtgtatttgctggcactgtgtcttttgtcctttcttttgccttttcttggtttctctttgtgacaattatgggacaggttcaggtaacacctaagaccctgctcctgaaccactttcctgaaatccgcgccaaggctcgtaatcatggtgtggaagtgaagaaaggtaagtttgatacattctgctctgcagaatggcctacttttaatgtgggctggcccccccagggaactttttccctagacattattaagaaggtccgagatattattaatcggcgccatccggaccaatatccctatattttgatgtggcaagccttagtagagagtcctccctcctggcttaagccctttatccctgacaagccagaagatccccccctcccccttaaagtcctgaccgtttcgggaccctcccgccagtcggcagtgcccacggccggcccgaaacccccggagaagccgacccagggacccatccttcaggaggggtcagacatatacccctccctgatagacctagatctggaagagaccccccctccttacgctccggtggcgccgctccagccgcggcgcgcgcccagcccgatggcgccgctcccgcctgaggctgccgctccttccgagctgtctcactctaccgcctccccaatggctccccctcccccgggttcccctgccccagctcaagggccagcgaggggattgaggcctcgcagacgccgggaagagaccccagaggaggaaccgtcctcctccacctccgctggggcgccgattctccccgtgcgagcactaggaggaactggtccagatggggagcgagcataccagtactggcctttttctagcagtgatctgtacaactggaaggctcaaaaccctcctttttctgaggacccgaaaggcctaactgacctgttcgagtctgtcatgcacacacacagtcccacttgggatg
attgccagcagcttcttaagaccttattcaccaccgaggagcgcgagcgaatcctcaccgaggccagaaaaaatgtccccggcgacaacggaagacccacgaccttgccgaacctgatagatgagcgctttcccctgaatagaccggattgggactttgggaacgcagaaggtagggagcgtctccgagtctaccgccagactcttatggcaggtctccgagcggcggcacgccgccccaccaatttggccaaggtaaaagctataatgcaaggggataatgaaagcccggccgtgtttttagaacgcctctatgatgcttacagacagtacaccccgttggaccccctggcagaggaaaaccagtcggctgtaattatgtcctttataaaccaggctgccccagatattaggaagaaattgtacaaacaggagggactgggagaaatgtctatccgggatttaatgaaagtagcggagagagtcttcaacactcgagagactcccgaagaaagggaggatagaattagaaaagaaaatcaggaattacagaaacgaatcaggaaggaagacagagagcatcagagtagggagaacaggaggcagcagagggagatggccaagatcttgttggcaggcgtgcaaagcacagtcagggtgggaccgagtccggccggaccagcccgaccgtggagaccgcggccccgactggataggggacagtgtgcaaactgcaaagagtatggacattggaagagggagtgccccaagcgccagggccaaacagggcaagacgcacgggtcctgctggcggggatggagagtgactaggggagacgggactcggatcccctccccgagtcttgggtaactgcgtatgtggaggggaagccagtaggattcctggtagacacaggagcccagtactcagttttgaataagcccacagagcccttatctcagaaaaccagtttggtgcaaggggcaactgggtccaaggcttatcggtggactagtaggcgccaagtagacttaggtcgccaccaagtgacccactccttcctagttatccctgaatgccctgcccccttactggggcgcgatctcctgactaagatcagggctcagatccattttgagccggatggcattaagctattggatggccaaggacagcccctccacattttgaccctgtctcttgtggatgaacatcgcctgttcgccctgcaggacaacccctacaaccctccctctacagaatggccccgtgatatggattattggcttaaaacataccctcaggcgtgggcggaaatagcgggtgtgggccgggcggcccgccgagcaccagtagtggtggaacttaaagcctcagcccagcctatccggatccgccagtaccccatgtctgcagaggcgcggaaagggattgccccgcacattaaccgtttactggaagctggaatactgaaaccttgccattctgcctggaacaccccacttctccccgttaagaaaccggggggaaaagattataggccagtccaggacttgagggaagtaaataagagggttgaagacatccatcccacggtccccaacccttataccttactaagtcacttgcccccttcacatgtctggtatactaccttagacctaaaggatgcgttttttagcatagccctggcacccagcagccaacacatttttgccttcgaatggaatgatggcaatacgggaacccccgggcagctgacctggactagactaccgcaaggcttcaaaaactctccaactctgtttaatgaagccctaaatcaggatttggactcgtttcgccagagccataattcagttacgctcctgcagtacgtagatgacttgcttctggcagccccctccgaagctgaatgccgacaggccactggagacctcctccaggagctggggcagttgggctatcgggccagtgcaaagaaggctcaaatatgcagg
caaacagtcacctacctggggtataaactaaaagaaggagccagatggctgacagaggccatgaaagagactattcttagacttccagtcccgacctcagcacgagaggtccgtgagtttttagggacgacaggctactgccggctgtggattttggggtatgctgaaatagcaaaacctctgtatgaggcaaccaaggataaggtcccttgggcctgggggtcagaccaacagaaggcctacgatgaactcaaggtcgctctcctaagagccccggctctggcattgccagaccccctgaagcccttcactctctttgttgatgagaggaggggaatagcgaaaggggtgctaatgcagcgtctggggccctggaaacgcccggttgcctatttatccaagaagctagatccagttgcagcaggatggcccccgtgcttaaggatcattgcggcagtagccctaatggtgaaggatgctgataaactcacttttgggcaacatctgaaggtagtaaccccccatgcgatcgagggggtcctgaaatatcctcctggtaggtggatgactaatgcccgactaacacattaccaaggactcttgctagatgcaccccggatcatcttcgctgaacccaccgctctgaatccagccaccctgctgccgaccccggatctgagagctcccctgcatgactgccaagagatcatggcagaagtcacccaggtgcgccccgacctccaggacaccgcactacccaacagtgagttggtatggtacactgatggaagcagcttcgttatagatggtgtgcggagggcaggcgcagcggtggtagaccaagggggaaacatcatttggaatgcctcgctttccccggggacatcagcacagaaggccgaactgatcgcgctggcggaggcgctggaacgggccgaagggagacgagtgactgtctacaccgatagccgctacgcctttggcactgtccatgtgcatggcgctatctaccgggaaagaggctttgttacagcggaaggaaagactctgcgcaatcttcctgaggtacgaagactgctgatggctgtgcaaatgccccgggcagtcgcagttgtccacatccctgggcaccagtctgcccagaccccggaagctgaaggaaaccggcgagcggatgaagccgccaaggcagtggcagtagcttcatcagctttagcactcaccctgcccacacctgagctccctcgcctgcccccgcgacctgactacactccagaagacctgcgatggatccagaaccaccactgcccggaatctgatcagcaggggtggcatcgggatacagaaggaagattgatactgccggcacagctaggactgtttcttctctccaacctgcatcaagccacccacttaggaaaaaagaagttgctgacaattctcgagtccgcccgcctccggtttccccgacaagcagctcagattcaagagattgtagatcagtgcattgggtgccaggctatgagacccagtaggaaaggaccccaacatacaggtacgagggtacggggaagagcgccgggacggagttgggaagtggattttactgaggtaaagcctgggaggtatgggtataagtacttgctagtaatggttgacacattttcgggctgggtggaagccttccccacgaaacgggagactgcccaagtggttgctaaggcattactagaagaaattattcccagatatggggttcctgaggttttaggctccgataacggcccagctttcatcagtaacgtcctacagggactagcccgagcgatagggatcaattggaagttacattgtgaatataatccccagagctcagggcaggtagagagaatgaatcggactctaaaggagaccttgtccaaactagccatcgagactggcggggactgggtgaccctcttaccctatgccatcttccgggtccggaactcaccatatgt
acatggtttgacacctttcgaaattctgtatggggcaccaccccccattattgttcgtactctaccagatcatgaccccaatgtggccccaagttatctggccagtttaaaggccctacaaggggtccaacatgagatatggcccctagtgagttccctgtatgaaattaaggacgccccgaacccggaacatggcatcgttccaggggattgggtatgggtaaggagacacaggtcccggacactggaggaaagatggaaaggtccttatgtggttattctggttacccccactgccttaaaggttgacggcattgggccttgggtccatcactctcacgtgcgccgagccagccagctggagaagacgcaagctaaggagtggatcgtacggcgacaccctgataaccctctgaagctgcagctctcccgacctcggggaagcgtcaagccccctgcctcagctaacgatggaatggctgctagccctaactctgctcaacatctgggagaagagccacgcggggatcaactcacaccaaccccataagctaacatggaccctaacagatggacagacccaaacaacccttaatagcaccacacatactgcccccatcaatacgtggtggccagacttgtttttcgacctgcgtgacattttcggcactaaacgtggacggcagtatgactactcagtcaggtccaagcgggcggtaattgacacctctcaaggacatagtgcacaagggttttgggcctgcccagggaacctaagaaacaattagaaaacctgtggcggcccagaccgctactattgtggtagttggagttgtgtcacctcctatgatgggccccgacagtgggacgttgggaacagggatctagttaaattctcctttagggaCCCCCACAACCGGGTGCCCCAGGTACGTGTCCAGTTTAACCAAGACGTGGCGCGAAGAGAGCGTGGTTGGTTATCAGGATTAACTTGGGGGTTCCAATTAGATATAGGCCGTTGGGCATGGATAGGTCCCCACCCCGGCGGTCTCCTAACTATTCGACTATCGGTGGAAATGATCAGCACTCAGGTGGGTCCAAATAAGGTGCTGGCCCCTCTCGTCCCTACCAAGAACCCAGGTATATCAAGGGATAAAAACACCGCAGGAGGGACTGCGGGAACCCAGCCAAAGACCTCTGTAACTCCTTTGACGCCTGCTACGCAGccaaccaaagactcattgcggaaactggtgcgcactgtatacgagacccttaatgccaccagtcctaacctcacaacctcctgttggctgtgctatgatataaagcccccgttctatgaggcaataggacttaatgccacttacaacgcctctaacggtaagaacccttctcagtgttcatggggaaatcgtaaaattggcttaaccatgcaactagtgagcggcaatgggacctgtctagggaaggtgccccaggctaaacaaagtttatgtgcctccatagacagctcccctagttggaaaagtgacactaagtggttaatccccagaactgatggatggtggatatgttcaaagactggcctcaccccgtgcttatctacctcggtctttaatgccgccaatgaattctgtgtcctagtaacagtgctgccccgcatcctctatcaccctgaggagagtatgtattcgcattgggatagtgacacaagtatgagaagtaaaagagagcccatcaccgcactaaccattgccaccctgttcagtttgggaatagccggagctgggaccggcatagcttccctggctactcaacaatcaggaatgacctccctaagggcggccatagatgaggacatagaaaggttggaaacctcgattagtcatttagagaagtcgctcacctctctatccgaggtagtactccagaatagaagaggacttgatttgttgttcctccaacaagggggactgtgtgccgcgctgggggaggaa
tgttgtttttatgctgaccatactggtgtggtaaaagaatccatggcaaaggtgagagaagggttagctaagagaaaacgggaaagggaagctcaggagaactggtttgaggcttggtttaacagatctccttggctcaccaccttagtatctaccttagtgggcccaattatcttgcttgtgcttattctaaccttgggcccttgcatattaaacaagcttattaattttgtaaaagatcgtgttaatactgtccagctcatggtcctaagacaacagtatgagacagtgcccacccgtgaggacctctacggctggcccgtacatgagcaagattcctcattatgaacaacacacaggggggaaa
>_m29_p_5:1273343..1273765_LTR#LTR/unknown
TGTAAGAGACTGTTAGCATGTAAGAGACTGTTAGCATGAGAACTGCTATCTTCAGGTCAGGGTTAACAGCCCCTAGCCTGAAACAACATAATTATAAAATTCCAGGATGTGGGCAGTTGCAGCATGGCGACTAGAAGTCCTGTAGCTTATCTTATACTCCTGGTCTCCTTGTCCTGTAGTAAATCTTATACTCTTTGAAACTATGTAAGTCCTGTAAGTTTATCTTGTACTCTCAGAAGCTATGGAAGGccttgaaaatgctatataaccccttggctttaagtgtttggggtccttgttaaaaacccgctgcgtcgggcagagacggggaccccagccggctggtaataaacctcgctgtgtgacttgcattattgtgcgggttctctgtctgtctgagggggacaattccggatcttaaca
TEsorter CSV output:
_m29_p_5:1265484..1273342_INT#LTR/unknown LTR Retroviridae gammaretroviridae yes + GAG|gammaretroviridae AP|gammaretroviridae RT|gammaretroviridae RNaseH|gammaretroviridae INT|gammaretroviridae ENV|gammaretroviridae VAP|badnavirus
TEsorter GFF3 output:
_m29_p_5 TEsorter CDS 1266111 1267739 768.7 + 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|GAG_gammaretroviridae;gene=GAG;clade=gammaretroviridae;evalue=1.8e-232;coverage=99.8;probability=0.84
_m29_p_5 TEsorter CDS 1267881 1268105 91.7 + 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|AP_gammaretroviridae;gene=AP;clade=gammaretroviridae;evalue=3.5e-28;coverage=100.0;probability=0.88
_m29_p_5 TEsorter CDS 1268430 1269140 571.0 + 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|RT_gammaretroviridae;gene=RT;clade=gammaretroviridae;evalue=3.6e-173;coverage=100.0;probability=1.0
_m29_p_5 TEsorter CDS 1269741 1270163 261.6 + 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|RNaseH_gammaretroviridae;gene=RNaseH;clade=gammaretroviridae;evalue=1.2e-79;coverage=98.6;probability=0.98
_m29_p_5 TEsorter CDS 1270374 1271333 580.7 + 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|INT_gammaretroviridae;gene=INT;clade=gammaretroviridae;evalue=4.5e-176;coverage=99.4;probability=0.96
_m29_p_5 TEsorter CDS 1272217 1273209 483.8 + 1 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|ENV_gammaretroviridae;gene=ENV;clade=gammaretroviridae;evalue=1.2e-146;coverage=99.0;probability=0.92
_m29_p_5 TEsorter CDS 1272826 1272915 9.4 + 1 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|VAP_badnavirus;gene=VAP;clade=badnavirus;evalue=0.0066;coverage=18.0;probability=0.96
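The key detail is in the LTR_retriever line. Its ninth column is `-`, so LTR_retriever called this element on the negative strand, yet every TEsorter hit on the library sequence is on `+`. Below is a minimal Python sketch for pulling the strand and internal region out of such a line. It assumes the column order of LTR_retriever's `*.pass.list` header (LTR_loc, Category, Motif, TSD, 5'TSD, 3'TSD, Internal, Identity, Strand, ...). `parse_pass_list_line` and `PassListRecord` are hypothetical helpers, not part of either tool:

```python
from typing import NamedTuple, Tuple

class PassListRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    internal: Tuple[int, int]
    identity: float
    strand: str

def parse_range(text: str) -> Tuple[int, int]:
    """Turn '1265061..1273765' into (1265061, 1273765)."""
    left, right = text.split("..")
    return int(left), int(right)

def parse_pass_list_line(line: str) -> PassListRecord:
    # Assumed columns (0-based): 0 LTR_loc, 6 Internal ('IN:a..b'),
    # 7 Identity, 8 Strand
    fields = line.split()
    chrom, loc = fields[0].rsplit(":", 1)
    start, end = parse_range(loc)
    internal = parse_range(fields[6].split(":", 1)[1])
    return PassListRecord(chrom, start, end, internal, float(fields[7]), fields[8])

line = ("_m29_p_5:1265061..1273765 pass motif:TGCA TSD:GCAC "
        "1265057..1265060 1273766..1273769 IN:1265484..1273342 "
        "0.9976 - unknown NA 91069")
rec = parse_pass_list_line(line)
print(rec.strand, rec.internal)  # - (1265484, 1273342)
```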
Hi, I have checked the element. Its domains are annotated on the positive strand:
_m29_p_5 TEsorter CDS 1267878 1268072 0.28 + 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Ty3_gypsy:Ty3-PROT;Name=Ty3_gypsy-PROT;gene=PROT;clade=Ty3_gypsy;coverage=100.0;evalue=7.8e-06;probability=0.86
_m29_p_5 TEsorter CDS 1268367 1268999 1.24 + 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Retrovirus:Retrovirus-RT;Name=Retrovirus-RT;gene=RT;clade=Retrovirus;coverage=100.0;evalue=4.3e-79;probability=0.98
_m29_p_5 TEsorter CDS 1269750 1270151 0.9 + 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Retrovirus:Retrovirus-RH;Name=Retrovirus-RH;gene=RH;clade=Retrovirus;coverage=100.0;evalue=6.6e-35;probability=0.99
_m29_p_5 TEsorter CDS 1270374 1271246 1.05 + 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Retrovirus:Retrovirus-INT;Name=Retrovirus-INT;gene=INT;clade=Retrovirus;coverage=100.0;evalue=1.4e-78;probability=0.96
Its initial protein-domain hits (*.domtbl) are also on the positive strand:
# --- full sequence --- -------------- this domain ------------- hmm coord ali coord env coord
# target name accession tlen query name accession qlen E-value score bias # of c-Evalue i-Evalue score bias from to from to from to acc description of target
# ------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
Class_I/LTR/Retrovirus:Retrovirus-RT - 209 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 - 2619 2.4e-79 259.4 0.0 1 1 1.3e-79 4.3e-79 258.5 0.0 1 209 962 1172 962 1172 0.98 -
Class_I/LTR/Retrovirus:Retrovirus-INT - 245 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 - 2619 8e-79 258.5 0.0 1 1 4.1e-79 1.4e-78 257.8 0.0 1 245 1631 1921 1631 1921 0.96 -
Class_I/LTR/Ty3_gypsy:Ty3-INT - 222 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 - 2619 7.3e-49 160.2 0.0 1 1 3.6e-49 1.2e-48 159.5 0.0 1 217 1631 1845 1631 1849 0.94 -
Class_I/LTR/Ty3_gypsy/non-chromovirus/non-chromo-outgroup:Ty3-INT - 300 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 - 2619 5.4e-47 154.4 0.0 1 1 8.2e-47 2.7e-46 152.1 0.0 1 300 1631 1921 1631 1921 0.91 -
Class_I/LTR/Ty3_gypsy/chromovirus/chromo-outgroup:Ty3-INT - 305 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 - 2619 2.7e-37 122.3 0.1 1 1 2.9e-37 9.6e-37 120.5 0.0 5 287 1632 1909 1629 1921 0.80 -
Class_I/LTR/Retrovirus:Retrovirus-RH - 126 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 - 2619 4e-35 114.6 0.0 1 1 2e-35 6.6e-35 113.9 0.0 1 126 1423 1556 1423 1556 0.99 -
Class_I/LTR/Ty3_gypsy/non-chromovirus/OTA/Athila:Ty3-INT - 313 _m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 - 2619 7.6e-35 114.5 0.0 1 1 3.6e-35 1.2e-34 113.8 0.0 7 217 1634 1843 1629 1850 0.93 -
....
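The domtbl `ali coord` values are amino-acid positions on the translated INT sequence, while the GFF3 coordinates are genomic. The numbers above fit a simple conversion if we assume that `aa1` is the translation starting at the first base of `_m29_p_5:1265484..1273342`. Under that assumption, RT at aa 962..1172 becomes 1268367..1268999, exactly as in the GFF3 line. Below is a sketch of that arithmetic. The function names are hypothetical, and this is not TEsorter's own code:

```python
from typing import Tuple

def parse_domtbl_line(line: str) -> Tuple[str, str, int, int]:
    """Return (target, query, ali_from, ali_to) from one hmmscan --domtblout row."""
    fields = line.split()
    # 0-based columns: 0 target name, 3 query name, 17/18 'ali coord' from/to
    return fields[0], fields[3], int(fields[17]), int(fields[18])

def forward_frame_to_genome(seq_start: int, frame_offset: int,
                            aa_from: int, aa_to: int) -> Tuple[int, int]:
    """Map 1-based amino-acid coordinates of a forward-frame translation
    to 1-based genomic coordinates.

    seq_start: genomic position of the first base of the translated sequence
    frame_offset: bases skipped before the first codon (assumed 0 for 'aa1')
    """
    first_base = seq_start + frame_offset + (aa_from - 1) * 3
    last_base = seq_start + frame_offset + aa_to * 3 - 1
    return first_base, last_base

row = ("Class_I/LTR/Retrovirus:Retrovirus-RT - 209 "
       "_m29_p_5:1265484..1273342_INT#LTR/unknown|aa1 - 2619 2.4e-79 259.4 "
       "0.0 1 1 1.3e-79 4.3e-79 258.5 0.0 1 209 962 1172 962 1172 0.98 -")
target, query, aa_from, aa_to = parse_domtbl_line(row)
print(target, forward_frame_to_genome(1265484, 0, aa_from, aa_to))
# Class_I/LTR/Retrovirus:Retrovirus-RT (1268367, 1268999)
```

The same arithmetic gives INT (aa 1631..1921) at 1270374..1271246 and RH (aa 1423..1556) at 1269750..1270151, matching the GFF3 lines above.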
Then I made a reverse-complement sequence:
>_m29_p_5:1265484..1273342_INT#LTR/unknown
tttcccccctgtgtgttgttcataatgaggaatcttgctcatgtacgggccagccgtagaggtcctcacgggtgggcactgtctcatactgttgtcttaggaccatgagctggacagtattaacacgatcttttacaaaattaataagcttgtttaatatgcaagggcccaaggttagaataagcacaagcaagataattgggcccactaaggtagatactaaggtggtgagccaaggagatctgttaaaccaagcctcaaaccagttctcctgagcttccctttcccgttttctcttagctaacccttctctcacctttgccatggattcttttaccacaccagtatggtcagcataaaaacaacattcctcccccagcgcggcacacagtcccccttgttggaggaacaacaaatcaagtcctcttctattctggagtactacctcggatagagaggtgagcgacttctctaaatgactaatcgaggtttccaacctttctatgtcctcatctatggccgcccttagggaggtcattcctgattgttgagtagccagggaagctatgccggtcccagctccggctattcccaaactgaacagggtggcaatggttagtgcggtgatgggctctcttttacttctcatacttgtgtcactatcccaatgcgaatacatactctcctcagggtgatagaggatgcggggcagcactgttactaggacacagaattcattggcggcattaaagaccgaggtagataagcacggggtgaggccagtctttgaacatatccaccatccatcagttctggggattaaccacttagtgtcacttttccaactaggggagctgtctatggaggcacataaactttgtttagcctggggcaccttccctagacaggtcccattgccgctcactagttgcatggttaagccaattttacgatttccccatgaacactgagaagggttcttaccgttagaggcgttgtaagtggcattaagtcctattgcctcatagaacgggggctttatatcatagcacagccaacaggaggttgtgaggttaggactggtggcattaagggtctcgtatacagtgcgcaccagtttccgcaatgagtctttggttggCTGCGTAGCAGGCGTCAAAGGAGTTACAGAGGTCTTTGGCTGGGTTCCCGCAGTCCCTCCTGCGGTGTTTTTATCCCTTGATATACCTGGGTTCTTGGTAGGGACGAGAGGGGCCAGCACCTTATTTGGACCCACCTGAGTGCTGATCATTTCCACCGATAGTCGAATAGTTAGGAGACCGCCGGGGTGGGGACCTATCCATGCCCAACGGCCTATATCTAATTGGAACCCCCAAGTTAATCCTGATAACCAACCACGCTCTCTTCGCGCCACGTCTTGGTTAAACTGGACACGTACCTGGGGCACCCGGTTGTGGGGGtccctaaaggagaatttaactagatccctgttcccaacgtcccactgtcggggcccatcataggaggtgacacaactccaactaccacaatagtagcggtctgggccgccacaggttttctaattgtttcttaggttccctgggcaggcccaaaacccttgtgcactatgtccttgagaggtgtcaattaccgcccgcttggacctgactgagtagtcatactgccgtccacgtttagtgccgaaaatgtcacgcaggtcgaaaaacaagtctggccaccacgtattgatgggggcagtatgtgtggtgctattaagggttgtttgggtctgtccatctgttagggtccatgttagcttatggggttggtgtgagttgatccccgcgtggctcttctcccagatgttgagcagagttagggctagcagccattccatcgttagctgaggcagggggcttgacgcttccccgaggtcgggagagctgcagcttcagagggttatcagggtgtcgccgtacgatccactccttagcttgcgt
cttctccagctggctggctcggcgcacgtgagagtgatggacccaaggcccaatgccgtcaacctttaaggcagtgggggtaaccagaataaccacataaggacctttccatctttcctccagtgtccgggacctgtgtctccttacccatacccaatcccctggaacgatgccatgttccgggttcggggcgtccttaatttcatacagggaactcactaggggccatatctcatgttggaccccttgtagggcctttaaactggccagataacttggggccacattggggtcatgatctggtagagtacgaacaataatggggggtggtgccccatacagaatttcgaaaggtgtcaaaccatgtacatatggtgagttccggacccggaagatggcatagggtaagagggtcacccagtccccgccagtctcgatggctagtttggacaaggtctcctttagagtccgattcattctctctacctgccctgagctctggggattatattcacaatgtaacttccaattgatccctatcgctcgggctagtccctgtaggacgttactgatgaaagctgggccgttatcggagcctaaaacctcaggaaccccatatctgggaataatttcttctagtaatgccttagcaaccacttgggcagtctcccgtttcgtggggaaggcttccacccagcccgaaaatgtgtcaaccattactagcaagtacttatacccatacctcccaggctttacctcagtaaaatccacttcccaactccgtcccggcgctcttccccgtaccctcgtacctgtatgttggggtcctttcctactgggtctcatagcctggcacccaatgcactgatctacaatctcttgaatctgagctgcttgtcggggaaaccggaggcgggcggactcgagaattgtcagcaacttcttttttcctaagtgggtggcttgatgcaggttggagagaagaaacagtcctagctgtgccggcagtatcaatcttccttctgtatcccgatgccacccctgctgatcagattccgggcagtggtggttctggatccatcgcaggtcttctggagtgtagtcaggtcgcgggggcaggcgagggagctcaggtgtgggcagggtgagtgctaaagctgatgaagctactgccactgccttggcggcttcatccgctcgccggtttccttcagcttccggggtctgggcagactggtgcccagggatgtggacaactgcgactgcccggggcatttgcacagccatcagcagtcttcgtacctcaggaagattgcgcagagtctttccttccgctgtaacaaagcctctttcccggtagatagcgccatgcacatggacagtgccaaaggcgtagcggctatcggtgtagacagtcactcgtctcccttcggcccgttccagcgcctccgccagcgcgatcagttcggccttctgtgctgatgtccccggggaaagcgaggcattccaaatgatgtttcccccttggtctaccaccgctgcgcctgccctccgcacaccatctataacgaagctgcttccatcagtgtaccataccaactcactgttgggtagtgcggtgtcctggaggtcggggcgcacctgggtgacttctgccatgatctcttggcagtcatgcaggggagctctcagatccggggtcggcagcagggtggctggattcagagcggtgggttcagcgaagatgatccggggtgcatctagcaagagtccttggtaatgtgttagtcgggcattagtcatccacctaccaggaggatatttcaggaccccctcgatcgcatggggggttactaccttcagatgttgcccaaaagtgagtttatcagcatccttcaccattagggctactgccgcaatgatccttaagcacgggggccatcctgctgcaactggatctagcttcttggataaataggcaaccgggcgtttccagggcc
ccagacgctgcattagcacccctttcgctattcccctcctctcatcaacaaagagagtgaagggcttcagggggtctggcaatgccagagccggggctcttaggagagcgaccttgagttcatcgtaggccttctgttggtctgacccccaggcccaagggaccttatccttggttgcctcatacagaggttttgctatttcagcataccccaaaatccacagccggcagtagcctgtcgtccctaaaaactcacggacctctcgtgctgaggtcgggactggaagtctaagaatagtctctttcatggcctctgtcagccatctggctccttcttttagtttataccccaggtaggtgactgtttgcctgcatatttgagccttctttgcactggcccgatagcccaactgccccagctcctggaggaggtctccagtggcctgtcggcattcagcttcggagggggctgccagaagcaagtcatctacgtactgcaggagcgtaactgaattatggctctggcgaaacgagtccaaatcctgatttagggcttcattaaacagagttggagagtttttgaagccttgcggtagtctagtccaggtcagctgcccgggggttcccgtattgccatcattccattcgaaggcaaaaatgtgttggctgctgggtgccagggctatgctaaaaaacgcatcctttaggtctaaggtagtataccagacatgtgaagggggcaagtgacttagtaaggtataagggttggggaccgtgggatggatgtcttcaaccctcttatttacttccctcaagtcctggactggcctataatcttttccccccggtttcttaacggggagaagtggggtgttccaggcagaatggcaaggtttcagtattccagcttccagtaaacggttaatgtgcggggcaatccctttccgcgcctctgcagacatggggtactggcggatccggataggctgggctgaggctttaagttccaccactactggtgctcggcgggccgcccggcccacacccgctatttccgcccacgcctgagggtatgttttaagccaataatccatatcacggggccattctgtagagggagggttgtaggggttgtcctgcagggcgaacaggcgatgttcatccacaagagacagggtcaaaatgtggaggggctgtccttggccatccaatagcttaatgccatccggctcaaaatggatctgagccctgatcttagtcaggagatcgcgccccagtaagggggcagggcattcagggataactaggaaggagtgggtcacttggtggcgacctaagtctacttggcgcctactagtccaccgataagccttggacccagttgccccttgcaccaaactggttttctgagataagggctctgtgggcttattcaaaactgagtactgggctcctgtgtctaccaggaatcctactggcttcccctccacatacgcagttacccaagactcggggaggggatccgagtcccgtctcccctagtcactctccatccccgccagcaggacccgtgcgtcttgccctgtttggccctggcgcttggggcactccctcttccaatgtccatactctttgcagtttgcacactgtcccctatccagtcggggccgcggtctccacggtcgggctggtccggccggactcggtcccaccctgactgtgctttgcacgcctgccaacaagatcttggccatctccctctgctgcctcctgttctccctactctgatgctctctgtcttccttcctgattcgtttctgtaattcctgattttcttttctaattctatcctccctttcttcgggagtctctcgagtgttgaagactctctccgctactttcattaaatcccggatagacatttctcccagtccctcctgtttgtacaatttcttcctaatatctggggcagcctggtttataaaggacataattacagcc
gactggttttcctctgccagggggtccaacggggtgtactgtctgtaagcatcatagaggcgttctaaaaacacggccgggctttcattatccccttgcattatagcttttaccttggccaaattggtggggcggcgtgccgccgctcggagacctgccataagagtctggcggtagactcggagacgctccctaccttctgcgttcccaaagtcccaatccggtctattcaggggaaagcgctcatctatcaggttcggcaaggtcgtgggtcttccgttgtcgccggggacattttttctggcctcggtgaggattcgctcgcgctcctcggtggtgaataaggtcttaagaagctgctggcaatcatcccaagtgggactgtgtgtgtgcatgacagactcgaacaggtcagttaggcctttcgggtcctcagaaaaaggagggttttgagccttccagttgtacagatcactgctagaaaaaggccagtactggtatgctcgctccccatctggaccagttcctcctagtgctcgcacggggagaatcggcgccccagcggaggtggaggaggacggttcctcctctggggtctcttcccggcgtctgcgaggcctcaatcccctcgctggcccttgagctggggcaggggaacccgggggagggggagccattggggaggcggtagagtgagacagctcggaaggagcggcagcctcaggcgggagcggcgccatcgggctgggcgcgcgccgcggctggagcggcgccaccggagcgtaaggagggggggtctcttccagatctaggtctatcagggaggggtatatgtctgacccctcctgaaggatgggtccctgggtcggcttctccgggggtttcgggccggccgtgggcactgccgactggcgggagggtcccgaaacggtcaggactttaagggggagggggggatcttctggcttgtcagggataaagggcttaagccaggagggaggactctctactaaggcttgccacatcaaaatatagggatattggtccggatggcgccgattaataatatctcggaccttcttaataatgtctagggaaaaagttccctgggggggccagcccacattaaaagtaggccattctgcagagcagaatgtatcaaacttacctttcttcacttccacaccatgattacgagccttggcgcggatttcaggaaagtggttcaggagcagggtcttaggtgttacctgaacctgtcccataattgtcacaaagagaaaccaagaaaaggcaaaagaaaggacaaaagacacagtgccagcaaatacacaacttcgcacaggactcttcaacacccaccggccggtcaaccacaccacatccacaggcgccgtttcaatcacaccagtctcaccacgctcaagatccttacctagggcccgtccaaacggcgtccactgtggacgtcgctgggccaccttctcgtcggggacgtctcccacgacttcaagtaacgaagcctccagggtcgtaacccgcactttccttcccgtgagaattctcaactgggaccgggcagAGACCTGTTTCAGTCTCTCCGGTCGAGGACCTGTTTCAGTCCTCCCCTGTTTGGGACCGGGCAGAGACCTGTTTCAGTCTCTCCGGTCGAGGACCTGTTTCAGTCCTCCCCTGTTTGGGACCGGGCAGAGACCTGTTTCAGTCTCTCCGGTCGAGGACCTGTTTCAGTCCTCCCCTATTGGAGGTGGCCAAACCTCCTTCCGCGGTTCCCTATGTAAACCTCGGTATCGGGAGTTGTCTGTTCCCCTGAGGGGGGGCATCCCGGACGAGCCcccaaa
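The reverse complement can be produced with any standard tool. The command used here is not shown, so below is a minimal Python stand-in that keeps the lowercase (soft-masked) bases lowercase. As a sanity check, the last 15 bases of the reversed sequence are the reverse complement of the first 15 bases of the original INT sequence (`tttgggGGCTCGTCC`):

```python
# Complement table that preserves case, so soft-masked bases stay lowercase
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

print(reverse_complement("tttgggGGCTCGTCC"))  # GGACGAGCCcccaaa
```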
Now the domains are on the negative strand:
_m29_p_5 TEsorter CDS 1267580 1268452 1.05 - 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Retrovirus:Retrovirus-INT;Name=Retrovirus-INT;gene=INT;clade=Retrovirus;coverage=100.0;evalue=1.4e-78;probability=0.96
_m29_p_5 TEsorter CDS 1268675 1269076 0.9 - 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Retrovirus:Retrovirus-RH;Name=Retrovirus-RH;gene=RH;clade=Retrovirus;coverage=100.0;evalue=6.6e-35;probability=0.99
_m29_p_5 TEsorter CDS 1269827 1270459 1.24 - 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Retrovirus:Retrovirus-RT;Name=Retrovirus-RT;gene=RT;clade=Retrovirus;coverage=100.0;evalue=4.3e-79;probability=0.98
_m29_p_5 TEsorter CDS 1270754 1270948 0.28 - 0 ID=_m29_p_5:1265484..1273342_INT#LTR/unknown|Class_I/LTR/Ty3_gypsy:Ty3-PROT;Name=Ty3_gypsy-PROT;gene=PROT;clade=Ty3_gypsy;coverage=100.0;evalue=7.8e-06;probability=0.86
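These negative-strand coordinates are a mirror image of the positive-strand ones within the INT interval 1265484..1273342. Each new position equals 1265484 + 1273342 minus the old one. For example, INT moves from 1270374..1271246 to 1267580..1268452. A quick check of that arithmetic (`mirror_interval` is a hypothetical helper):

```python
from typing import Tuple

def mirror_interval(region_start: int, region_end: int,
                    feat_start: int, feat_end: int) -> Tuple[int, int]:
    """Map a 1-based inclusive feature inside [region_start, region_end]
    to where it lies after the region is reverse-complemented, expressed
    in the same coordinate system."""
    total = region_start + region_end
    return total - feat_end, total - feat_start

# INT domain: '+' annotation vs. annotation of the reverse-complemented INT
print(mirror_interval(1265484, 1273342, 1270374, 1271246))  # (1267580, 1268452)
```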
And the *.domtbl hits to rev_aa:
# --- full sequence --- -------------- this domain ------------- hmm coord ali coord env coord
# target name accession tlen query name accession qlen E-value score bias # of c-Evalue i-Evalue score bias from to from to from to acc description of target
#------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
Class_I/LTR/Retrovirus:Retrovirus-RT - 209 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 - 2619 2.4e-79 259.4 0.0 1 1 1.3e-79 4.3e-79 258.5 0.0 1 209 962 1172 962 1172 0.98 -
Class_I/LTR/Retrovirus:Retrovirus-INT - 245 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 - 2619 8e-79 258.5 0.0 1 1 4.1e-79 1.4e-78 257.8 0.0 1 245 1631 1921 1631 1921 0.96 -
Class_I/LTR/Ty3_gypsy:Ty3-INT - 222 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 - 2619 7.3e-49 160.2 0.0 1 1 3.6e-49 1.2e-48 159.5 0.0 1 217 1631 1845 1631 1849 0.94 -
Class_I/LTR/Ty3_gypsy/non-chromovirus/non-chromo-outgroup:Ty3-INT - 300 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 - 2619 5.4e-47 154.4 0.0 1 1 8.2e-47 2.7e-46 152.1 0.0 1 300 1631 1921 1631 1921 0.91 -
Class_I/LTR/Ty3_gypsy/chromovirus/chromo-outgroup:Ty3-INT - 305 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 - 2619 2.7e-37 122.3 0.1 1 1 2.9e-37 9.6e-37 120.5 0.0 5 287 1632 1909 1629 1921 0.80 -
Class_I/LTR/Retrovirus:Retrovirus-RH - 126 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 - 2619 4e-35 114.6 0.0 1 1 2e-35 6.6e-35 113.9 0.0 1 126 1423 1556 1423 1556 0.99 -
Class_I/LTR/Ty3_gypsy/non-chromovirus/OTA/Athila:Ty3-INT - 313 _m29_p_5:1265484..1273342_INT#LTR/unknown|rev_aa1 - 2619 7.6e-35 114.5 0.0 1 1 3.6e-35 1.2e-34 113.8 0.0 7 217 1634 1843 1629 1850 0.93 -
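The `rev_aa1` hits have exactly the same amino-acid coordinates as the `aa1` hits before. The reversed file keeps the header `_m29_p_5:1265484..1273342`, so these hits are reported on `-` at mirrored positions. Assume that `rev_aa1` is the translation of the reverse complement starting at its first base. The conversion can then be sketched as below (a hypothetical helper, not TEsorter's implementation). It reproduces INT at 1267580..1268452:

```python
from typing import Tuple

def reverse_frame_to_genome(seq_end: int, frame_offset: int,
                            aa_from: int, aa_to: int) -> Tuple[int, int]:
    """Map 1-based amino-acid coordinates of a translation of the reverse
    complement of a sequence that ends at genomic position seq_end back
    to 1-based genomic coordinates.

    frame_offset: bases skipped at the start of the reverse complement
    before the first codon (assumed 0 for 'rev_aa1').
    """
    # 1-based positions on the reverse-complemented sequence
    rc_first = frame_offset + (aa_from - 1) * 3 + 1
    rc_last = frame_offset + aa_to * 3
    # position p on the reverse complement lies at seq_end - p + 1 in the genome
    return seq_end - rc_last + 1, seq_end - rc_first + 1

print(reverse_frame_to_genome(1273342, 0, 1631, 1921))  # (1267580, 1268452)
```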
So I think the sequence you provided is in fact on the positive strand. Could you check whether LTR_retriever has reversed it?
Alternatively, you can try extracting the LTR sequences from the LTR_retriever results using our method described at https://github.com/zhangrengang/TEsorter#extracting-te-sequences-from-genome-for-tesorter.
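The sketch below illustrates the orientation issue only. It is not the procedure from the linked README. If elements are extracted in plus-strand genomic orientation, TEsorter's reported strand should reflect the genome, as the reverse-complement test above suggests. If negative-strand elements are reverse-complemented, as the LTR_retriever library did here, every hit appears on `+`. `extract_region` and the toy genome are hypothetical:

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def extract_region(genome: Dict[str, str], chrom: str, start: int, end: int,
                   strand: str = "+", orient_to_element: bool = False) -> str:
    """Extract the 1-based inclusive interval [start, end] from an in-memory genome.

    By default the plus-strand genomic sequence is returned. With
    orient_to_element=True, negative-strand elements are reverse-complemented,
    which matches the orientation of the LTR_retriever library in this thread.
    """
    seq = genome[chrom][start - 1:end]
    if orient_to_element and strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq

toy = {"chr1": "AAACCCGGGTTT"}
print(extract_region(toy, "chr1", 4, 9))                                   # CCCGGG
print(extract_region(toy, "chr1", 1, 6, strand="-", orient_to_element=True))  # GGGTTT
```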
Wow, I followed your advice and yes, LTR_retriever did reverse this sequence. That's quite confusing; I've been stuck on this problem for a long time. I'll try your method to see whether it improves things. Thanks a lot.
Hi there, @zhangrengang! I checked my TEsorter results and found that the .dom.gff3 file contains only LTRs on the positive strand. My input file is the .LTRlib.fa from LTR_retriever, and I am certain it contains negative-strand LTRs, including some high-scoring ones that should have been identified. Did I miss an argument that causes this? Yours sincerely.
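A quick way to confirm which strands actually appear in a `.dom.gff3` file is to tally its seventh (strand) column. As the discussion above shows, a file with only `+` may simply mean the input library was already reverse-complemented for negative-strand elements. Below is a minimal sketch, assuming tab-separated GFF3. `count_strands` is a hypothetical helper, and the file path in the usage comment is a placeholder:

```python
from collections import Counter
from typing import Iterable

def count_strands(gff_lines: Iterable[str]) -> Counter:
    """Tally column 7 (strand) of GFF3 feature lines, skipping blanks and comments."""
    counts: Counter = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # only count well-formed 9-column records
            counts[fields[6]] += 1
    return counts

# Usage on a real file (placeholder path):
# with open("your_prefix.dom.gff3") as fh:
#     print(count_strands(fh))
```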