oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for the identification of LTR retrotransposons. The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

About ltr_harvest output #117

Closed: cherrie-g closed this issue 1 year ago

cherrie-g commented 2 years ago

Hi Dr. Ou, I have another question about the input to LTR_retriever. You give an example of how to run ltr_harvest and ltr_finder, and I ran the two programs as you described. I found that their results were not exactly the same, and the biggest difference was the last column. For example, the ltr_finder result looks like this:

45571 49343 3773 45571 45832 262 49082 49343 262 100 0 chr1
186850 201135 14286 186850 186979 130 201006 201135 130 100 0 chr1
273833 283573 9741 273833 276318 2486 281087 283573 2487 99.8 0 chr1
443922 449152 5231 443922 444090 169 448984 449152 169 89.3 0 chr1
492639 502138 9500 492639 495017 2379 499748 502138 2391 99.1 0 chr1

Here the last column is the seq ID. The ltr_harvest result looks like this:

45571  49343  3773  45571  45832  262  49082  49343  262  100.00  0
71743  79399  7657  71743  72528  786  78629  79399  771  97.96  0
88330  98683  10354  88330  88649  320  98362  98683  322  94.41  0
147667  161663  13997  147667  153879  6213  155448  161663  6216  99.66  0
187727  201005  13279  187727  189811  2085  198919  201005  2087  99.19  0
214636  222801  8166  214636  215261  626  222180  222801  622  95.69  0

Here the last column is the seq number, which could presumably be converted to a seq ID. I then concatenated the two files with cat and ran the LTR_retriever pipeline. In get_range.pl, I saw that it generates a file like this:

:45571..49343[1]        :45571..45832
:45571..49343[2]        :49082..49343
:71743..79399[1]        :71743..72528
:71743..79399[2]        :78629..79399
:88330..98683[1]        :88330..88649
:88330..98683[2]        :98362..98683
:147667..161663[1]      :147667..153879
...
chr1:45571..49343[1]    chr1:45571..45832
chr1:45571..49343[2]    chr1:49082..49343
chr1:273833..283573[1]  chr1:273833..276318
chr1:273833..283573[2]  chr1:281087..283573
chr1:443922..449152[1]  chr1:443922..444090
chr1:443922..449152[2]  chr1:448984..449152

Clearly, the first group of lines (those that come from the ltr_harvest rows) did not get the seq ID. I'm not sure whether call_seq_by_list.pl can extract sequences from this information; in fact, I think it cannot extract the LTR sequences identified by ltr_harvest. Is this a bug, or am I missing some steps? Do I need to process the ltr_harvest results before concatenating them?
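To make the question concrete, here is a minimal Python sketch of the kind of preprocessing I have in mind. It is not part of LTR_retriever and may not be what the pipeline expects. It assumes that the last ltr_harvest column is a 0-based index into the genome FASTA in file order, that the seq ID is the first word of each header, and that the range list is tab-separated. The function names `fasta_ids`, `append_seq_ids` and `range_lines` are hypothetical.

```python
from typing import Iterable, List


def fasta_ids(fasta_lines: Iterable[str]) -> List[str]:
    """Return sequence IDs (first word of each '>' header) in file order."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.append(words[0])
    return ids


def append_seq_ids(harvest_lines: Iterable[str], ids: List[str]) -> List[str]:
    """Append a seq ID column to 11-column rows; pass other lines through."""
    out = []
    for line in harvest_lines:
        fields = line.split()
        is_row = (
            not line.startswith("#")
            and len(fields) == 11
            and fields[-1].isdigit()
        )
        if not is_row:
            out.append(line.rstrip("\n"))
            continue
        seq_nr = int(fields[-1])  # assumption: 0-based position in the FASTA
        out.append(" ".join(fields + [ids[seq_nr]]))
    return out


def range_lines(row: str) -> List[str]:
    """Mimic the range list shown above for one row (illustration only)."""
    f = row.split()
    seq_id = f[11] if len(f) > 11 else ""  # empty when the ID column is missing
    whole = f"{seq_id}:{f[0]}..{f[1]}"
    return [
        f"{whole}[1]\t{seq_id}:{f[3]}..{f[4]}",  # left LTR
        f"{whole}[2]\t{seq_id}:{f[6]}..{f[7]}",  # right LTR
    ]


if __name__ == "__main__":
    ids = fasta_ids([">chr1 example header\n", "ACGT\n"])
    raw = "45571  49343  3773  45571  45832  262  49082  49343  262  100.00  0\n"
    fixed = append_seq_ids([raw], ids)
    for text in range_lines(raw) + range_lines(fixed[0]):
        print(text)
```

Under these assumptions, a row without the ID column gives entries that start with a bare colon, as in the first lines of the listing above, while the same row with the ID appended gives chr1-prefixed entries.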

Best regards.

oushujun commented 2 years ago

Hello @cherrie-g,

Sorry for the delayed response. The v2.0+ versions work fine. Could you check which version you are using (run LTR_retriever -h)?

Thanks, Shujun

oushujun commented 2 years ago

@cherrie-g any luck?