oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for identification of LTR retrotransposons; the LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

DIRS-LTR and insert length question #88

Closed: AlexdeMendoza closed this issue 3 years ago

AlexdeMendoza commented 3 years ago

Dear authors, thank you for maintaining this nice and very useful software. This is not really an "issue", just a few technical questions I have about the software.

1- How does the software deal with DIRS/Ngaro-type retrotransposons? Many of these have two kinds of LTR repeats, A and B, organised in this structure: LTR_A - ORFs - LTR_B - LTR_A - LTR_B. Since LTR_B repeats are usually quite short, they may simply be discarded, but I would like to know whether this structure could be interpreted as a nested LTR element and discarded by LTR_retriever. Some Gypsy elements also adopt this structure, so it is not unique to DIRS/Ngaro. For example, I found a Gypsy family like this in a plant genome (see https://doi.org/10.1038/s41467-018-03724-9):

[Screenshot: the Gypsy family from this paper]

2 - Is there any particular reason you did not include the RVT_1 or RVT_3 HMMs in your base protein domain collection? If I include them in my own database, would that negatively affect the classification?

3 - What is the maximum insertion size you tolerate (from LTR to LTR)? Is it possible to modify it?

Best,

Alex

oushujun commented 3 years ago

Hi Alex,

Thanks for your inquiry. So far, LTR_retriever can only identify a single LTR structure. If there is an LTR_B, it will probably be missed; the solo LTR_B inside is not recognized, or it is removed if a similar region is present in the prelibrary. I don't know of any tool that identifies DIRS elements based on structure. For such a characteristic structure, the task may not be very difficult: you could use a flexible LTR search engine, such as LTRharvest, and then look for interleaved LTR pairs in which one pair is shorter or carries particular HMM hits. Special Gypsy elements like this one are probably not identified correctly, unfortunately.
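A minimal sketch of the interleaved-LTR search suggested above, in Python. It assumes the LTR pairs reported by a flexible search (such as LTRharvest) have already been parsed into simple coordinate records for one sequence; the `LtrPair` class, the `interleaved_candidates` function and the `max_short_ratio` cutoff are hypothetical illustrations, not part of LTR_retriever or LTRharvest. It flags two pairs that interleave in the DIRS-like order LTR_A ... LTR_B ... LTR_A ... LTR_B where the B pair has clearly shorter LTRs.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class LtrPair:
    """One candidate LTR pair on a single sequence (1-based, inclusive).

    Hypothetical record type; not LTRharvest's native output format.
    """
    left_start: int
    left_end: int
    right_start: int
    right_end: int

    @property
    def ltr_len(self) -> int:
        # Mean length of the two LTRs in the pair.
        left = self.left_end - self.left_start + 1
        right = self.right_end - self.right_start + 1
        return (left + right) // 2


def interleaved_candidates(pairs, max_short_ratio=0.5):
    """Yield (a_pair, b_pair) arranged as LTR_A ... LTR_B ... LTR_A ... LTR_B,
    where LTR_B is at most max_short_ratio times the length of LTR_A.

    max_short_ratio is an arbitrary, hypothetical cutoff.
    """
    ordered = sorted(pairs, key=lambda p: p.left_start)
    for a, b in combinations(ordered, 2):
        # a starts first after sorting; require A-left < B-left < A-right < B-right.
        interleaved = (a.left_end < b.left_start
                       and b.left_end < a.right_start
                       and a.right_end < b.right_start)
        if interleaved and b.ltr_len <= max_short_ratio * a.ltr_len:
            yield a, b


# Made-up coordinates for illustration only.
hits = [
    LtrPair(1_000, 1_300, 6_000, 6_300),      # A pair, 301-bp LTRs
    LtrPair(5_500, 5_600, 6_400, 6_500),      # B pair, 101-bp LTRs
    LtrPair(20_000, 20_400, 25_000, 25_400),  # unrelated pair
]
found = list(interleaved_candidates(hits))
for a, b in found:
    print("DIRS-like candidate:", a, b)
```

The pairwise comparison is quadratic in the number of candidates, which is acceptable for modest per-sequence lists; requiring particular HMM hits in the internal region would be a natural extra filter.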

When I wrote the classification script, I did not have many HMM resources to draw on. Including these two profiles would probably improve the classification, but the script may need some adjustments. A better solution is to redo the classification with a more specialized classifier, TEsorter.

Yes, it's possible to modify the maximum internal size, but not directly. Please check the LTR-internal ratio parameter, and make sure you also use inclusive parameters for the upstream tools.
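Because the limit is a ratio rather than a fixed length, the longest internal region that passes scales with the LTR length. A small arithmetic sketch in Python, assuming the filter is defined as internal length / LTR length <= ratio (check the LTR_retriever help text for the exact definition and default); the function names are hypothetical, and the ratio value 50 below is an arbitrary example, not necessarily the default.

```python
import math


def max_internal_length(ltr_len: int, max_ratio: float) -> int:
    """Longest internal region (bp) that passes, assuming the filter is
    internal_length / ltr_length <= max_ratio."""
    return math.floor(ltr_len * max_ratio)


def max_element_length(ltr_len: int, max_ratio: float) -> int:
    """Longest full element (bp): two LTRs plus the longest allowed internal region."""
    return 2 * ltr_len + max_internal_length(ltr_len, max_ratio)


def ratio_needed(internal_len: int, ltr_len: int) -> float:
    """Smallest ratio setting that would keep an element with these lengths."""
    return internal_len / ltr_len


# Arbitrary example values, for illustration only.
print(max_internal_length(300, 50))          # 15000
print(max_element_length(300, 50))           # 15600
print(round(ratio_needed(20_000, 300), 2))   # 66.67
```

Raising the ratio alone has no effect if the upstream candidate finders never report the longer elements, which is why their distance limits need to be relaxed as well.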

Please let me know if you have more questions.

Best, Shujun


AlexdeMendoza commented 3 years ago

Thank you very much for the information. I will have a go with TEsorter.