oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for identification of LTR retrotransposons; The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

questions about solo LTR #99

Closed xiaoyezao closed 2 years ago

xiaoyezao commented 3 years ago

Hi Shujun,

I used solo_finder.pl to generate solo_list, and I have a few questions about the results:

ctg000760:6285029..6288150_LTR ctg000010:31578..34122

  1. Is the first column the solo LTR? What is the second column?
  2. The first column has hundreds of thousands of entries, but only a few thousand are unique. Does that mean there are only a few thousand solo LTRs?

Also questions about the intact_list generated by intact_finder_coarse.pl:

  1. Why are most LTRs in this list duplicates?
  2. What is the difference between this list and the .pass.list generated by LTR_retriever?

I want to study the impact of LTR insertions on protein-coding genes (how many and which genes are near LTRs), so I should include as many LTRs as possible in the analysis. But I am really confused by the LTR_retriever results. I hope to get some pointers from you.

Best,

Carl

oushujun commented 3 years ago

Hi Carl,

The first column is the LTR family and the second is the genomic coordinate, so you may see duplicates in the first column. The intact list uses the same format. The intact finder script uses a more relaxed definition of intact in order to identify as many elements as possible. You may also want to use the whole-genome LTR annotation (the gff3 file) for your study.

Best, Shujun
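
A minimal sketch of how the two-column format described above could be tallied, so that the number of distinct families (column 1) is not mistaken for the number of solo LTR loci (column 2). It assumes whitespace-separated columns as in the example line from the question; the function name `summarize_list` and the second and third example lines are hypothetical.

```python
from collections import Counter


def summarize_list(lines):
    """Tally a two-column list: column 1 = LTR family, column 2 = genomic locus."""
    per_family = Counter()  # number of loci assigned to each family
    loci = set()            # distinct genomic coordinates
    for line in lines:
        fields = line.split()
        # Skip blank, comment, or malformed lines.
        if len(fields) < 2 or line.startswith("#"):
            continue
        family, locus = fields[0], fields[1]
        per_family[family] += 1
        loci.add(locus)
    return per_family, loci


example = [
    "ctg000760:6285029..6288150_LTR ctg000010:31578..34122",  # line from the question
    "ctg000760:6285029..6288150_LTR ctg000011:100..2600",     # hypothetical
    "ctg000123:500..3000_LTR ctg000012:700..3100",            # hypothetical
]
per_family, loci = summarize_list(example)
print(len(per_family), "families;", len(loci), "loci")
```

For a real file, the same function could be called as `summarize_list(open("solo_list"))`, where the path is whatever name the list was saved under.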

running111 commented 3 years ago

Hi Shujun, I have a question. I used the intact LTR-RTs in genome.fa.pass.list (count: 4430) to analyze LTR-RT insertion time. However, when I analyzed the solo/intact ratio, I used the intact_list (count: 24990) generated by intact_finder_coarse.pl. Both files are "intact", yet the counts differ by nearly six-fold. Why is there such a big difference? How do the filter conditions differ? The former was obtained from the default EDTA process; for the latter, I used the intermediate file genome.fa.mod.retriever.all.scn generated by EDTA as the input to LTR_retriever. After obtaining the genome.fa.out file, I ran solo_finder.pl, intact_finder_coarse.pl, and solo_intact_ratio.pl. Do you have any suggestions for this? Hope to hear from you soon.

Best, zhourun

oushujun commented 3 years ago

Hi Zhourun,

As the script name suggests, intact_finder_coarse.pl finds intact or near-intact LTRs. They may not have all structural features, but they do have the terminal repeat feature. The purpose is to find as many candidates as possible. These scripts only work on LTR_retriever annotation results and cannot be run on EDTA results. You may modify the code to fit EDTA formats.

Best, Shujun

running111 commented 3 years ago

Hi Shujun, In fact, I re-ran LTR_retriever with the combined LTR_FINDER and LTRharvest file from EDTA, and ran solo_finder.pl, intact_finder_coarse.pl, and solo_intact_ratio.pl on the output *out files. In addition, I found that 6714 of the 70834 solo LTRs in solo_list overlap with intact LTR-RTs in pass.list.

Best, Shujun

oushujun commented 3 years ago

That's good to know. You should filter those out. - Shujun
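
One way the suggested filtering could be done is sketched below. This is not part of LTR_retriever. It assumes that the first column of pass.list and the second column of solo_list are both written as `seq:start..end`, and that header lines in pass.list begin with `#`. All function names and example lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumed coordinate format: sequence name, colon, start..end
LOCUS = re.compile(r"^(\S+):(\d+)\.\.(\d+)$")


def parse_locus(text):
    """Split 'seq:start..end' into (seq, low, high)."""
    match = LOCUS.match(text)
    if match is None:
        raise ValueError(f"unexpected coordinate format: {text!r}")
    start, end = int(match.group(2)), int(match.group(3))
    return match.group(1), min(start, end), max(start, end)


def load_intact(pass_lines):
    """Collect intact LTR-RT intervals per sequence from pass.list lines."""
    intervals = defaultdict(list)
    for line in pass_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and assumed '#' header lines
        seq, start, end = parse_locus(line.split()[0])
        intervals[seq].append((start, end))
    return intervals


def filter_solo(solo_lines, intact):
    """Keep solo_list lines whose locus overlaps no intact interval."""
    kept = []
    for line in solo_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        seq, start, end = parse_locus(fields[1])
        # Two closed intervals overlap when each starts before the other ends.
        if any(start <= e and s <= end for s, e in intact.get(seq, [])):
            continue
        kept.append(line.rstrip("\n"))
    return kept


# Hypothetical example data
pass_list = ["#LTR_loc Category", "ctg000010:30000..40000 pass"]
solo_list = [
    "famA_LTR ctg000010:31578..34122",  # inside the intact element, dropped
    "famB_LTR ctg000010:50000..52000",  # no overlap, kept
]
kept = filter_solo(solo_list, load_intact(pass_list))
print(kept)
```

The overlap check is a plain linear scan per sequence, which should be adequate for lists of this size; an interval tree or bedtools would be alternatives for larger inputs.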
