Closed xiaoyezao closed 2 years ago
Hi Carl,
The first column is the LTR family and the second is the genomic coordinate, so you may see duplicates in the first column. Intact LTRs use the same format. The intact finder script uses relaxed criteria for "intact" so that it identifies as many elements as possible. You may also want to use the whole-genome LTR annotation (the gff3 file) for your study.
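To make the two-column layout concrete, here is a minimal Python sketch that counts total entries versus unique families. It assumes whitespace-separated columns and a `seq:start..end` coordinate in the second column, as in the example line from the question; the helper names (`parse_list`, `summarize`) are hypothetical and not part of LTR_retriever.

```python
import re
from collections import Counter

# Assumed coordinate syntax for the second column: seq:start..end
COORD = re.compile(r"^(?P<seq>.+):(?P<start>\d+)\.\.(?P<end>\d+)$")

def parse_list(lines):
    """Yield (family, seq, start, end) from two-column list lines."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        m = COORD.match(fields[1])
        if m is None:
            continue  # second column is not seq:start..end
        yield fields[0], m["seq"], int(m["start"]), int(m["end"])

def summarize(lines):
    """Return (total entries, number of unique families, per-family counts)."""
    counts = Counter(family for family, *_ in parse_list(lines))
    return sum(counts.values()), len(counts), counts

# First line is from the question; the second is made up for illustration.
example = [
    "ctg000760:6285029..6288150_LTR ctg000010:31578..34122",
    "ctg000760:6285029..6288150_LTR ctg000012:100..2600",
]
total, n_families, _ = summarize(example)
print(total, n_families)
```

Under these assumptions, the number of solo LTR copies is the number of rows, while the number of unique values in the first column is the number of families they belong to.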
Best, Shujun
On Thu, Jun 24, 2021 at 7:12 PM xiaoyezao @.***> wrote:
Hi Shujun,
I used solo_finder.pl to generate solo_list, and I have a few questions about the results:
ctg000760:6285029..6288150_LTR ctg000010:31578..34122
- Is the first column the solo LTR? What is the second column?
- The first column contains hundreds of thousands of elements, but only a few thousand are unique. Does that mean there are only a few thousand solo LTRs?
Also, some questions about the intact_list generated by intact_finder_coarse.pl:
- Why are most LTRs in this list duplicates?
- What is the difference between this list and the .pass.list generated by ltr_retriever?
I want to study the impact of LTR insertions on protein-coding genes (how many and which genes are related to or near an LTR), so I should include as many LTRs as possible in the analysis. However, I am really confused by the LTR_Retriever results, and I hope to get some pointers from you.
Bests,
Carl
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/oushujun/LTR_retriever/issues/99, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABNX4NAAMIZTV4GDJQLWPKTTUMHK5ANCNFSM47HVKC5Q .
Hi Shujun, I have a question. I used the intact LTR-RTs in genome.fa.pass.list (count: 4430) to analyze LTR-RT insertion time. However, when I analyzed the solo/intact ratio, I used the intact_list (count: 24990) generated by intact_finder_coarse.pl. Both files are "intact", yet the counts differ almost 6-fold. Why is the difference so large, and how do the filter conditions differ? The former comes from the default EDTA process; the latter uses the EDTA intermediate file genome.fa.mod.retriever.all.scn as the input to LTR_retriever. After obtaining the genome.fa.out file, I ran solo_finder.pl, intact_finder_coarse.pl, and solo_intact_ratio.pl. Do you have any suggestions? Hope to hear from you soon.
Best, zhourun
Hi Zhourun,
As the script name suggests, intact_finder_coarse.pl finds intact or near-intact LTRs. They may not have all the structural features, but they do have the terminal repeats. The purpose is to find as many candidates as possible. These scripts only work on LTR_retriever annotation results and cannot run on EDTA results. You may modify the code to fit the EDTA formats.
Best, Shujun
Hi Shujun, In fact, I re-ran LTR_retriever on the combined LTR_FINDER and LTRharvest file from EDTA, and ran solo_finder.pl, intact_finder_coarse.pl, and solo_intact_ratio.pl on the output *out files. In addition, I found that 6714 of the solo LTRs in solo_list (70834) overlap with intact LTR-RTs in pass.list.
Best, Shujun
That's good to know. You should filter those out. - Shujun
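One way to do that filtering, as a rough Python sketch: it assumes the solo and intact coordinates have already been parsed into `(seq, start, end)` tuples with inclusive coordinates, and the function names are hypothetical rather than anything shipped with LTR_retriever.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate

def build_index(intervals):
    """Per sequence: sorted starts plus the running maximum of ends."""
    by_seq = defaultdict(list)
    for seq, start, end in intervals:
        by_seq[seq].append((start, end))
    index = {}
    for seq, ivs in by_seq.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_ends = list(accumulate((e for _, e in ivs), max))
        index[seq] = (starts, max_ends)
    return index

def overlaps(index, seq, start, end):
    """True if [start, end] shares at least one base with an indexed interval."""
    if seq not in index:
        return False
    starts, max_ends = index[seq]
    k = bisect_right(starts, end)  # intervals that begin at or before `end`
    return k > 0 and max_ends[k - 1] >= start

def drop_overlapping(solos, intact):
    """Keep only solo LTRs that do not overlap any intact LTR-RT."""
    index = build_index(intact)
    return [s for s in solos if not overlaps(index, *s)]

# Made-up coordinates for illustration only.
intact = [("ctg1", 100, 500), ("ctg1", 1000, 2000)]
solos = [("ctg1", 450, 600), ("ctg1", 600, 900), ("ctg2", 100, 200)]
print(drop_overlapping(solos, intact))
```

Any overlap of one base or more counts here; a stricter rule (for example, a minimum overlap fraction) would need a different test.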
On Wed, Jul 7, 2021 at 3:49 PM running111 @.***> wrote:
Hi Shujun, In fact, I re-ran LTR_retriever on the combined LTR_FINDER and LTRharvest file from EDTA, and ran solo_finder.pl, intact_finder_coarse.pl, and solo_intact_ratio.pl on the output *out files. In addition, I found that 6714 of the solo LTRs in solo_list (70834) overlap with intact LTR-RTs in pass.list.
Best, Shujun