oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Main result files were created but EDTA.pl kept running for a long time #346

Closed: lvqiang0120 closed this issue 1 year ago

lvqiang0120 commented 1 year ago

Hi Professor, I used EDTA to annotate TEs in a 2.67 Gb genome. The assembly has 187 scaffolds, 11 of which are chromosome-length, and the scaffold N50 is 220 Mb. EDTA has been running for 14 days; it is still running and occupies more than 300 GB of memory. However, it has not updated any result or log files since it created the .TEanno.gff3, .TEanno.sum, .TElib.fa, and .MAKER.masked files nine days ago.

My command line:

    EDTA.pl \
      --genome $genome_dir/$fastq --cds $genome_dir/$cds \
      --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --threads 40 \
      --repeatmasker /path/to/RepeatMasker/RepeatMasker

I have two log files (neither has been updated in the past nine days).

The current end of the err log file:

    2023-03-23 08:03:44,992 -INFO- Pipeline done.
    2023-03-23 08:03:44,992 -INFO- cleaning the temporary directory ./tmp
    Thu Mar 23 08:35:16 CST 2023 Homology-based annotation of TEs using hap1_chr.rename.fa.mod.EDTA.TElib.fa from scratch.

The current end of the out log file:

    Thu Mar 23 22:32:10 CST 2023 TE annotation using the EDTA library has finished! Check out:
        Whole-genome TE annotation (total TE: 49.90%): hap1_chr.rename.fa.mod.EDTA.TEanno.gff3
        Whole-genome TE annotation summary: hap1_chr.rename.fa.mod.EDTA.TEanno.sum
        Low-threshold TE masking for MAKER gene annotation (masked: 16.95%): hap1_chr.rename.fa.mod.MAKER.masked

    Thu Mar 23 22:32:14 CST 2023 Evaluate the level of inconsistency for whole-genome TE annotation (slow step):

Is this normal? I would like to kill it. Also, is EDTA suitable for annotating arthropod genomes? Thanks a lot; I look forward to your reply.

oushujun commented 1 year ago

Hello,

The annotation appears complete. It's running the evaluation step which could take a while. You can read more in other issues: https://github.com/oushujun/EDTA/issues?q=evaluate

I have not tried EDTA on any arthropod genomes, so it would be great if you could share some of your experiences here! To learn more about the annotation quality (consistency), you can run the evaluation step, which is the step currently taking a long time. You can shortcut it; the way to do so should be found in other issues.

Best, Shujun
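For readers in the same situation, the following is a minimal sketch (not part of EDTA) for confirming that the main outputs named in this thread exist and are non-empty before deciding what to do about a long-running evaluation step. The function name `check_edta_outputs` is hypothetical; the file suffixes are the ones reported in the logs above, and the sketch assumes the outputs sit in a single directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the main EDTA outputs mentioned in this
# thread exist and are non-empty. It does not touch the running EDTA process.
# Usage: check_edta_outputs <genome file name> [output directory]
check_edta_outputs() {
    local genome="$1"
    local dir="${2:-.}"
    local missing=0
    local suffix path
    # Suffixes taken from the log excerpts in this issue.
    for suffix in mod.EDTA.TElib.fa mod.EDTA.TEanno.gff3 mod.EDTA.TEanno.sum mod.MAKER.masked; do
        path="$dir/$genome.$suffix"
        if [ -s "$path" ]; then
            echo "ok       $path"
        else
            echo "MISSING  $path"
            missing=1
        fi
    done
    # Return 0 only when every expected file is present and non-empty.
    return "$missing"
}

# Example call, using the genome name from this thread:
#   check_edta_outputs hap1_chr.rename.fa
```

If every file is reported as `ok`, the annotation outputs described above are in place and only the evaluation step is still pending.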

lvqiang0120 commented 1 year ago

Thanks for your response. EDTA completed the annotation successfully. The total proportion of repeat sequence was 47.59%, which I think is probably lower than the true value. Based on previous results and closely related species, the proportion of repetitive sequence might be around 60%. Can you suggest any adjustments? I have also long wondered: since the EDTA pipeline already integrates RepeatModeler and RepeatMasker, why do some users still run RepeatModeler separately to annotate non-LTR elements? I understand it has been suggested that EDTA may not be very accurate in its annotation of non-LTRs.

oushujun commented 1 year ago

You may want to use a generic repeat annotator to get a fair idea of how repetitive your genome is. If your genome has lots of non-LTRs, I suggest you collect some non-LTR sequences from this species or its sister species and give them to EDTA as a curated library. EDTA uses RepeatModeler to identify non-LTRs but performs extra filtering steps. Neither EDTA nor RepeatModeler annotates non-LTRs sensitively. Please read the wiki for more details.

Shujun
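The curated-library suggestion above can be sketched roughly as follows, assuming the library is passed through EDTA's `--curatedlib` option. The function name `run_edta_with_curatedlib`, the `DRY_RUN` variable, and the file name `nonLTR_curated.fa` are hypothetical; the remaining flags are copied from the original command in this issue, with `--evaluate` and `--repeatmasker` left out for brevity. Check the wiki for the sequence header format EDTA expects in a curated library before running it for real.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build an EDTA command that adds a curated library.
# By default it only prints the command (DRY_RUN=1); set DRY_RUN=0 to run it.
# Usage: run_edta_with_curatedlib <genome> <cds> <curated library> [threads]
run_edta_with_curatedlib() {
    local genome="$1" cds="$2" curated="$3" threads="${4:-40}"
    # Refuse to continue if the curated library is missing or empty.
    if [ ! -s "$curated" ]; then
        echo "curated library not found or empty: $curated" >&2
        return 1
    fi
    # Flags other than --curatedlib mirror the command posted in this issue.
    set -- EDTA.pl --genome "$genome" --cds "$cds" --curatedlib "$curated" \
        --overwrite 1 --sensitive 1 --anno 1 --threads "$threads"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example (nonLTR_curated.fa is a placeholder for your own non-LTR sequences):
#   run_edta_with_curatedlib "$genome_dir/$fastq" "$genome_dir/$cds" nonLTR_curated.fa 40
```

Printing the command first makes it easy to review the flags before committing to another multi-day run.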