oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

LINE and SINE results files has 0 bp! #456

Open XuanZhang-Black opened 2 months ago

XuanZhang-Black commented 2 months ago

Dr. Shujun,

Hi! I installed EDTA v2.2.1 by running the commands "git clone https://github.com/oushujun/EDTA.git" and "mamba env create -f EDTA_2.2.x.yml".

I then tested it with the following command: "perl .../EDTA.pl --genome genome.fa --cds genome.cds.fa --curatedlib rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --threads 10". However, the output log contained the following warnings and errors: "Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.", "Warning: The SINE result file has 0 bp!", "Warning: The LINE result file has 0 bp!", "Error encountered: [Errno 2] No such file or directory: 'bedtools'", "mv: cannot stat 'chromosome_density_plots.pdf': No such file or directory", and "cp: cannot stat 'genome.fa.mod.EDTA.TEanno.density_plots.pdf': No such file or directory".

I don't know whether a dependency failed to install or whether the data itself simply contains no new LINEs/SINEs. My log file is below. Could you tell me whether the installation was successful?

#########################################################

Extensive de-novo TE Annotator (EDTA) v2.2.1
Shujun Ou (shujun.ou.1@gmail.com)

#########################################################

Parameters: --genome genome.fa --cds genome.cds.fa --curatedlib rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --threads 10

Wed Apr 17 22:57:04 CST 2024 Dependency checking: All passed!

A custom library rice7.0.0.liban is provided via --curatedlib. Please make sure this is a manually curated library but not machine generated.

A CDS file genome.cds.fa is provided via --cds. Please make sure this is the DNA sequence of coding regions only.

A BED file is provided via --exclude. Regions specified by this file will be excluded from TE annotation and masking.

Wed Apr 17 22:57:08 CST 2024 Obtain raw TE libraries using various structure-based programs:

Wed Apr 17 22:57:08 CST 2024 EDTA_raw: Check dependencies, prepare working directories.

Wed Apr 17 22:57:09 CST 2024 Start to find LTR candidates.

Wed Apr 17 22:57:09 CST 2024 Identify LTR retrotransposon candidates from scratch.

Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.

Wed Apr 17 22:57:33 CST 2024 Finish finding LTR candidates.

Wed Apr 17 22:57:33 CST 2024 Start to find SINE candidates.

Wed Apr 17 22:58:14 CST 2024 Warning: The SINE result file has 0 bp!

Wed Apr 17 22:58:14 CST 2024 Start to find LINE candidates.

Wed Apr 17 22:58:14 CST 2024 Identify LINE retrotransposon candidates from scratch.

Wed Apr 17 22:59:56 CST 2024 Warning: The LINE result file has 0 bp!

Wed Apr 17 22:59:56 CST 2024 Start to find TIR candidates.

Wed Apr 17 22:59:56 CST 2024 Identify TIR candidates from scratch.

Species: others

Wed Apr 17 23:00:47 CST 2024 Finish finding TIR candidates.

Wed Apr 17 23:00:47 CST 2024 Start to find Helitron candidates.

Wed Apr 17 23:00:47 CST 2024 Identify Helitron candidates from scratch.

Wed Apr 17 23:01:22 CST 2024 Finish finding Helitron candidates.

Wed Apr 17 23:01:22 CST 2024 Execution of EDTA_raw.pl is finished!

Wed Apr 17 23:01:22 CST 2024 Obtain raw TE libraries finished. All intact TEs found by EDTA:

            genome.fa.mod.EDTA.intact.raw.fa
            genome.fa.mod.EDTA.intact.raw.gff3

Wed Apr 17 23:01:22 CST 2024 Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library:

Warning: No sequences were masked

Wed Apr 17 23:01:40 CST 2024 EDTA advance filtering finished.

Wed Apr 17 23:01:40 CST 2024 Perform EDTA final steps to generate a non-redundant comprehensive TE library.

            Filter RepeatModeler results that are ignored in the raw step.

Wed Apr 17 23:01:45 CST 2024 Clean up TE-related sequences in the CDS file with TEsorter.

            Remove CDS-related sequences in the EDTA library.

            Remove CDS-related sequences in intact TEs.

Wed Apr 17 23:01:52 CST 2024 Combine the high-quality TE library rice7.0.0.liban with the EDTA library:

Wed Apr 17 23:01:59 CST 2024 EDTA final stage finished! You may check out:

            The final EDTA TE library: genome.fa.mod.EDTA.TElib.fa
            Family names of intact TEs have been updated by rice7.0.0.liban: genome.fa.mod.EDTA.intact.gff3
            Comparing to the provided library, EDTA found these novel TEs: genome.fa.mod.EDTA.TElib.novel.fa
            The provided library has been incorporated into the final library: genome.fa.mod.EDTA.TElib.fa

Wed Apr 17 23:01:59 CST 2024 Perform post-EDTA analysis for whole-genome annotation:

Wed Apr 17 23:01:59 CST 2024 Homology-based annotation of TEs using genome.fa.mod.EDTA.TElib.fa from scratch.

Error encountered: [Errno 2] No such file or directory: 'bedtools'

mv: cannot stat 'chromosome_density_plots.pdf': No such file or directory

Wed Apr 17 23:02:10 CST 2024 TE annotation using the EDTA library has finished! Check out:

            Whole-genome TE annotation (total TE: 34.61%): genome.fa.mod.EDTA.TEanno.gff3
            Whole-genome TE annotation summary: genome.fa.mod.EDTA.TEanno.sum
            Whole-genome TE divergence plot: genome.fa.mod_divergence_plot.pdf
            Whole-genome TE density plot: genome.fa.mod.EDTA.TEanno.density_plots.pdf
            Low-threshold TE masking for MAKER gene annotation (masked: 17.27%): genome.fa.mod.MAKER.masked

cp: cannot stat 'genome.fa.mod.EDTA.TEanno.density_plots.pdf': No such file or directory

Wed Apr 17 23:02:10 CST 2024 Evaluate the level of inconsistency for whole-genome TE annotation:

Wed Apr 17 23:02:12 CST 2024 Evaluation of TE annotation finished! Check out these files:

            Overall: genome.fa.mod.EDTA.TE.fa.stat.all.sum
            Nested: genome.fa.mod.EDTA.TE.fa.stat.nested.sum
            Non-nested: genome.fa.mod.EDTA.TE.fa.stat.redun.sum

            If you want to learn more about the formatting and information of these files, please visit:
                https://github.com/oushujun/EDTA/wiki/Making-sense-of-EDTA-usage-and-outputs---Q&A
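
The "[Errno 2] No such file or directory: 'bedtools'" line in the log above is the kind of error Python raises when it tries to launch an executable that cannot be found, which would be consistent with bedtools not being on the PATH of the active environment when the density plot step ran. A minimal sketch for checking this is shown below. The helper name `missing_tools` and the one-item tool list are assumptions for illustration, not EDTA's own dependency check or its official dependency list.

```python
import shutil


def missing_tools(names):
    """Return the entries of `names` that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Hypothetical check list: 'bedtools' is the only tool named in the log.
    tools = ["bedtools"]
    missing = missing_tools(tools)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

For the result to be meaningful, this should be run from the same shell, with the same environment activated, that was used to launch EDTA.pl.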

The file "genome.fa.mod.EDTA.TEanno.sum" is as follows. Did the run complete successfully?

$ cat genome.fa.mod.EDTA.TEanno.sum

Repeat Classes

Total Sequences: 1
Total Length: 1000000 bp

Class                  Count        bpMasked     %masked
=====                  =====        ========     =======
LINE                   --           --           --
    unknown            39           13979        1.40%
LTR                    --           --           --
    Copia              11           18647        1.86%
    Gypsy              48           108654       10.87%
    TRIM               1            129          0.01%
    unknown            1            248          0.02%
SINE                   --           --           --
    unknown            11           1775         0.18%
TIR                    --           --           --
    CACTA              23           22722        2.27%
    Mutator            115          47072        4.71%
    PIF_Harbinger      110          28045        2.80%
    PILE               4            1033         0.10%
    POLE               2            506          0.05%
    Tc1_Mariner        124          48718        4.87%
    hAT                35           13953        1.40%
    unknown            9            1433         0.14%
nonTIR                 --           --           --
    helitron           56           39164        3.92%

total interspersed     589          346078       34.61%

Total                  589          346078       34.61%
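
As a sanity check on the summary, the per-family rows can be added up and compared with the reported totals. The sketch below hard-codes the rows from the table above. The names `ROWS`, `GENOME_LENGTH`, `summarize`, and `bp_by_class` are made up for this illustration and are not part of EDTA.

```python
from collections import defaultdict

# Rows copied from genome.fa.mod.EDTA.TEanno.sum: (class, family, count, bp_masked)
ROWS = [
    ("LINE", "unknown", 39, 13979),
    ("LTR", "Copia", 11, 18647),
    ("LTR", "Gypsy", 48, 108654),
    ("LTR", "TRIM", 1, 129),
    ("LTR", "unknown", 1, 248),
    ("SINE", "unknown", 11, 1775),
    ("TIR", "CACTA", 23, 22722),
    ("TIR", "Mutator", 115, 47072),
    ("TIR", "PIF_Harbinger", 110, 28045),
    ("TIR", "PILE", 4, 1033),
    ("TIR", "POLE", 2, 506),
    ("TIR", "Tc1_Mariner", 124, 48718),
    ("TIR", "hAT", 35, 13953),
    ("TIR", "unknown", 9, 1433),
    ("nonTIR", "helitron", 56, 39164),
]

GENOME_LENGTH = 1_000_000  # "Total Length" reported in the summary, in bp


def summarize(rows, genome_length):
    """Return (total count, total masked bp, percent of genome masked)."""
    total_count = sum(row[2] for row in rows)
    total_bp = sum(row[3] for row in rows)
    return total_count, total_bp, round(100 * total_bp / genome_length, 2)


def bp_by_class(rows):
    """Sum masked bp per top-level class (LINE, LTR, SINE, TIR, nonTIR)."""
    totals = defaultdict(int)
    for te_class, _family, _count, bp in rows:
        totals[te_class] += bp
    return dict(totals)


if __name__ == "__main__":
    print(summarize(ROWS, GENOME_LENGTH))
    print(bp_by_class(ROWS))
```

The rows add up to 589 elements and 346078 bp, or 34.61% of the 1000000 bp sequence, which matches the "Total" line and the "total TE: 34.61%" figure in the log. The table is therefore internally consistent, although that alone does not show that every step of the run succeeded.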