Dr. Shujun,
Hi! I installed EDTA v2.2.1 by running the commands "git clone https://github.com/oushujun/EDTA.git" and "mamba env create -f EDTA_2.2.x.yml".
I then tested it with the following command: "perl .../EDTA.pl --genome genome.fa --cds genome.cds.fa --curatedlib rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --threads 10". However, the output log contained the following warnings and errors:
"Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty."
"Warning: The SINE result file has 0 bp!"
"Warning: The LINE result file has 0 bp!"
"Error encountered: [Errno 2] No such file or directory: 'bedtools'"
"mv: cannot stat 'chromosome_density_plots.pdf': No such file or directory"
"cp: cannot stat 'genome.fa.mod.EDTA.TEanno.density_plots.pdf': No such file or directory"
I don't know whether a dependency failed to install or whether the data simply contains no new LINEs/SINEs. My full log file follows. Could you tell me whether the installation was successful?
#########################################################
Extensive de-novo TE Annotator (EDTA) v2.2.1
Shujun Ou (shujun.ou.1@gmail.com)
#########################################################
Parameters: --genome genome.fa --cds genome.cds.fa --curatedlib rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --threads 10
2024-04-17 Wed 22:57:04 CST Dependency checking:
All passed!
A custom library rice7.0.0.liban is provided via --curatedlib. Please make sure this is a manually curated library but not machine generated.
A CDS file genome.cds.fa is provided via --cds. Please make sure this is the DNA sequence of coding regions only.
A BED file is provided via --exclude. Regions specified by this file will be excluded from TE annotation and masking.
2024-04-17 Wed 22:57:08 CST Obtain raw TE libraries using various structure-based programs:
2024-04-17 Wed 22:57:08 CST EDTA_raw: Check dependencies, prepare working directories.
2024-04-17 Wed 22:57:09 CST Start to find LTR candidates.
2024-04-17 Wed 22:57:09 CST Identify LTR retrotransposon candidates from scratch.
Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.
2024-04-17 Wed 22:57:33 CST Finish finding LTR candidates.
2024-04-17 Wed 22:57:33 CST Start to find SINE candidates.
2024-04-17 Wed 22:58:14 CST Warning: The SINE result file has 0 bp!
2024-04-17 Wed 22:58:14 CST Start to find LINE candidates.
2024-04-17 Wed 22:58:14 CST Identify LINE retrotransposon candidates from scratch.
2024-04-17 Wed 22:59:56 CST Warning: The LINE result file has 0 bp!
2024-04-17 Wed 22:59:56 CST Start to find TIR candidates.
2024-04-17 Wed 22:59:56 CST Identify TIR candidates from scratch.
Species: others
2024-04-17 Wed 23:00:47 CST Finish finding TIR candidates.
2024-04-17 Wed 23:00:47 CST Start to find Helitron candidates.
2024-04-17 Wed 23:00:47 CST Identify Helitron candidates from scratch.
2024-04-17 Wed 23:01:22 CST Finish finding Helitron candidates.
2024-04-17 Wed 23:01:22 CST Execution of EDTA_raw.pl is finished!
2024-04-17 Wed 23:01:22 CST Obtain raw TE libraries finished.
All intact TEs found by EDTA:
genome.fa.mod.EDTA.intact.raw.fa
genome.fa.mod.EDTA.intact.raw.gff3
2024-04-17 Wed 23:01:22 CST Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library:
Warning: No sequences were masked
2024-04-17 Wed 23:01:40 CST EDTA advance filtering finished.
2024-04-17 Wed 23:01:40 CST Perform EDTA final steps to generate a non-redundant comprehensive TE library.
Filter RepeatModeler results that are ignored in the raw step.
2024-04-17 Wed 23:01:45 CST Clean up TE-related sequences in the CDS file with TEsorter.
Remove CDS-related sequences in the EDTA library.
Remove CDS-related sequences in intact TEs.
2024-04-17 Wed 23:01:52 CST Combine the high-quality TE library rice7.0.0.liban with the EDTA library:
2024-04-17 Wed 23:01:59 CST EDTA final stage finished! You may check out:
The final EDTA TE library: genome.fa.mod.EDTA.TElib.fa
Family names of intact TEs have been updated by rice7.0.0.liban: genome.fa.mod.EDTA.intact.gff3
Comparing to the provided library, EDTA found these novel TEs: genome.fa.mod.EDTA.TElib.novel.fa
The provided library has been incorporated into the final library: genome.fa.mod.EDTA.TElib.fa
2024-04-17 Wed 23:01:59 CST Perform post-EDTA analysis for whole-genome annotation:
2024-04-17 Wed 23:01:59 CST Homology-based annotation of TEs using genome.fa.mod.EDTA.TElib.fa from scratch.
Error encountered: [Errno 2] No such file or directory: 'bedtools'
mv: cannot stat 'chromosome_density_plots.pdf': No such file or directory
2024-04-17 Wed 23:02:10 CST TE annotation using the EDTA library has finished! Check out:
Whole-genome TE annotation (total TE: 34.61%): genome.fa.mod.EDTA.TEanno.gff3
Whole-genome TE annotation summary: genome.fa.mod.EDTA.TEanno.sum
Whole-genome TE divergence plot: genome.fa.mod_divergence_plot.pdf
Whole-genome TE density plot: genome.fa.mod.EDTA.TEanno.density_plots.pdf
Low-threshold TE masking for MAKER gene annotation (masked: 17.27%): genome.fa.mod.MAKER.masked
cp: cannot stat 'genome.fa.mod.EDTA.TEanno.density_plots.pdf': No such file or directory
2024-04-17 Wed 23:02:10 CST Evaluate the level of inconsistency for whole-genome TE annotation:
2024-04-17 Wed 23:02:12 CST Evaluation of TE annotation finished! Check out these files:
Overall: genome.fa.mod.EDTA.TE.fa.stat.all.sum
Nested: genome.fa.mod.EDTA.TE.fa.stat.nested.sum
Non-nested: genome.fa.mod.EDTA.TE.fa.stat.redun.sum
If you want to learn more about the formatting and information of these files, please visit:
https://github.com/oushujun/EDTA/wiki/Making-sense-of-EDTA-usage-and-outputs---Q&A
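The "[Errno 2] No such file or directory: 'bedtools'" message has the form Python produces when a subprocess cannot find an executable, so one thing worth checking is whether `bedtools` is visible on PATH inside the activated environment. The sketch below is only a diagnostic I put together, not part of EDTA; the helper names `find_tool` and `first_version_line` are my own, and it assumes it is run with the same environment active that was used for the EDTA run.

```python
import shutil
import subprocess
from typing import Optional


def find_tool(name: str = "bedtools") -> Optional[str]:
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def first_version_line(path: str) -> str:
    """Run `<path> --version` and return the first line printed, or "" on failure."""
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        # The executable is missing, not runnable, or did not answer in time.
        return ""
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    tool = find_tool("bedtools")
    if tool is None:
        print("bedtools is NOT on PATH in this environment")
    else:
        print(f"bedtools found at {tool}: {first_version_line(tool)}")
```

If this reports that bedtools is missing, that would be consistent with the two "cannot stat" messages about the density plot, since the PDF would never have been written. It would not by itself explain the empty SINE/LINE results.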
The file "genome.fa.mod.EDTA.TEanno.sum" is shown below. Did the run complete successfully?
$ cat genome.fa.mod.EDTA.TEanno.sum
Repeat Classes
Total Sequences: 1
Total Length: 1000000 bp
Class                  Count        bpMasked     %masked
=====                  =====        ========     =======
LINE                   --           --           --
    unknown            39           13979        1.40%
LTR                    --           --           --
    Copia              11           18647        1.86%
    Gypsy              48           108654       10.87%
    TRIM               1            129          0.01%
    unknown            1            248          0.02%
SINE                   --           --           --
    unknown            11           1775         0.18%
TIR                    --           --           --
    CACTA              23           22722        2.27%
    Mutator            115          47072        4.71%
    PIF_Harbinger      110          28045        2.80%
    PILE               4            1033         0.10%
    POLE               2            506          0.05%
    Tc1_Mariner        124          48718        4.87%
    hAT                35           13953        1.40%
    unknown            9            1433         0.14%
nonTIR                 --           --           --
    helitron           56           39164        3.92%
Total                  589          346078       34.61%
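As a sanity check on the summary above, the per-family rows can be added up and compared with the "Total" row. The sketch below is my own and assumes the layout as posted here (whitespace-separated `name count bp percent%` rows plus a single `Total` row); a real `.sum` file may contain extra lines that this pattern simply skips. The names `parse_rows` and `totals_consistent` are hypothetical helpers, not EDTA functions.

```python
import re

# Rows copied from the summary in this post. Group headers such as
# "LINE -- -- --" carry no numbers and are skipped by the pattern below.
SUMMARY = """\
LINE -- -- --
unknown 39 13979 1.40%
LTR -- -- --
Copia 11 18647 1.86%
Gypsy 48 108654 10.87%
TRIM 1 129 0.01%
unknown 1 248 0.02%
SINE -- -- --
unknown 11 1775 0.18%
TIR -- -- --
CACTA 23 22722 2.27%
Mutator 115 47072 4.71%
PIF_Harbinger 110 28045 2.80%
PILE 4 1033 0.10%
POLE 2 506 0.05%
Tc1_Mariner 124 48718 4.87%
hAT 35 13953 1.40%
unknown 9 1433 0.14%
nonTIR -- -- --
helitron 56 39164 3.92%
Total 589 346078 34.61%
"""

# name, copy count, masked bp, percent masked
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)%\s*$")


def parse_rows(text):
    """Return (name, count, bp, percent) tuples for every numeric row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(1), int(m.group(2)), int(m.group(3)), float(m.group(4))))
    return rows


def totals_consistent(text, genome_length):
    """Check that family rows sum to the Total row and match the genome length."""
    rows = parse_rows(text)
    total_rows = [r for r in rows if r[0] == "Total"]
    if len(total_rows) != 1:
        return False
    _, total_count, total_bp, total_pct = total_rows[0]
    families = [r for r in rows if r[0] != "Total"]
    count = sum(r[1] for r in families)
    bp = sum(r[2] for r in families)
    pct = 100.0 * bp / genome_length
    # Allow for the two-decimal rounding used in the table.
    return count == total_count and bp == total_bp and abs(pct - total_pct) <= 0.005


if __name__ == "__main__":
    print(totals_consistent(SUMMARY, 1_000_000))
```

A consistent table only indicates that the annotation step wrote a coherent summary (here matching the "total TE: 34.61%" line in the log). It says nothing about the missing density plot.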