sajjadasaf closed this issue 3 years ago.
You should fix your sequence IDs before testing further, since the log shows:
Trying to reformat seq IDs... Attempt 1... Attempt 2...
EDTA will try to fix this automatically, but if you still encounter errors, you should fix the IDs yourself, even if the log says the conversion was "successful." Otherwise I can't determine what is causing the problem.
Shujun
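One way to fix the IDs yourself, rather than relying on EDTA's automatic reformatting, is a small script that replaces every FASTA header with a short sequential ID and saves a lookup table so results can be mapped back to the original names. This is a minimal sketch, not part of EDTA: the "seq" prefix, the script name rename_ids.py, and the file names are hypothetical, and the 15-character limit is taken from the EDTA v1.9.6 log in this thread.

```python
"""Rename FASTA sequence IDs to short sequential names (seq1, seq2, ...).

Hypothetical helper, not part of EDTA. The 15-character limit is the one
reported by EDTA v1.9.6 in this thread.
"""
import sys
from typing import Iterable, List, Tuple

MAX_ID_LEN = 15  # limit quoted in the EDTA v1.9.6 log


def shorten_fasta_ids(
    lines: Iterable[str], prefix: str = "seq"
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Replace each '>' header with prefix + running number.

    Returns the rewritten lines and a list of (new_id, original_header)
    pairs so annotations can be translated back afterwards.
    """
    out: List[str] = []
    mapping: List[Tuple[str, str]] = []
    for line in lines:
        if line.startswith(">"):
            new_id = f"{prefix}{len(mapping) + 1}"
            if len(new_id) > MAX_ID_LEN:
                raise ValueError(f"{new_id!r} exceeds {MAX_ID_LEN} characters")
            mapping.append((new_id, line[1:].strip()))
            out.append(f">{new_id}\n")
        else:
            # Sequence lines are copied unchanged; make sure each ends with a newline.
            out.append(line if line.endswith("\n") else line + "\n")
    return out, mapping


def main(in_path: str, out_path: str, map_path: str) -> None:
    """Write the renamed FASTA and a tab-separated new_id -> old header table."""
    with open(in_path) as fin:
        new_lines, mapping = shorten_fasta_ids(fin)
    with open(out_path, "w") as fout:
        fout.writelines(new_lines)
    with open(map_path, "w") as fmap:
        for new_id, old_header in mapping:
            fmap.write(f"{new_id}\t{old_header}\n")


# Only run when called with exactly three file arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    main(*sys.argv[1:4])
```

Example: python rename_ids.py BCMY01.fasta BCMY01.renamed.fasta BCMY01.id_map.tsv, then pass BCMY01.renamed.fasta to --genome. Keeping the TSV lets you translate the seqN IDs in EDTA's output back to the original headers.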
I have changed the sequence IDs to just seq1, seq2, ..., but the following error still appears. I have tested another genome and it works properly.
(EDTA) sajjad@sajjad-ThinkStation-P910:~/Downloads/EDTA$ perl EDTA.pl --genome BCMY01.fasta --species others --step all --sensitive 1 --evaluate 1 --anno 1 --force 1 -t 12
########################################################
########################################################
Sun 04 Jul 2021 09:45:03 PM +04 Dependency checking: All passed!
Sun 04 Jul 2021 09:45:05 PM +04 Obtain raw TE libraries using various structure-based programs:
Sun 04 Jul 2021 09:45:05 PM +04 EDTA_raw: Check dependencies, prepare working directories.
Sun 04 Jul 2021 09:45:07 PM +04 Start to find LTR candidates.
Sun 04 Jul 2021 09:45:07 PM +04 Identify LTR retrotransposon candidates from scratch.
awk: fatal: cannot open file `BCMY01.fasta.mod.pass.list' for reading (No such file or directory)
Warning: LOC list - is empty.
Error: Error while loading sequence
perl filter_gff3.pl file.gff3 file.list > new.gff3
cp: cannot stat 'BCMY01.fasta.mod.LTRlib.fa': No such file or directory
cp: cannot stat 'BCMY01.fasta.mod.LTRlib.fa': No such file or directory
Error: LTR results not found!
cat: BCMY01.fasta.mod.TIR.intact.fa: No such file or directory
cat: BCMY01.fasta.mod.Helitron.intact.fa: No such file or directory
cat: BCMY01.fasta.mod.TIR.intact.bed: No such file or directory
cat: BCMY01.fasta.mod.Helitron.intact.bed: No such file or directory
Sun 04 Jul 2021 09:45:20 PM +04 Obtain raw TE libraries finished.
All intact TEs found by EDTA:
BCMY01.fasta.mod.EDTA.intact.fa
BCMY01.fasta.mod.EDTA.intact.gff3
Sun 04 Jul 2021 09:45:20 PM +04 Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library:
Sun 04 Jul 2021 09:54:17 PM +04 EDTA advance filtering finished.
Sun 04 Jul 2021 09:54:17 PM +04 Perform EDTA final steps to generate a non-redundant comprehensive TE library:
Use RepeatModeler to identify any remaining TEs that are missed by structure-based methods.
2021-07-04 22:24:09,049 -WARNING- Grid computing is not available because DRMAA not configured properly: Could not find drmaa library. Please specify its full path using the environment variable DRMAA_LIBRARY_PATH
2021-07-04 22:24:09,055 -INFO- VARS: {'sequence': 'BCMY01.fasta.mod.RM.consensi.fa', 'hmm_database': 'rexdb', 'seq_type': 'nucl', 'prefix': 'BCMY01.fasta.mod.RM.consensi.fa.rexdb', 'force_write_hmmscan': False, 'processors': 12, 'tmp_dir': './tmp', 'min_coverage': 20, 'max_evalue': 0.001, 'disable_pass2': False, 'pass2_rule': '80-80-80', 'no_library': False, 'no_reverse': False, 'no_cleanup': False, 'p2_identity': 80.0, 'p2_coverage': 80.0, 'p2_length': 80.0}
2021-07-04 22:24:09,055 -INFO- checking dependencies:
2021-07-04 22:24:09,065 -INFO- hmmer 3.3.1 OK
2021-07-04 22:24:09,126 -INFO- blastn 2.10.0+ OK
2021-07-04 22:24:09,126 -INFO- check database rexdb
2021-07-04 22:24:09,126 -INFO- db path: /home/sajjad/anaconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/database
2021-07-04 22:24:09,126 -INFO- db file: REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2021-07-04 22:24:09,126 -INFO- REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm OK
2021-07-04 22:24:09,126 -INFO- Start classifying pipeline
2021-07-04 22:24:09,136 -INFO- total 33 sequences
2021-07-04 22:24:09,136 -INFO- translating BCMY01.fasta.mod.RM.consensi.fa in six frames
/home/sajjad/.local/lib/python3.6/site-packages/Bio/Seq.py:2859: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning,
2021-07-04 22:24:09,179 -INFO- HMM scanning against /home/sajjad/anaconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2021-07-04 22:24:09,183 -INFO- Creating server instance (pp-1.6.4.4)
2021-07-04 22:24:09,183 -INFO- Running on Python 3.6.12 linux
2021-07-04 22:24:10,116 -INFO- pp local server started with 12 workers
2021-07-04 22:24:10,149 -INFO- Task 0 started
2021-07-04 22:24:10,150 -INFO- Task 1 started
2021-07-04 22:24:10,150 -INFO- Task 2 started
2021-07-04 22:24:10,151 -INFO- Task 3 started
2021-07-04 22:24:10,152 -INFO- Task 4 started
2021-07-04 22:24:10,154 -INFO- Task 5 started
2021-07-04 22:24:10,155 -INFO- Task 6 started
2021-07-04 22:24:10,156 -INFO- Task 7 started
2021-07-04 22:24:10,156 -INFO- Task 8 started
2021-07-04 22:24:10,157 -INFO- Task 9 started
2021-07-04 22:24:10,157 -INFO- Task 10 started
2021-07-04 22:24:10,158 -INFO- Task 11 started
2021-07-04 22:24:10,397 -INFO- generating gene anntations
2021-07-04 22:24:10,401 -INFO- 0 sequences classified by HMM
2021-07-04 22:24:10,401 -INFO- see protein domain sequences in BCMY01.fasta.mod.RM.consensi.fa.rexdb.dom.faa and annotation gff3 file in BCMY01.fasta.mod.RM.consensi.fa.rexdb.dom.gff3
2021-07-04 22:24:10,401 -WARNING- skipping pass-2 classification for zero classification in step-1
2021-07-04 22:24:10,401 -INFO- see classified sequences in BCMY01.fasta.mod.RM.consensi.fa.rexdb.cls.tsv
2021-07-04 22:24:10,402 -INFO- writing library for RepeatMasker in BCMY01.fasta.mod.RM.consensi.fa.rexdb.cls.lib
2021-07-04 22:24:10,404 -INFO- writing classified protein domains in BCMY01.fasta.mod.RM.consensi.fa.rexdb.cls.pep
2021-07-04 22:24:10,404 -INFO- Summary of classifications:
Order   Superfamily   # of Sequences   # of Clade Sequences   # of Clades   # of full Domains
2021-07-04 22:24:10,404 -INFO- Pipeline done.
2021-07-04 22:24:10,404 -INFO- cleaning the temporary directory ./tmp
Input file "BCMY01.fasta.mod.RepeatModeler.raw.fa.masked" not found!
Usage: perl cleanup_tandem.pl -f sample.fa [options] > sample.cln.fa
Options:
-misschar [n|l] Define the letter representing unknown sequences; default: n. l: recognize lower case letters
-Nscreen [0|1] Enable (1) or disable (0) the -nc parameter; default: 1
-nc [int] Ambuguous sequence len cutoff; discard the entire sequence if > this number; default: 0
-nr [0-1] Ambuguous sequence percentage cutoff; discard the entire sequence if > this number; default: 1
-minlen [int] Minimum sequence length filter after clean up; default: 100 (bp)
-maxlen [int] Maximum sequence length filter after clean up; default: 25000 (bp)
-cleanN [0|1] Retain (0) or remove (1) the -misschar taget in output sequence; default: 0
-cleanT [0|1] Remove entire seq. if any terminal seq (20bp) has 15bp of N (1); disabled by default (0).
-minrm [int] The minimum length of -misschar to be removed if -cleanN 1; default: 1.
-trf [0|1] Enable (1) or disable (0) tandem repeat finder (trf); default: 1
-trf_path path Path to the trf program
Skipping the CDS cleaning step (--cds [File]) since no CDS file is provided or it's empty.
ERROR: Intact TE annotation not found in BCMY01.fasta.mod.EDTA.intact.gff3 at EDTA.pl line 566.
Thanks for checking. Your case is similar to #188; please update to v1.9.9 and try again. - Shujun
Thank you for your help; the problem is solved.
(EDTA) sajjad@sajjad-ThinkStation-P910:~/Downloads/EDTA$ perl EDTA.pl --genome BCMY01.fasta --species others --step all --sensitive 1 --evaluate 1 --anno 1 --force 1 -t 12
########################################################
Extensive de-novo TE Annotator (EDTA) v1.9.6
Shujun Ou (shujun.ou.1@gmail.com)
########################################################
Sat 03 Jul 2021 10:29:17 PM +04 Dependency checking: All passed!
Sat 03 Jul 2021 10:29:19 PM +04 The longest sequence ID in the genome contains 80 characters, which is longer than the limit (15)
Trying to reformat seq IDs... Attempt 1... Attempt 2...
Sat 03 Jul 2021 10:29:19 PM +04 Seq ID conversion successful!
Sat 03 Jul 2021 10:29:20 PM +04 Obtain raw TE libraries using various structure-based programs:
Sat 03 Jul 2021 10:29:20 PM +04 EDTA_raw: Check dependencies, prepare working directories.
Sat 03 Jul 2021 10:29:21 PM +04 Start to find LTR candidates.
Sat 03 Jul 2021 10:29:21 PM +04 Identify LTR retrotransposon candidates from scratch.
awk: fatal: cannot open file `BCMY01.fasta.mod.pass.list' for reading (No such file or directory)
Warning: LOC list - is empty.
Error: Error while loading sequence
perl filter_gff3.pl file.gff3 file.list > new.gff3
cp: cannot stat 'BCMY01.fasta.mod.LTRlib.fa': No such file or directory
cp: cannot stat 'BCMY01.fasta.mod.LTRlib.fa': No such file or directory
Error: LTR results not found!
cat: BCMY01.fasta.mod.TIR.intact.fa: No such file or directory
cat: BCMY01.fasta.mod.Helitron.intact.fa: No such file or directory
cat: BCMY01.fasta.mod.TIR.intact.bed: No such file or directory
cat: BCMY01.fasta.mod.Helitron.intact.bed: No such file or directory
Sat 03 Jul 2021 10:29:35 PM +04 Obtain raw TE libraries finished.
All intact TEs found by EDTA:
BCMY01.fasta.mod.EDTA.intact.fa
BCMY01.fasta.mod.EDTA.intact.gff3
Sat 03 Jul 2021 10:29:35 PM +04 Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library:
Sat 03 Jul 2021 10:38:31 PM +04 EDTA advance filtering finished.
Sat 03 Jul 2021 10:38:31 PM +04 Perform EDTA final steps to generate a non-redundant comprehensive TE library:
2021-07-03 23:17:46,244 -WARNING- Grid computing is not available because DRMAA not configured properly: Could not find drmaa library. Please specify its full path using the environment variable DRMAA_LIBRARY_PATH
2021-07-03 23:17:46,250 -INFO- VARS: {'sequence': 'BCMY01.fasta.mod.RM.consensi.fa', 'hmm_database': 'rexdb', 'seq_type': 'nucl', 'prefix': 'BCMY01.fasta.mod.RM.consensi.fa.rexdb', 'force_write_hmmscan': False, 'processors': 12, 'tmp_dir': './tmp', 'min_coverage': 20, 'max_evalue': 0.001, 'disable_pass2': False, 'pass2_rule': '80-80-80', 'no_library': False, 'no_reverse': False, 'no_cleanup': False, 'p2_identity': 80.0, 'p2_coverage': 80.0, 'p2_length': 80.0}
2021-07-03 23:17:46,250 -INFO- checking dependencies:
2021-07-03 23:17:46,260 -INFO- hmmer 3.3.1 OK
2021-07-03 23:17:46,319 -INFO- blastn 2.10.0+ OK
2021-07-03 23:17:46,319 -INFO- check database rexdb
2021-07-03 23:17:46,319 -INFO- db path: /home/sajjad/anaconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/database
2021-07-03 23:17:46,319 -INFO- db file: REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2021-07-03 23:17:46,320 -INFO- REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm OK
2021-07-03 23:17:46,320 -INFO- Start classifying pipeline
2021-07-03 23:17:46,331 -INFO- total 32 sequences
2021-07-03 23:17:46,331 -INFO- translating BCMY01.fasta.mod.RM.consensi.fa in six frames
/home/sajjad/.local/lib/python3.6/site-packages/Bio/Seq.py:2859: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning,
2021-07-03 23:17:46,355 -INFO- HMM scanning against /home/sajjad/anaconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2021-07-03 23:17:46,358 -INFO- Creating server instance (pp-1.6.4.4)
2021-07-03 23:17:46,358 -INFO- Running on Python 3.6.12 linux
2021-07-03 23:17:47,270 -INFO- pp local server started with 12 workers
2021-07-03 23:17:47,304 -INFO- Task 0 started
2021-07-03 23:17:47,305 -INFO- Task 1 started
2021-07-03 23:17:47,305 -INFO- Task 2 started
2021-07-03 23:17:47,306 -INFO- Task 3 started
2021-07-03 23:17:47,306 -INFO- Task 4 started
2021-07-03 23:17:47,307 -INFO- Task 5 started
2021-07-03 23:17:47,308 -INFO- Task 6 started
2021-07-03 23:17:47,319 -INFO- Task 7 started
2021-07-03 23:17:47,320 -INFO- Task 8 started
2021-07-03 23:17:47,320 -INFO- Task 9 started
2021-07-03 23:17:47,321 -INFO- Task 10 started
2021-07-03 23:17:47,322 -INFO- Task 11 started
2021-07-03 23:17:47,550 -INFO- generating gene anntations
2021-07-03 23:17:47,553 -INFO- 0 sequences classified by HMM
2021-07-03 23:17:47,553 -INFO- see protein domain sequences in BCMY01.fasta.mod.RM.consensi.fa.rexdb.dom.faa and annotation gff3 file in BCMY01.fasta.mod.RM.consensi.fa.rexdb.dom.gff3
2021-07-03 23:17:47,553 -WARNING- skipping pass-2 classification for zero classification in step-1
2021-07-03 23:17:47,554 -INFO- see classified sequences in BCMY01.fasta.mod.RM.consensi.fa.rexdb.cls.tsv
2021-07-03 23:17:47,554 -INFO- writing library for RepeatMasker in BCMY01.fasta.mod.RM.consensi.fa.rexdb.cls.lib
2021-07-03 23:17:47,556 -INFO- writing classified protein domains in BCMY01.fasta.mod.RM.consensi.fa.rexdb.cls.pep
2021-07-03 23:17:47,556 -INFO- Summary of classifications:
Order   Superfamily   # of Sequences   # of Clade Sequences   # of Clades   # of full Domains
2021-07-03 23:17:47,556 -INFO- Pipeline done.
2021-07-03 23:17:47,556 -INFO- cleaning the temporary directory ./tmp
ERROR: Intact TE annotation not found in BCMY01.fasta.mod.EDTA.intact.gff3 at EDTA.pl line 566.
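Because the v1.9.6 log above reports an 80-character ID against a 15-character limit, it can help to check a genome's IDs before starting a long run. This is a minimal sketch under an assumption not confirmed from EDTA's source: that the ID EDTA measures is the first whitespace-delimited word of each '>' header. The script name check_ids.py is hypothetical, and EDTA's own check may count differently.

```python
"""Report FASTA sequence IDs longer than the limit quoted by EDTA v1.9.6.

Hypothetical pre-flight check, not part of EDTA. Assumes the ID is the
first whitespace-delimited word of each '>' header line.
"""
import sys
from typing import Iterable, List

MAX_ID_LEN = 15  # limit quoted in the EDTA v1.9.6 log


def overlong_ids(lines: Iterable[str], limit: int = MAX_ID_LEN) -> List[str]:
    """Return every header ID whose length exceeds `limit`, in file order."""
    too_long: List[str] = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else ""
            if len(seq_id) > limit:
                too_long.append(seq_id)
    return too_long


# Only run when called with exactly one FASTA path.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fasta:
        bad = overlong_ids(fasta)
    if bad:
        longest = max(bad, key=len)
        print(f"{len(bad)} ID(s) exceed {MAX_ID_LEN} characters; "
              f"longest is {longest!r} ({len(longest)} characters)")
        sys.exit(1)
    print(f"All sequence IDs are at most {MAX_ID_LEN} characters.")
```

Example: python check_ids.py BCMY01.fasta exits with status 1 and prints the longest offending ID if any ID is over the limit. In this thread the IDs were already short and the error was resolved by updating to v1.9.9, so a passing check does not rule out other causes.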