oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Intact TE annotation not found in BCMY01 #203

Closed sajjadasaf closed 3 years ago

sajjadasaf commented 3 years ago

(EDTA) sajjad@sajjad-ThinkStation-P910:~/Downloads/EDTA$ perl EDTA.pl --genome BCMY01.fasta --species others --step all --sensitive 1 --evaluate 1 --anno 1 --force 1 -t 12

########################################################

Extensive de-novo TE Annotator (EDTA) v1.9.6
Shujun Ou (shujun.ou.1@gmail.com)

########################################################

Sat 03 Jul 2021 10:29:17 PM +04 Dependency checking: All passed!

Sat 03 Jul 2021 10:29:19 PM +04 The longest sequence ID in the genome contains 80 characters, which is longer than the limit (15)
Trying to reformat seq IDs...
Attempt 1...
Attempt 2...
Sat 03 Jul 2021 10:29:19 PM +04 Seq ID conversion successful!

Sat 03 Jul 2021 10:29:20 PM +04 Obtain raw TE libraries using various structure-based programs:
Sat 03 Jul 2021 10:29:20 PM +04 EDTA_raw: Check dependencies, prepare working directories.

Sat 03 Jul 2021 10:29:21 PM +04 Start to find LTR candidates.

Sat 03 Jul 2021 10:29:21 PM +04 Identify LTR retrotransposon candidates from scratch.

awk: fatal: cannot open file `BCMY01.fasta.mod.pass.list' for reading (No such file or directory)
Warning: LOC list - is empty.

Error: Error while loading sequence
perl filter_gff3.pl file.gff3 file.list > new.gff3

cp: cannot stat 'BCMY01.fasta.mod.LTRlib.fa': No such file or directory
cp: cannot stat 'BCMY01.fasta.mod.LTRlib.fa': No such file or directory
Error: LTR results not found!

cat: BCMY01.fasta.mod.TIR.intact.fa: No such file or directory
cat: BCMY01.fasta.mod.Helitron.intact.fa: No such file or directory
cat: BCMY01.fasta.mod.TIR.intact.bed: No such file or directory
cat: BCMY01.fasta.mod.Helitron.intact.bed: No such file or directory
Sat 03 Jul 2021 10:29:35 PM +04 Obtain raw TE libraries finished.
All intact TEs found by EDTA:
BCMY01.fasta.mod.EDTA.intact.fa
BCMY01.fasta.mod.EDTA.intact.gff3

Sat 03 Jul 2021 10:29:35 PM +04 Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library:

Sat 03 Jul 2021 10:38:31 PM +04 EDTA advance filtering finished.

Sat 03 Jul 2021 10:38:31 PM +04 Perform EDTA final steps to generate a non-redundant comprehensive TE library:

            Use RepeatModeler to identify any remaining TEs that are missed by structure-based methods.

2021-07-03 23:17:46,244 -WARNING- Grid computing is not available because DRMAA not configured properly: Could not find drmaa library. Please specify its full path using the environment variable DRMAA_LIBRARY_PATH
2021-07-03 23:17:46,250 -INFO- VARS: {'sequence': 'BCMY01.fasta.mod.RM.consensi.fa', 'hmm_database': 'rexdb', 'seq_type': 'nucl', 'prefix': 'BCMY01.fasta.mod.RM.consensi.fa.rexdb', 'force_write_hmmscan': False, 'processors': 12, 'tmp_dir': './tmp', 'min_coverage': 20, 'max_evalue': 0.001, 'disable_pass2': False, 'pass2_rule': '80-80-80', 'no_library': False, 'no_reverse': False, 'no_cleanup': False, 'p2_identity': 80.0, 'p2_coverage': 80.0, 'p2_length': 80.0}
2021-07-03 23:17:46,250 -INFO- checking dependencies:
2021-07-03 23:17:46,260 -INFO- hmmer 3.3.1 OK
2021-07-03 23:17:46,319 -INFO- blastn 2.10.0+ OK
2021-07-03 23:17:46,319 -INFO- check database rexdb
2021-07-03 23:17:46,319 -INFO- db path: /home/sajjad/anaconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/database
2021-07-03 23:17:46,319 -INFO- db file: REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2021-07-03 23:17:46,320 -INFO- REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm OK
2021-07-03 23:17:46,320 -INFO- Start classifying pipeline
2021-07-03 23:17:46,331 -INFO- total 32 sequences
2021-07-03 23:17:46,331 -INFO- translating BCMY01.fasta.mod.RM.consensi.fa in six frames
/home/sajjad/.local/lib/python3.6/site-packages/Bio/Seq.py:2859: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  BiopythonWarning,
2021-07-03 23:17:46,355 -INFO- HMM scanning against /home/sajjad/anaconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2021-07-03 23:17:46,358 -INFO- Creating server instance (pp-1.6.4.4)
2021-07-03 23:17:46,358 -INFO- Running on Python 3.6.12 linux
2021-07-03 23:17:47,270 -INFO- pp local server started with 12 workers
2021-07-03 23:17:47,304 -INFO- Task 0 started
2021-07-03 23:17:47,305 -INFO- Task 1 started
2021-07-03 23:17:47,305 -INFO- Task 2 started
2021-07-03 23:17:47,306 -INFO- Task 3 started
2021-07-03 23:17:47,306 -INFO- Task 4 started
2021-07-03 23:17:47,307 -INFO- Task 5 started
2021-07-03 23:17:47,308 -INFO- Task 6 started
2021-07-03 23:17:47,319 -INFO- Task 7 started
2021-07-03 23:17:47,320 -INFO- Task 8 started
2021-07-03 23:17:47,320 -INFO- Task 9 started
2021-07-03 23:17:47,321 -INFO- Task 10 started
2021-07-03 23:17:47,322 -INFO- Task 11 started
2021-07-03 23:17:47,550 -INFO- generating gene anntations
2021-07-03 23:17:47,553 -INFO- 0 sequences classified by HMM
2021-07-03 23:17:47,553 -INFO- see protein domain sequences in BCMY01.fasta.mod.RM.consensi.fa.rexdb.dom.faa and annotation gff3 file in BCMY01.fasta.mod.RM.consensi.fa.rexdb.dom.gff3
2021-07-03 23:17:47,553 -WARNING- skipping pass-2 classification for zero classification in step-1
2021-07-03 23:17:47,554 -INFO- see classified sequences in BCMY01.fasta.mod.RM.consensi.fa.rexdb.cls.tsv
2021-07-03 23:17:47,554 -INFO- writing library for RepeatMasker in BCMY01.fasta.mod.RM.consensi.fa.rexdb.cls.lib
2021-07-03 23:17:47,556 -INFO- writing classified protein domains in BCMY01.fasta.mod.RM.consensi.fa.rexdb.cls.pep
2021-07-03 23:17:47,556 -INFO- Summary of classifications:
Order Superfamily # of Sequences # of Clade Sequences # of Clades # of full Domains
2021-07-03 23:17:47,556 -INFO- Pipeline done.
2021-07-03 23:17:47,556 -INFO- cleaning the temporary directory ./tmp

Input file "BCMY01.fasta.mod.RepeatModeler.raw.fa.masked" not found!

    Usage: perl cleanup_tandem.pl -f sample.fa [options] > sample.cln.fa 
Options:
    -misschar   [n|l]   Define the letter representing unknown sequences; default: n. l: recognize lower case letters
    -Nscreen    [0|1]   Enable (1) or disable (0) the -nc parameter; default: 1
    -nc     [int]   Ambuguous sequence len cutoff; discard the entire sequence if > this number; default: 0
    -nr     [0-1]   Ambuguous sequence percentage cutoff; discard the entire sequence if > this number; default: 1
    -minlen     [int]   Minimum sequence length filter after clean up; default: 100 (bp)
    -maxlen     [int]   Maximum sequence length filter after clean up; default: 25000 (bp)
    -cleanN     [0|1]   Retain (0) or remove (1) the -misschar taget in output sequence; default: 0
    -cleanT     [0|1]   Remove entire seq. if any terminal seq (20bp) has 15bp of N (1); disabled by default (0).
    -minrm      [int]   The minimum length of -misschar to be removed if -cleanN 1; default: 1.
    -trf        [0|1]   Enable (1) or disable (0) tandem repeat finder (trf); default: 1
    -trf_path   path    Path to the trf program

            Skipping the CDS cleaning step (--cds [File]) since no CDS file is provided or it's empty.

ERROR: Intact TE annotation not found in BCMY01.fasta.mod.EDTA.intact.gff3 at EDTA.pl line 566.

oushujun commented 3 years ago

You'd better fix your sequence IDs before further testing since you see:

Trying to reformat seq IDs... Attempt 1... Attempt 2...

The program will try to fix this, but if you encounter errors you should fix the IDs yourself, even if it says "successful." Otherwise I can't determine what the cause could be.

Shujun
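A minimal sketch of fixing the sequence IDs by hand, in the spirit of the advice above. It is not part of EDTA: the function names `rename_fasta_ids` and `longest_id_length` and the `seq` prefix are hypothetical, and it assumes a plain FASTA file whose headers start with `>`. It measures the whole header text after `>`, which may differ from how EDTA itself counts ID length against the 15-character limit reported in the log.

```python
def longest_id_length(lines):
    """Return the length of the longest FASTA header (text after '>').

    Assumption: the whole header line counts as the ID.
    """
    lengths = [len(line.rstrip("\n")[1:]) for line in lines if line.startswith(">")]
    return max(lengths) if lengths else 0


def rename_fasta_ids(lines, prefix="seq"):
    """Replace every FASTA header with a short sequential ID.

    Returns (renamed_lines, mapping), where mapping goes from the
    new short ID back to the original header text so nothing is lost.
    """
    renamed = []
    mapping = {}
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_id = f"{prefix}{count}"
            mapping[new_id] = line[1:]      # keep the original header
            renamed.append(f">{new_id}")
        elif line:                          # drop empty lines, keep sequence
            renamed.append(line)
    return renamed, mapping


# Illustrative in-memory example; the headers below are made up.
example = [
    ">contig_0001 some long description of this scaffold\n",
    "ACGTACGT\n",
    ">contig_0002 another long description\n",
    "GGCCTTAA\n",
]
print("longest header:", longest_id_length(example))
short_lines, id_map = rename_fasta_ids(example)
print("\n".join(short_lines))
```

For a real genome, the same functions could be applied to the lines of the FASTA file and the result written to a new file, keeping `id_map` so the original names can be restored in the final annotation.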

sajjadasaf commented 3 years ago

I have changed the sequence IDs to just seq1, seq2, but the following error is still shown. I have checked with another genome and it works properly.

(EDTA) sajjad@sajjad-ThinkStation-P910:~/Downloads/EDTA$ perl EDTA.pl --genome BCMY01.fasta --species others --step all --sensitive 1 --evaluate 1 --anno 1 --force 1 -t 12

########################################################

Extensive de-novo TE Annotator (EDTA) v1.9.6
Shujun Ou (shujun.ou.1@gmail.com)

########################################################

Sun 04 Jul 2021 09:45:03 PM +04 Dependency checking: All passed!

Sun 04 Jul 2021 09:45:05 PM +04 Obtain raw TE libraries using various structure-based programs:
Sun 04 Jul 2021 09:45:05 PM +04 EDTA_raw: Check dependencies, prepare working directories.

Sun 04 Jul 2021 09:45:07 PM +04 Start to find LTR candidates.

Sun 04 Jul 2021 09:45:07 PM +04 Identify LTR retrotransposon candidates from scratch.

awk: fatal: cannot open file `BCMY01.fasta.mod.pass.list' for reading (No such file or directory)
Warning: LOC list - is empty.

Error: Error while loading sequence
perl filter_gff3.pl file.gff3 file.list > new.gff3

cp: cannot stat 'BCMY01.fasta.mod.LTRlib.fa': No such file or directory
cp: cannot stat 'BCMY01.fasta.mod.LTRlib.fa': No such file or directory
Error: LTR results not found!

cat: BCMY01.fasta.mod.TIR.intact.fa: No such file or directory
cat: BCMY01.fasta.mod.Helitron.intact.fa: No such file or directory
cat: BCMY01.fasta.mod.TIR.intact.bed: No such file or directory
cat: BCMY01.fasta.mod.Helitron.intact.bed: No such file or directory
Sun 04 Jul 2021 09:45:20 PM +04 Obtain raw TE libraries finished.
All intact TEs found by EDTA:
BCMY01.fasta.mod.EDTA.intact.fa
BCMY01.fasta.mod.EDTA.intact.gff3

Sun 04 Jul 2021 09:45:20 PM +04 Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library:

Sun 04 Jul 2021 09:54:17 PM +04 EDTA advance filtering finished.

Sun 04 Jul 2021 09:54:17 PM +04 Perform EDTA final steps to generate a non-redundant comprehensive TE library:

            Use RepeatModeler to identify any remaining TEs that are missed by structure-based methods.

2021-07-04 22:24:09,049 -WARNING- Grid computing is not available because DRMAA not configured properly: Could not find drmaa library. Please specify its full path using the environment variable DRMAA_LIBRARY_PATH
2021-07-04 22:24:09,055 -INFO- VARS: {'sequence': 'BCMY01.fasta.mod.RM.consensi.fa', 'hmm_database': 'rexdb', 'seq_type': 'nucl', 'prefix': 'BCMY01.fasta.mod.RM.consensi.fa.rexdb', 'force_write_hmmscan': False, 'processors': 12, 'tmp_dir': './tmp', 'min_coverage': 20, 'max_evalue': 0.001, 'disable_pass2': False, 'pass2_rule': '80-80-80', 'no_library': False, 'no_reverse': False, 'no_cleanup': False, 'p2_identity': 80.0, 'p2_coverage': 80.0, 'p2_length': 80.0}
2021-07-04 22:24:09,055 -INFO- checking dependencies:
2021-07-04 22:24:09,065 -INFO- hmmer 3.3.1 OK
2021-07-04 22:24:09,126 -INFO- blastn 2.10.0+ OK
2021-07-04 22:24:09,126 -INFO- check database rexdb
2021-07-04 22:24:09,126 -INFO- db path: /home/sajjad/anaconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/database
2021-07-04 22:24:09,126 -INFO- db file: REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2021-07-04 22:24:09,126 -INFO- REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm OK
2021-07-04 22:24:09,126 -INFO- Start classifying pipeline
2021-07-04 22:24:09,136 -INFO- total 33 sequences
2021-07-04 22:24:09,136 -INFO- translating BCMY01.fasta.mod.RM.consensi.fa in six frames
/home/sajjad/.local/lib/python3.6/site-packages/Bio/Seq.py:2859: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  BiopythonWarning,
2021-07-04 22:24:09,179 -INFO- HMM scanning against /home/sajjad/anaconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2021-07-04 22:24:09,183 -INFO- Creating server instance (pp-1.6.4.4)
2021-07-04 22:24:09,183 -INFO- Running on Python 3.6.12 linux
2021-07-04 22:24:10,116 -INFO- pp local server started with 12 workers
2021-07-04 22:24:10,149 -INFO- Task 0 started
2021-07-04 22:24:10,150 -INFO- Task 1 started
2021-07-04 22:24:10,150 -INFO- Task 2 started
2021-07-04 22:24:10,151 -INFO- Task 3 started
2021-07-04 22:24:10,152 -INFO- Task 4 started
2021-07-04 22:24:10,154 -INFO- Task 5 started
2021-07-04 22:24:10,155 -INFO- Task 6 started
2021-07-04 22:24:10,156 -INFO- Task 7 started
2021-07-04 22:24:10,156 -INFO- Task 8 started
2021-07-04 22:24:10,157 -INFO- Task 9 started
2021-07-04 22:24:10,157 -INFO- Task 10 started
2021-07-04 22:24:10,158 -INFO- Task 11 started
2021-07-04 22:24:10,397 -INFO- generating gene anntations
2021-07-04 22:24:10,401 -INFO- 0 sequences classified by HMM
2021-07-04 22:24:10,401 -INFO- see protein domain sequences in BCMY01.fasta.mod.RM.consensi.fa.rexdb.dom.faa and annotation gff3 file in BCMY01.fasta.mod.RM.consensi.fa.rexdb.dom.gff3
2021-07-04 22:24:10,401 -WARNING- skipping pass-2 classification for zero classification in step-1
2021-07-04 22:24:10,401 -INFO- see classified sequences in BCMY01.fasta.mod.RM.consensi.fa.rexdb.cls.tsv
2021-07-04 22:24:10,402 -INFO- writing library for RepeatMasker in BCMY01.fasta.mod.RM.consensi.fa.rexdb.cls.lib
2021-07-04 22:24:10,404 -INFO- writing classified protein domains in BCMY01.fasta.mod.RM.consensi.fa.rexdb.cls.pep
2021-07-04 22:24:10,404 -INFO- Summary of classifications:
Order Superfamily # of Sequences # of Clade Sequences # of Clades # of full Domains
2021-07-04 22:24:10,404 -INFO- Pipeline done.
2021-07-04 22:24:10,404 -INFO- cleaning the temporary directory ./tmp

Input file "BCMY01.fasta.mod.RepeatModeler.raw.fa.masked" not found!

    Usage: perl cleanup_tandem.pl -f sample.fa [options] > sample.cln.fa 
Options:
    -misschar   [n|l]   Define the letter representing unknown sequences; default: n. l: recognize lower case letters
    -Nscreen    [0|1]   Enable (1) or disable (0) the -nc parameter; default: 1
    -nc     [int]   Ambuguous sequence len cutoff; discard the entire sequence if > this number; default: 0
    -nr     [0-1]   Ambuguous sequence percentage cutoff; discard the entire sequence if > this number; default: 1
    -minlen     [int]   Minimum sequence length filter after clean up; default: 100 (bp)
    -maxlen     [int]   Maximum sequence length filter after clean up; default: 25000 (bp)
    -cleanN     [0|1]   Retain (0) or remove (1) the -misschar taget in output sequence; default: 0
    -cleanT     [0|1]   Remove entire seq. if any terminal seq (20bp) has 15bp of N (1); disabled by default (0).
    -minrm      [int]   The minimum length of -misschar to be removed if -cleanN 1; default: 1.
    -trf        [0|1]   Enable (1) or disable (0) tandem repeat finder (trf); default: 1
    -trf_path   path    Path to the trf program

            Skipping the CDS cleaning step (--cds [File]) since no CDS file is provided or it's empty.

ERROR: Intact TE annotation not found in BCMY01.fasta.mod.EDTA.intact.gff3 at EDTA.pl line 566.

oushujun commented 3 years ago

Thanks for checking. Your case is similar to #188; please update to v1.9.9 and try again. - Shujun

sajjadasaf commented 3 years ago

Thank you for your help; the problem is solved.