bioteksampath closed 3 years ago
Hi Sam,
Your log looks fine to me. You can ignore the DRMAA warning because it does no harm to the annotation. Please check the genome.fa.mod.EDTA.TEanno.sum file for a summary of the TE annotations. If you encounter TIR-related errors, please post them here.
Best, Shujun
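A quick way to see which TE types made it into an annotation file is to tally the feature-type column. The snippet below is only a minimal sketch: it assumes a standard nine-column, tab-separated GFF3 file (such as genome.fa.mod.EDTA.TEanno.gff3), and the function name and the example records are made up for illustration. The type labels that EDTA actually writes are not shown in this thread, so check your own file for them.

```python
from collections import Counter


def count_feature_types(gff3_text):
    """Tally column 3 (feature type) of GFF3-formatted text."""
    counts = Counter()
    for line in gff3_text.splitlines():
        # Skip blank lines and GFF3 comments/directives.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A valid GFF3 record has nine tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical records, NOT real EDTA output; type names are placeholders.
example = "\n".join([
    "##gff-version 3",
    "chr1\tEDTA\ttype_a\t100\t900\t.\t+\t.\tID=te1",
    "chr1\tEDTA\ttype_b\t1500\t2100\t.\t-\t.\tID=te2",
    "chr2\tEDTA\ttype_a\t50\t400\t.\t+\t.\tID=te3",
])

for feature_type, n in sorted(count_feature_types(example).items()):
    print(feature_type, n)
```

To run it on a real file, read the file contents with `open(path).read()` and pass the text to `count_feature_types`. If no TIR-related type shows up in the tally, that would be consistent with the missing DNA TEs described below.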
Hi Shujun, my EDTA runs with both the test data and my own data don't yield any DNA TEs, but LTR and Helitron detection work okay. I'm wondering what the issue might be.
My log reports a DRMAA library issue. Do you have any solution? Thanks.
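When a log is long and several entries end up fused onto one line, it can help to pull out just the warnings. This is a rough sketch, assuming entries use the `YYYY-MM-DD HH:MM:SS,mmm -LEVEL- message` layout seen in the TEsorter portion of the log. The function names are hypothetical and the sample text is abridged from that log.

```python
import re

# Matches the start of an entry, e.g. "2021-01-26 19:28:43,667 -WARNING- ".
ENTRY_START = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) -(\w+)- "
)


def split_entries(text):
    """Return (timestamp, level, message) tuples, even if entries share a line."""
    matches = list(ENTRY_START.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs until the next entry start (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((m.group(1), m.group(2), text[m.end():end].strip()))
    return entries


def warnings_only(text):
    """Keep only the entries logged at WARNING level."""
    return [entry for entry in split_entries(text) if entry[1] == "WARNING"]


# Abridged sample taken from the log in this thread.
sample = (
    "2021-01-26 19:28:43,667 -WARNING- Grid computing is not available "
    "because DRMAA not configured properly: Could not find drmaa library. "
    "2021-01-26 19:28:43,744 -INFO- hmmer 3.3.1 OK"
)

for timestamp, level, message in warnings_only(sample):
    print(timestamp, level, message)
```

Lines that start with a `Tue Jan 26 ...` style timestamp (the EDTA wrapper's own messages) are not matched by this pattern and would be folded into the preceding message, so treat the output as a filter for the TEsorter entries only.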
Log report from the test data:
Tue Jan 26 19:24:08 CST 2021 Dependency checking: All passed!
Tue Jan 26 19:24:12 CST 2021 Obtain raw TE libraries using various structure-based programs:
Tue Jan 26 19:24:12 CST 2021 EDTA_raw: Check dependencies, prepare working directories.
Tue Jan 26 19:24:16 CST 2021 Start to find LTR candidates.
Tue Jan 26 19:24:16 CST 2021 Identify LTR retrotransposon candidates from scratch.
Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.
Tue Jan 26 19:24:52 CST 2021 Finish finding LTR candidates.
Tue Jan 26 19:24:52 CST 2021 Start to find TIR candidates.
Tue Jan 26 19:24:53 CST 2021 Identify TIR candidates from scratch.
Species: others
Tue Jan 26 19:25:57 CST 2021 Finish finding TIR candidates.
Tue Jan 26 19:25:57 CST 2021 Start to find Helitron candidates.
Tue Jan 26 19:25:57 CST 2021 Identify Helitron candidates from scratch.
Tue Jan 26 19:26:34 CST 2021 Finish finding Helitron candidates.
Tue Jan 26 19:26:34 CST 2021 Execution of EDTA_raw.pl is finished!
Tue Jan 26 19:26:35 CST 2021 Obtain raw TE libraries finished. All intact TEs found by EDTA:
genome.fa.mod.EDTA.intact.fa
genome.fa.mod.EDTA.intact.gff3
Tue Jan 26 19:26:35 CST 2021 Perform EDTA advcance filtering for raw TE candidates and generate the stage 1 library:
Tue Jan 26 19:27:33 CST 2021 EDTA advcance filtering finished.
Tue Jan 26 19:27:33 CST 2021 Perform EDTA final steps to generate a non-redundant comprehensive TE library:
2021-01-26 19:28:43,667 -WARNING- Grid computing is not available because DRMAA not configured properly: Could not find drmaa library. Please specify its full path using the environment variable DRMAA_LIBRARY_PATH
2021-01-26 19:28:43,699 -INFO- VARS: {'sequence': 'genome.fa.mod.RM.consensi.fa', 'hmm_database': 'rexdb', 'seq_type': 'nucl', 'prefix': 'genome.fa.mod.RM.consensi.fa.rexdb', 'force_write_hmmscan': False, 'processors': 10, 'tmp_dir': './tmp', 'min_coverage': 20, 'max_evalue': 0.001, 'disable_pass2': False, 'pass2_rule': '80-80-80', 'no_library': False, 'no_reverse': False, 'no_cleanup': False, 'p2_identity': 80.0, 'p2_coverage': 80.0, 'p2_length': 80.0}
2021-01-26 19:28:43,699 -INFO- checking dependencies:
2021-01-26 19:28:43,744 -INFO- hmmer 3.3.1 OK
2021-01-26 19:28:44,058 -INFO- blastn 2.10.0+ OK
2021-01-26 19:28:44,061 -INFO- check database rexdb
2021-01-26 19:28:44,061 -INFO- db path: /home/sap223/anaconda3/envs/ET/lib/python3.6/site-packages/TEsorter/database
2021-01-26 19:28:44,061 -INFO- db file: REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2021-01-26 19:28:44,064 -INFO- REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm OK
2021-01-26 19:28:44,064 -INFO- Start classifying pipeline
2021-01-26 19:28:44,139 -INFO- total 1 sequences
2021-01-26 19:28:44,140 -INFO- translating genome.fa.mod.RM.consensi.fa in six frames
/home/sap223/anaconda3/envs/ET/lib/python3.6/site-packages/Bio/Seq.py:2338: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning,
2021-01-26 19:28:44,170 -INFO- HMM scanning against /home/sap223/anaconda3/envs/ET/lib/python3.6/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2021-01-26 19:28:44,215 -INFO- Creating server instance (pp-1.6.4.4)
2021-01-26 19:28:44,215 -INFO- Running on Python 3.6.12 linux
2021-01-26 19:28:48,017 -INFO- pp local server started with 10 workers
2021-01-26 19:28:48,073 -INFO- Task 0 started
2021-01-26 19:28:48,074 -INFO- Task 1 started
2021-01-26 19:28:48,075 -INFO- Task 2 started
2021-01-26 19:28:48,076 -INFO- Task 3 started
2021-01-26 19:28:48,077 -INFO- Task 4 started
2021-01-26 19:28:48,077 -INFO- Task 5 started
2021-01-26 19:28:48,077 -INFO- Task 4 started
2021-01-26 19:28:48,077 -INFO- Task 5 started
2021-01-26 19:28:48,680 -INFO- generating gene anntations
2021-01-26 19:28:48,697 -INFO- 0 sequences classified by HMM
2021-01-26 19:28:48,697 -INFO- see protein domain sequences in genome.fa.mod.RM.consensi.fa.rexdb.dom.faa and annotation gff3 file in genome.fa.mod.RM.consensi.fa.rexdb.dom.gff3
2021-01-26 19:28:48,697 -WARNING- skipping pass-2 classification for zero classification in step-1
2021-01-26 19:28:48,697 -INFO- see classified sequences in genome.fa.mod.RM.consensi.fa.rexdb.cls.tsv
2021-01-26 19:28:48,698 -INFO- writing library for RepeatMasker in genome.fa.mod.RM.consensi.fa.rexdb.cls.lib
2021-01-26 19:28:48,703 -INFO- writing classified protein domains in genome.fa.mod.RM.consensi.fa.rexdb.cls.pep
2021-01-26 19:28:48,707 -INFO- Summary of classifications:
Order  Superfamily  # of Sequences  # of Clade Sequences  # of Clades  # of full Domains
2021-01-26 19:28:48,708 -INFO- Pipeline done.
2021-01-26 19:28:48,708 -INFO- cleaning the temporary directory ./tmp
Tue Jan 26 19:29:08 CST 2021 Clean up TE-related sequences in the CDS file with TEsorter:
2021-01-26 19:29:09,952 -WARNING- Grid computing is not available because DRMAA not configured properly: Could not find drmaa library. Please specify its full path using the environment variable DRMAA_LIBRARY_PATH
2021-01-26 19:29:09,978 -INFO- VARS: {'sequence': 'genome.cds.fa.code', 'hmm_database': 'rexdb', 'seq_type': 'nucl', 'prefix': 'genome.cds.fa.code.rexdb', 'force_write_hmmscan': False, 'processors': 10, 'tmp_dir': './tmp', 'min_coverage': 20, 'max_evalue': 0.001, 'disable_pass2': False, 'pass2_rule': '80-80-80', 'no_library': False, 'no_reverse': False, 'no_cleanup': False, 'p2_identity': 80.0, 'p2_coverage': 80.0, 'p2_length': 80.0}
2021-01-26 19:29:09,978 -INFO- checking dependencies:
2021-01-26 19:29:10,004 -INFO- hmmer 3.3.1 OK
2021-01-26 19:29:10,320 -INFO- blastn 2.10.0+ OK
2021-01-26 19:29:10,323 -INFO- check database rexdb
2021-01-26 19:29:10,323 -INFO- db path: /home/sap223/anaconda3/envs/ET/lib/python3.6/site-packages/TEsorter/database
2021-01-26 19:29:10,323 -INFO- db file: REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2021-01-26 19:29:10,324 -INFO- REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm OK
2021-01-26 19:29:10,324 -INFO- Start classifying pipeline
2021-01-26 19:29:10,397 -INFO- total 139 sequences
2021-01-26 19:29:10,397 -INFO- translating genome.cds.fa.code in six frames
/home/sap223/anaconda3/envs/ET/lib/python3.6/site-packages/Bio/Seq.py:2338: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning,
2021-01-26 19:29:10,675 -INFO- HMM scanning against /home/sap223/anaconda3/envs/ET/lib/python3.6/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2021-01-26 19:29:10,737 -INFO- Creating server instance (pp-1.6.4.4)
2021-01-26 19:29:10,737 -INFO- Running on Python 3.6.12 linux
2021-01-26 19:29:14,444 -INFO- pp local server started with 10 workers
2021-01-26 19:29:14,491 -INFO- Task 0 started
2021-01-26 19:29:14,493 -INFO- Task 1 started
2021-01-26 19:29:14,493 -INFO- Task 2 started
2021-01-26 19:29:14,493 -INFO- Task 3 started
2021-01-26 19:29:14,494 -INFO- Task 4 started
2021-01-26 19:29:14,495 -INFO- Task 5 started
2021-01-26 19:29:14,495 -INFO- Task 6 started
2021-01-26 19:29:14,497 -INFO- Task 7 started
2021-01-26 19:29:14,498 -INFO- Task 8 started
2021-01-26 19:29:14,499 -INFO- Task 9 started
2021-01-26 19:29:18,022 -INFO- generating gene anntations
2021-01-26 19:29:18,056 -INFO- 2 sequences classified by HMM
2021-01-26 19:29:18,056 -INFO- see protein domain sequences in genome.cds.fa.code.rexdb.dom.faa and annotation gff3 file in genome.cds.fa.code.rexdb.dom.gff3
2021-01-26 19:29:18,056 -INFO- classifying the unclassified sequences by searching against the classified ones
2021-01-26 19:29:18,079 -INFO- using the 80-80-80 rule
2021-01-26 19:29:18,079 -INFO- run CMD: makeblastdb -in ./tmp/pass1_classified.fa -dbtype nucl
2021-01-26 19:29:18,373 -INFO- run CMD: blastn -query ./tmp/pass1_unclassified.fa -db ./tmp/pass1_classified.fa -out ./tmp/pass1_unclassified.fa.blastout -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qcovs qcovhsp sstrand' -num_threads 10
2021-01-26 19:29:18,771 -INFO- 1 sequences classified in pass 2
2021-01-26 19:29:18,772 -INFO- total 3 sequences classified.
2021-01-26 19:29:18,772 -INFO- see classified sequences in genome.cds.fa.code.rexdb.cls.tsv
2021-01-26 19:29:18,772 -INFO- writing library for RepeatMasker in genome.cds.fa.code.rexdb.cls.lib
2021-01-26 19:29:18,786 -INFO- writing classified protein domains in genome.cds.fa.code.rexdb.cls.pep
2021-01-26 19:29:18,792 -INFO- Summary of classifications:
Order     Superfamily  # of Sequences  # of Clade Sequences  # of Clades  # of full Domains
LTR       Gypsy        1               1                     1            0
Maverick  unknown      2               0                     0            0
2021-01-26 19:29:18,792 -INFO- Pipeline done.
2021-01-26 19:29:18,792 -INFO- cleaning the temporary directory ./tmp
Remove CDS-related sequences in the EDTA library:
Tue Jan 26 19:29:40 CST 2021 Combine the high-quality TE library rice6.9.5.liban with the EDTA library:
Tue Jan 26 19:29:54 CST 2021 EDTA final stage finished! You may check out:
The final EDTA TE library: genome.fa.mod.EDTA.TElib.fa
Family names of intact TEs have been updated by rice6.9.5.liban: genome.fa.mod.EDTA.intact.gff3
Comparing to the provided library, EDTA found these novel TEs: genome.fa.mod.EDTA.TElib.novel.fa
The provided library has been incorporated into the final library: genome.fa.mod.EDTA.TElib.fa
Tue Jan 26 19:29:54 CST 2021 Perform post-EDTA analysis for whole-genome annotation:
Tue Jan 26 19:29:54 CST 2021 Homology-based annotation of TEs using genome.fa.mod.EDTA.TElib.fa from scratch.
Tue Jan 26 19:30:06 CST 2021 TE annotation using the EDTA library has finished! Check out:
Whole-genome TE annotation (total TE: 35.78%): genome.fa.mod.EDTA.TEanno.gff3
Whole-genome TE annotation summary: genome.fa.mod.EDTA.TEanno.sum
Low-threshold TE masking for MAKER gene annotation (masked: 16.32%): genome.fa.mod.MAKER.masked
Tue Jan 26 19:30:06 CST 2021 Evaluate the level of inconsistency for whole-genome TE annotation (slow step):
Tue Jan 26 19:32:02 CST 2021 Evaluation of TE annotation finished! Check out these files: