oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

ERROR:EDTA.pl line 583 #417

Open Astesimal opened 5 months ago

Astesimal commented 5 months ago

Hello author, this is great software, and I have successfully run it on dozens of species. However, for some genomes EDTA reports that the LTR, TIR, and Helitron result files have 0 bp. Sorry to bother you, but is there any way to solve this problem? Looking forward to your reply!

`'/home/dell/EDTA-master/EDTA.pl' --genome '/home/dell/users/GCA_genomic.fna' --sensitive 1 --anno 1 --threads 10 --force 1 > ./edta.log

Thu Jan 11 17:36:24 CST 2024 EDTA_raw: Check dependencies, prepare working directories.

Thu Jan 11 17:36:25 CST 2024 Start to find LTR candidates.

Thu Jan 11 17:36:25 CST 2024 Identify LTR retrotransposon candidates from scratch.

awk: cannot open GCA_genomic.fna.mod.pass.list (No such file or directory)

Warning: LOC list - is empty.

Error: Error while loading sequence

perl filter_gff3.pl file.gff3 file.list > new.gff3

Thu Jan 11 17:59:26 CST 2024 Warning: The LTR result file has 0 bp!

Thu Jan 11 17:59:26 CST 2024 Start to find TIR candidates.

Thu Jan 11 17:59:26 CST 2024 Identify TIR candidates from scratch.

Species: others
cp: cannot stat 'TIR-Learner/-p': No such file or directory
cat: '-+-DTA.fa': No such file or directory
cat: '-+-DTC.fa': No such file or directory
cat: '-+-DTH.fa': No such file or directory
cat: '-+-DTM.fa': No such file or directory
cat: '-+-DTT.fa': No such file or directory
cat: '-+-NonTIR.fa': No such file or directory
cat: '-+--+-.gff3': No such file or directory
rm: cannot remove '-+--+-*.gff3': No such file or directory
Traceback (most recent call last):
  File "/home/dell/users/EDTA-master/bin/TIR-Learner2.5/Module3_New/CombineAll.py", line 75, in <module>
    f_m3=removeDupinSingle("%s.gff3"%(genome_Name+spliter+"Module3"))
  File "/home/dell/users/EDTA-master/bin/TIR-Learner2.5/Module3_New/CombineAll.py", line 57, in removeDupinSingle
    f=pd.read_csv(file,header=None,sep="\t") #shujun
  File "/home/dell/anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/dell/anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/dell/anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/home/dell/anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/dell/anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 540, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/dell/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/dell/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/dell/users/EDTA-master/bin/TIR-Learner2.5/Module3/GetAllSeq.py", line 32, in GetListFromFile
    f=open(file,"r+")
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn_filter.gff3'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dell/users/EDTA-master/bin/TIR-Learner2.5/Module3/GetAllSeq.py", line 63, in <module>
    pool.map(GetListFromFile,fileList) #shujun
  File "/home/dell/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/dell/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn_filter.gff3'
mv: cannot stat 'TIR-Learner/FinalAnn.gff3': No such file or directory
mv: cannot stat 'TIR-Learner/FinalAnn.fa': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /home/dell/users/EDTA-master/util/rename_tirlearner.pl line 19.
Warning: LOC list GCA_genomic.fna.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!

Thu Jan 11 18:27:23 CST 2024 Start to find Helitron candidates.

Thu Jan 11 18:27:23 CST 2024 Identify Helitron candidates from scratch.

Error: Error while loading sequence

perl make_bed_with_intact.pl EDTA.intact.fa > EDTA.intact.bed

Thu Jan 11 18:51:36 CST 2024 Warning: The Helitron result file has 0 bp!

Thu Jan 11 18:51:36 CST 2024 Execution of EDTA_raw.pl is finished!

2024-01-11 21:39:23,440 -WARNING- Grid computing is not available because DRMAA not configured properly: Could not find drmaa library. Please specify its full path using the environment variable DRMAA_LIBRARY_PATH
2024-01-11 21:39:23,443 -INFO- VARS: {'sequence': 'GCA_genomic.fna.mod.RM.consensi.fa', 'hmm_database': 'rexdb', 'seq_type': 'nucl', 'prefix': 'GCA_genomic.fna.mod.RM.consensi.fa.rexdb', 'force_write_hmmscan': False, 'processors': 10, 'tmp_dir': './tmp', 'min_coverage': 20, 'max_evalue': 0.001, 'disable_pass2': False, 'pass2_rule': '80-80-80', 'no_library': False, 'no_reverse': False, 'no_cleanup': False, 'p2_identity': 80.0, 'p2_coverage': 80.0, 'p2_length': 80.0}
2024-01-11 21:39:23,444 -INFO- checking dependencies:
2024-01-11 21:39:23,456 -INFO- hmmer 3.4 OK
2024-01-11 21:39:23,516 -INFO- blastn 2.10.0+ OK
2024-01-11 21:39:23,516 -INFO- check database rexdb
2024-01-11 21:39:23,516 -INFO- db path: /home/dell/anaconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/database
2024-01-11 21:39:23,516 -INFO- db file: REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2024-01-11 21:39:23,517 -INFO- REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm OK
2024-01-11 21:39:23,517 -INFO- Start classifying pipeline
2024-01-11 21:39:23,535 -INFO- total 437 sequences
2024-01-11 21:39:23,535 -INFO- translating GCA_genomic.fna.mod.RM.consensi.fa in six frames
/home/dell/anaconda3/envs/EDTA/lib/python3.6/site-packages/Bio/Seq.py:2338: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  BiopythonWarning,
2024-01-11 21:39:23,752 -INFO- HMM scanning against /home/dell/anaconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2024-01-11 21:39:23,770 -INFO- Creating server instance (pp-1.6.4.4)
2024-01-11 21:39:23,770 -INFO- Running on Python 3.6.12 linux
2024-01-11 21:39:24,394 -INFO- pp local server started with 10 workers
2024-01-11 21:39:24,429 -INFO- Task 0 started
2024-01-11 21:39:24,430 -INFO- Task 1 started
2024-01-11 21:39:24,431 -INFO- Task 2 started
2024-01-11 21:39:24,431 -INFO- Task 3 started
2024-01-11 21:39:24,432 -INFO- Task 4 started
2024-01-11 21:39:24,432 -INFO- Task 5 started
2024-01-11 21:39:24,433 -INFO- Task 6 started
2024-01-11 21:39:24,433 -INFO- Task 7 started
2024-01-11 21:39:24,434 -INFO- Task 8 started
2024-01-11 21:39:24,435 -INFO- Task 9 started
2024-01-11 21:39:24,941 -INFO- generating gene anntations
2024-01-11 21:39:24,977 -INFO- 7 sequences classified by HMM
2024-01-11 21:39:24,977 -INFO- see protein domain sequences in GCA_021439955.1_ASM2143995v1_genomic.fna.mod.RM.consensi.fa.rexdb.dom.faa and annotation gff3 file in GCA_021439955.1_ASM2143995v1_genomic.fna.mod.RM.consensi.fa.rexdb.dom.gff3
2024-01-11 21:39:24,977 -INFO- classifying the unclassified sequences by searching against the classified ones
2024-01-11 21:39:24,986 -INFO- using the 80-80-80 rule
2024-01-11 21:39:24,986 -INFO- run CMD: makeblastdb -in ./tmp/pass1_classified.fa -dbtype nucl
2024-01-11 21:39:25,037 -INFO- run CMD: blastn -query ./tmp/pass1_unclassified.fa -db ./tmp/pass1_classified.fa -out ./tmp/pass1_unclassified.fa.blastout -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qcovs qcovhsp sstrand' -num_threads 10
2024-01-11 21:39:25,162 -INFO- 1 sequences classified in pass 2
2024-01-11 21:39:25,163 -INFO- total 8 sequences classified.
2024-01-11 21:39:25,163 -INFO- see classified sequences in GCA_021439955.1_ASM2143995v1_genomic.fna.mod.RM.consensi.fa.rexdb.cls.tsv
2024-01-11 21:39:25,163 -INFO- writing library for RepeatMasker in GCA_021439955.1_ASM2143995v1_genomic.fna.mod.RM.consensi.fa.rexdb.cls.lib
2024-01-11 21:39:25,183 -INFO- writing classified protein domains in GCA_021439955.1_ASM2143995v1_genomic.fna.mod.RM.consensi.fa.rexdb.cls.pep
2024-01-11 21:39:25,184 -INFO- Summary of classifications:
Order  Superfamily  # of Sequences  # of Clade Sequences  # of Clades  # of full Domains
LTR    Gypsy        4               0                     0            0
LINE   unknown      1               0                     0            0
TIR    Tc1_Mariner  3               0                     0            0
2024-01-11 21:39:25,184 -INFO- Pipeline done.
2024-01-11 21:39:25,184 -INFO- cleaning the temporary directory ./tmp
ERROR: Intact TE annotation not found in GCA_genomic.fna.mod.EDTA.intact.gff3 at /home/dell/users/EDTA-master/EDTA.pl line 583. `
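
For triage, the stages that came up empty can be pulled out of a log like the one above with a short script. This is only a minimal sketch and not part of EDTA: the function name `find_empty_stages` and the sample text are illustrative, and the regular expression assumes the warning wording that appears in this log ("Warning: The ... result file has 0 bp!").

```python
import re

# Assumed pattern, based on the warnings printed in the log above, e.g.
# "Warning: The LTR result file has 0 bp!"
EMPTY_RESULT = re.compile(r"Warning: The (\w+) result file has 0 bp!")


def find_empty_stages(log_text):
    """Return the stage names reported as 0 bp, in order of first appearance."""
    seen = []
    for stage in EMPTY_RESULT.findall(log_text):
        if stage not in seen:
            seen.append(stage)
    return seen


# Illustrative excerpt copied from the warnings in this issue.
sample = """\
Warning: The LTR result file has 0 bp!
Warning: The TIR result file has 0 bp!
Warning: The Helitron result file has 0 bp!
"""

print(find_empty_stages(sample))  # ['LTR', 'TIR', 'Helitron']
```

When all three structural stages are listed, as here, the final "Intact TE annotation not found" error is most likely a consequence of those earlier failures rather than a separate problem.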

oushujun commented 5 months ago

Please update EDTA and try again.

Thanks! Shujun


oushujun commented 5 months ago

@Astesimal any luck?

Astesimal commented 5 months ago

Thanks for your advice! Unfortunately, the result is still not ideal. I think there may be a problem with the data quality.

oushujun commented 5 months ago

Can you please provide more details?

Astesimal commented 5 months ago

Sorry for the late reply. In the folder ending in mod.EDTA.raw, TIR-Result contains 0 files, and the Helitron step also reported that 0 bp was found. Could this be the cause of the error?
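
To give more detail than "0 files", the raw output folder can be scanned for empty subfolders and zero-byte files. The following is a hedged sketch, not an EDTA utility: `report_empty_outputs` is a hypothetical helper, and it assumes only that the intermediate results live under a single directory such as the one ending in mod.EDTA.raw.

```python
from pathlib import Path


def report_empty_outputs(raw_dir):
    """List empty directories and zero-byte files anywhere under raw_dir.

    Returns two sorted lists of paths relative to raw_dir, written with
    forward slashes: (empty_dirs, empty_files).
    """
    root = Path(raw_dir)
    empty_dirs, empty_files = [], []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.is_dir():
            # A directory counts as empty only if it has no entries at all.
            if not any(path.iterdir()):
                empty_dirs.append(rel)
        elif path.is_file() and path.stat().st_size == 0:
            empty_files.append(rel)
    return empty_dirs, empty_files
```

Calling `report_empty_outputs()` on the mod.EDTA.raw folder and pasting both lists into the thread would show which stage stopped producing output first.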