oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

No space left on device #481

Open. ycgong opened this issue 1 month ago

ycgong commented 1 month ago

Dear Dr. Ou,

Thank you for the EDTA software. I installed EDTA via conda, and the test run completed without problems. However, when I ran it on a plant genome, it reported errors about insufficient disk space, apparently at the TIR stage:

#########################################################

Extensive de-novo TE Annotator (EDTA) v2.2.0
Shujun Ou (shujun.ou.1@gmail.com)

#########################################################

Parameters: --genome roth_hap1_split_rename.fa --step all --anno 1 -t 50 --sensitive 1

Thu Jul 11 02:56:55 PM EDT 2024 Dependency checking: All passed!

Thu Jul 11 03:23:51 PM EDT 2024 Obtain raw TE libraries using various structure-based programs:

Thu Jul 11 03:23:51 PM EDT 2024 EDTA_raw: Check dependencies, prepare working directories.

Thu Jul 11 03:33:19 PM EDT 2024 Start to find LTR candidates.

Thu Jul 11 03:33:19 PM EDT 2024 Identify LTR retrotransposon candidates from scratch.

Sat Jul 13 04:24:03 AM EDT 2024 Finish finding LTR candidates.

Sat Jul 13 04:24:03 AM EDT 2024 Start to find SINE candidates.

Sat Jul 13 01:36:03 PM EDT 2024 Finish finding SINE candidates.

Sat Jul 13 01:36:03 PM EDT 2024 Start to find LINE candidates.

Sat Jul 13 01:36:03 PM EDT 2024 Identify LINE retrotransposon candidates from scratch.

Sun Jul 14 03:48:16 PM EDT 2024 Finish finding LINE candidates.

Sun Jul 14 03:48:16 PM EDT 2024 Start to find TIR candidates.

Sun Jul 14 03:48:16 PM EDT 2024 Identify TIR candidates from scratch.

Species: others
cannot write to stream: No space left on device
/ohta1/apps/miniconda3/envs/EDTA/bin/gt tirvish: error: fopen(): cannot open file 'TIR-Learner-+-gt_index.prj': No such file or directory
Traceback (most recent call last):
  File "/apps/miniconda3/envs/EDTA/share/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
    TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,
  File "/apps/miniconda3/envs/EDTA/share/EDTA/bin/TIR-Learner3.0/bin/main.py", line 72, in __init__
    self.execute()
  File "/apps/miniconda3/envs/EDTA/share/EDTA/bin/TIR-Learner3.0/bin/main.py", line 110, in execute
    self.execute_M4()
  File "/apps/miniconda3/envs/EDTA/share/EDTA/bin/TIR-Learner3.0/bin/main.py", line 593, in execute_M4
    self["TIRvish"] = run_TIRvish.execute(self)
  File "/apps/miniconda3/envs/EDTA/share/EDTA/bin/TIR-Learner3.0/bin/run_TIRvish.py", line 76, in execute
    return get_fasta_pieces_SeqIO(genome_file, df, cpu_cores, flag_verbose)
  File "/apps/miniconda3/envs/EDTA/share/EDTA/bin/TIR-Learner3.0/bin/get_fasta_sequence.py", line 65, in get_fasta_pieces_SeqIO
    return pd.concat(df_with_seq_list).sort_index()
  File "/apps/miniconda3/envs/EDTA/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 372, in concat
    op = _Concatenator(
  File "/apps/miniconda3/envs/EDTA/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 452, in __init__
    raise ValueError("All objects passed were None")
ValueError: All objects passed were None
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /apps/miniconda3/envs/EDTA/share/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list roth_hap1_split_rename.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019

mv: cannot stat 'roth_hap1_split_rename.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'roth_hap1_split_rename.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'roth_hap1_split_rename.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
ERROR: No such file or directory at /apps/miniconda3/envs/EDTA/share/EDTA/util/output_by_list.pl line 39.
Warning: The TIR result file has 0 bp!

Sun Jul 14 06:30:24 PM EDT 2024 Start to find Helitron candidates.

Sun Jul 14 06:30:24 PM EDT 2024 Identify Helitron candidates from scratch.

###########################################################

Disk space should not be the problem, since 11 TB is free, and I could not figure out what else might be the cause. Please help.
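
A possible check at this point, offered as an assumption rather than a diagnosis confirmed in this thread: `No space left on device` is reported for the filesystem actually being written to, which may be a temporary directory or a RAM-backed mount rather than the data disk, and it can also be raised when a filesystem runs out of inodes. The minimal Python sketch below reports free space and free inodes for a few candidate locations. The helper name `free_space_report` and the choice of candidate directories are illustrative; which directories EDTA or TIR-Learner write to is not stated here.

```python
import os
import shutil
import tempfile


def free_space_report(paths):
    """Map each existing directory to its free space (GiB) and free inodes."""
    report = {}
    for path in paths:
        if not os.path.isdir(path):
            continue  # e.g. /dev/shm does not exist on every system
        usage = shutil.disk_usage(path)  # total, used, free (bytes)
        stats = os.statvfs(path)         # Unix only; f_favail = inodes available
        report[path] = {
            "free_gib": round(usage.free / 1024**3, 1),
            "free_inodes": stats.f_favail,
        }
    return report


if __name__ == "__main__":
    # Assumed candidates: the run directory, Python's default temp directory
    # (which honours the TMPDIR environment variable), and /dev/shm.
    candidates = [os.getcwd(), tempfile.gettempdir(), "/dev/shm"]
    for path, info in free_space_report(candidates).items():
        print(f"{path}: {info['free_gib']} GiB free, {info['free_inodes']} inodes free")
```

Running it from the EDTA working directory, in the same environment and as the same user as the failing job, gives the most relevant numbers.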

Thanks, yg

tallnuttrbgv commented 1 month ago

I have the same issue with a conda YAML install, and I also have plenty of disk space. I did not get the masked output or the full GFFs, only the GFFs for individual TE types. I guess this is because the TIR step failed, so its output could not be found for concatenation.

The genome is not large (0.9 Gbp). The run also took 2.5 days with 24 CPUs, which seems slow.
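
To see where the run time goes, the per-step durations can be worked out from the timestamps in the log that follows. The sketch below is plain arithmetic on those timestamps. `step_duration` is a hypothetical helper, and the TIR entry ends at the Helitron start time because the log contains no "Finish finding TIR candidates" line.

```python
from datetime import datetime

# Timestamp layout used in this log, e.g. "Wed Jul 10 05:52:25 UTC 2024".
FMT = "%a %b %d %H:%M:%S UTC %Y"

# (start, end) pairs copied from the log in this comment.
STEPS = {
    "LTR": ("Wed Jul 10 05:52:25 UTC 2024", "Wed Jul 10 09:49:54 UTC 2024"),
    "SINE": ("Wed Jul 10 09:49:54 UTC 2024", "Wed Jul 10 11:11:10 UTC 2024"),
    "LINE": ("Wed Jul 10 11:11:10 UTC 2024", "Fri Jul 12 04:53:54 UTC 2024"),
    "TIR (failed)": ("Fri Jul 12 04:53:54 UTC 2024", "Fri Jul 12 05:29:49 UTC 2024"),
    "Helitron": ("Fri Jul 12 05:29:50 UTC 2024", "Fri Jul 12 13:06:11 UTC 2024"),
}


def step_duration(start, end):
    """Elapsed time between two log timestamps, as a timedelta."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


if __name__ == "__main__":
    for name, (start, end) in STEPS.items():
        hours = step_duration(start, end).total_seconds() / 3600
        print(f"{name:<13} {hours:6.1f} h")
```

For this log, the LINE step alone accounts for about 41.7 hours, which is most of the time spent in EDTA_raw.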

perl /media/storage/bin/EDTA/EDTA.pl --genome themeda_scaf_norpts.fasta --species Maize --cds cds_from_genomic.fna --anno 1 --threads 24

Wed Jul 10 05:52:23 UTC 2024 EDTA_raw: Check dependencies, prepare working directories.

Wed Jul 10 05:52:25 UTC 2024 Start to find LTR candidates.

Wed Jul 10 05:52:25 UTC 2024 Identify LTR retrotransposon candidates from scratch.

Wed Jul 10 09:49:54 UTC 2024 Finish finding LTR candidates.

Wed Jul 10 09:49:54 UTC 2024 Start to find SINE candidates.

Wed Jul 10 11:11:10 UTC 2024 Finish finding SINE candidates.

Wed Jul 10 11:11:10 UTC 2024 Start to find LINE candidates.

Wed Jul 10 11:11:10 UTC 2024 Identify LINE retrotransposon candidates from scratch.

Use of uninitialized value in string ne at /media/storage/bin/EDTA/util/cleanup_misclas.pl line 61, line 47.
Use of uninitialized value within %lib in string ne at /media/storage/bin/EDTA/util/cleanup_misclas.pl line 61, line 47.
Use of uninitialized value in string ne at /media/storage/bin/EDTA/util/cleanup_misclas.pl line 61, line 174.
Use of uninitialized value within %lib in string ne at /media/storage/bin/EDTA/util/cleanup_misclas.pl line 61, line 174.

Fri Jul 12 04:53:54 UTC 2024 Finish finding LINE candidates.

Fri Jul 12 04:53:54 UTC 2024 Start to find TIR candidates.

Fri Jul 12 04:53:54 UTC 2024 Identify TIR candidates from scratch.

Species: Maize
cannot write to stream: No space left on device
/media/storage/bin/miniforge3/envs/EDTA2/bin/gt tirvish: error: fopen(): cannot open file 'TIR-Learner-+-gt_index.prj': No such file or directory
Traceback (most recent call last):
  File "/media/storage/bin/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
    TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,
  File "/media/storage/bin/EDTA/bin/TIR-Learner3.0/bin/main.py", line 81, in __init__
    self.execute()
  File "/media/storage/bin/EDTA/bin/TIR-Learner3.0/bin/main.py", line 121, in execute
    self.execute_M4()
  File "/media/storage/bin/EDTA/bin/TIR-Learner3.0/bin/main.py", line 631, in execute_M4
    self["TIRvish"] = run_TIRvish.execute(self)
  File "/media/storage/bin/EDTA/bin/TIR-Learner3.0/bin/run_TIRvish.py", line 79, in execute
    return get_fasta_pieces_SeqIO(genome_file, df, cpu_cores, flag_verbose)
  File "/media/storage/bin/EDTA/bin/TIR-Learner3.0/bin/get_fasta_sequence.py", line 63, in get_fasta_pieces_SeqIO
    return pd.concat(df_with_seq_list).sort_index()
  File "/media/storage/bin/miniforge3/envs/EDTA2/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 380, in concat
    op = _Concatenator(
  File "/media/storage/bin/miniforge3/envs/EDTA2/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 443, in __init__
    objs, keys = self._clean_keys_and_objs(objs, keys)
  File "/media/storage/bin/miniforge3/envs/EDTA2/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 539, in _clean_keys_and_objs
    raise ValueError("All objects passed were None")
ValueError: All objects passed were None
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /media/storage/bin/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list themeda_scaf_norpts.fasta.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019

mv: cannot stat 'themeda_scaf_norpts.fasta.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'themeda_scaf_norpts.fasta.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'themeda_scaf_norpts.fasta.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
ERROR: No such file or directory at /media/storage/bin/EDTA/util/output_by_list.pl line 39.
Warning: The TIR result file has 0 bp!

Fri Jul 12 05:29:49 UTC 2024 Start to find Helitron candidates.

Fri Jul 12 05:29:50 UTC 2024 Identify Helitron candidates from scratch.

Fri Jul 12 13:06:11 UTC 2024 Finish finding Helitron candidates.

Fri Jul 12 13:06:11 UTC 2024 Execution of EDTA_raw.pl is finished!

ERROR: Raw TIR results not found in themeda_scaf_norpts.fasta.mod.EDTA.raw/themeda_scaf_norpts.fasta.mod.TIR.intact.raw.fa
If you believe the program is working properly, this may be caused by the lack of intact TIRs in your genome.
Consider to use the --force 1 parameter to overwrite this check
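
Before deciding whether `--force 1` is appropriate, it may help to confirm whether the raw TIR file is missing or merely empty. The sketch below builds the path from the pattern shown in the error message (`<genome>.mod.EDTA.raw/<genome>.mod.TIR.intact.raw.fa`). The function name `raw_tir_status` is hypothetical and is not part of EDTA.

```python
import os


def raw_tir_status(genome, workdir="."):
    """Return 'missing', 'empty', or 'present' for the raw TIR FASTA of a run."""
    path = os.path.join(
        workdir,
        f"{genome}.mod.EDTA.raw",
        f"{genome}.mod.TIR.intact.raw.fa",
    )
    if not os.path.exists(path):
        return "missing"
    return "empty" if os.path.getsize(path) == 0 else "present"


if __name__ == "__main__":
    # Genome file name taken from the command in this comment.
    print(raw_tir_status("themeda_scaf_norpts.fasta"))
```

In this log the TIR step crashed, so a missing file would point to that failure rather than to a genome without intact TIRs. Whether `--force 1` is advisable in that situation is not addressed in this thread.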

Wildwiner commented 1 month ago

Hello, I have the same issue with a conda YAML install, and I have plenty of disk space. These are the only three outputs: TD.fa.mod.EDTA.raw, TD.fa.mod.RM2.raw.fa, and TD.fa.mod. However, when I ran the test dataset, no error occurred.

#########################################################

Extensive de-novo TE Annotator (EDTA) v2.2.1
Shujun Ou (shujun.ou.1@gmail.com)

#########################################################

Parameters: --genome ../run/TD.fa --cds ../run/Td_cds_gff3 --species others --overwrite 1 --sensitive 1 --anno 1 --threads 20

Sat Jul 20 10:58:59 CST 2024 Dependency checking: All passed!

Sat Jul 20 10:59:04 CST 2024 The longest sequence ID in the genome contains 34 characters, which is longer than the limit (13)
Trying to reformat seq IDs...
Attempt 1...
Sat Jul 20 10:59:04 CST 2024 Seq ID conversion successful!

A CDS file ../run/Td_cds_gff3 is provided via --cds. Please make sure this is the DNA sequence of coding regions only.

Sat Jul 20 10:59:04 CST 2024 Obtain raw TE libraries using various structure-based programs:

Sat Jul 20 10:59:04 CST 2024 EDTA_raw: Check dependencies, prepare working directories.

Sat Jul 20 10:59:06 CST 2024 Start to find LTR candidates.

Sat Jul 20 10:59:06 CST 2024 Identify LTR retrotransposon candidates from scratch.

Sat Jul 20 11:09:34 CST 2024 Finish finding LTR candidates.

Sat Jul 20 11:09:34 CST 2024 Start to find SINE candidates.

Sat Jul 20 11:35:11 CST 2024 Finish finding SINE candidates.

Sat Jul 20 11:35:11 CST 2024 Start to find LINE candidates.

Sat Jul 20 11:35:11 CST 2024 Identify LINE retrotransposon candidates from scratch.

Sat Jul 20 17:29:41 CST 2024 Finish finding LINE candidates.

Sat Jul 20 17:29:43 CST 2024 Start to find TIR candidates.

Sat Jul 20 17:29:52 CST 2024 Identify TIR candidates from scratch.

Species: others
cannot write to stream: No space left on device
/home/2106040113_zhangwentao/miniconda3/envs/EDTA2.2/bin/gt tirvish: error: fopen(): cannot open file 'TIR-Learner-+-gt_index.prj': No such file or directory
Traceback (most recent call last):
  File "/home/2106040113_zhangwentao/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
    TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,
  File "/home/2106040113_zhangwentao/EDTA/bin/TIR-Learner3.0/bin/main.py", line 81, in __init__
    self.execute()
  File "/home/2106040113_zhangwentao/EDTA/bin/TIR-Learner3.0/bin/main.py", line 121, in execute
    self.execute_M4()
  File "/home/2106040113_zhangwentao/EDTA/bin/TIR-Learner3.0/bin/main.py", line 631, in execute_M4
    self["TIRvish"] = run_TIRvish.execute(self)
  File "/home/2106040113_zhangwentao/EDTA/bin/TIR-Learner3.0/bin/run_TIRvish.py", line 79, in execute
    return get_fasta_pieces_SeqIO(genome_file, df, cpu_cores, flag_verbose)
  File "/home/2106040113_zhangwentao/EDTA/bin/TIR-Learner3.0/bin/get_fasta_sequence.py", line 63, in get_fasta_pieces_SeqIO
    return pd.concat(df_with_seq_list).sort_index()
  File "/home/2106040113_zhangwentao/miniconda3/envs/EDTA2.2/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 382, in concat
    op = _Concatenator(
  File "/home/2106040113_zhangwentao/miniconda3/envs/EDTA2.2/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 445, in __init__
    objs, keys = self._clean_keys_and_objs(objs, keys)
  File "/home/2106040113_zhangwentao/miniconda3/envs/EDTA2.2/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 541, in _clean_keys_and_objs
    raise ValueError("All objects passed were None")
ValueError: All objects passed were None
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /home/2106040113_zhangwentao/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list TD.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019

mv: cannot stat 'TD.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'TD.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'TD.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
ERROR: No such file or directory at /home/2106040113_zhangwentao/EDTA/util/output_by_list.pl line 39.
Warning: The TIR result file has 0 bp!

Sat Jul 20 17:40:24 CST 2024 Start to find Helitron candidates.

Sat Jul 20 17:40:24 CST 2024 Identify Helitron candidates from scratch.

Sat Jul 20 18:47:13 CST 2024 Finish finding Helitron candidates.

Sat Jul 20 18:47:13 CST 2024 Execution of EDTA_raw.pl is finished!

ERROR: Raw TIR results not found in TD.fa.mod.EDTA.raw/TD.fa.mod.TIR.intact.raw.fa
If you believe the program is working properly, this may be caused by the lack of intact TIRs in your genome.
Consider to use the --force 1 parameter to overwrite this check

Thanks, ww
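
When a log contains a cascade of failures like the one above, the earliest error is usually the most informative, since later messages may only be consequences of it. The sketch below scans log lines for the first one that matches a small set of patterns taken from this thread. The function `first_error` and the pattern list are illustrative and are not part of EDTA.

```python
import re

# Messages seen in the logs in this thread; illustrative, not exhaustive.
ERROR_PATTERNS = [
    r"No space left on device",
    r"No such file or directory",
    r"Traceback \(most recent call last\)",
    r"^ERROR:",
]


def first_error(lines):
    """Return (line_number, text) of the first matching line, or None."""
    combined = re.compile("|".join(ERROR_PATTERNS))
    for number, text in enumerate(lines, start=1):
        if combined.search(text):
            return number, text.strip()
    return None


if __name__ == "__main__":
    sample = [
        "Identify TIR candidates from scratch.",
        "cannot write to stream: No space left on device",
        "ValueError: All objects passed were None",
    ]
    print(first_error(sample))
```

Applied to the three logs in this thread, the first hit in each would be the `No space left on device` line, which precedes the `gt tirvish` failure and the Python traceback.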

oushujun commented 4 weeks ago

Hello,

The conda recipe has not yet been updated, so please use the following commands to install the dependencies.

For EDTA:

mamba create -n EDTA -c conda-forge -c bioconda -c r annosine2 biopython blast cd-hit coreutils genericrepeatfinder genometools-genometools glob2 h5py==3.9 keras==2.11 ltr_finder ltr_retriever mdust multiprocess muscle openjdk pandas perl perl-text-soundex pyarrow python r-base r-dplyr regex repeatmodeler r-ggplot2 r-here r-tidyr scikit-learn swifter tensorflow==2.11 tesorter

For EDTA2:

mamba create -n EDTA2.2 -c conda-forge -c bioconda -c r annosine2 biopython cd-hit coreutils genericrepeatfinder genometools-genometools glob2 tir-learner ltr_finder_parallel ltr_retriever mdust multiprocess muscle openjdk perl perl-text-soundex r-base r-dplyr regex repeatmodeler r-ggplot2 r-here r-tidyr tesorter samtools bedtools LTR_HARVEST_parallel HelitronScanner
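
After creating the environment, one quick sanity check is to confirm that the expected executables resolve on `PATH`. The sketch below uses Python's `shutil.which`. The tool list is an assumption based on the package names in the commands above (only `gt` and `perl` appear by name in the logs in this thread), so adjust it to the executables your install actually provides.

```python
import shutil

# Assumed executable names; only "gt" and "perl" are seen in the logs above.
EXPECTED_TOOLS = [
    "gt",
    "perl",
    "samtools",
    "bedtools",
    "RepeatModeler",
    "TEsorter",
    "LTR_retriever",
    "mdust",
    "muscle",
]


def missing_tools(names):
    """Return the names that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(EXPECTED_TOOLS)
    print("all found" if not missing else "not on PATH: " + ", ".join(missing))
```

Run it with the new environment activated so that `PATH` reflects what EDTA will see.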

Thanks! Shujun