oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

singularity ERRO #485

Open jwli-code opened 1 month ago

jwli-code commented 1 month ago

`singularity exec ../EDTA.sif EDTA.pl --genome genome.fa --cds genome.cds.fa --curatedlib ../database/rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --threads 5

#########################################################

Extensive de-novo TE Annotator (EDTA) v2.2.0
Shujun Ou (shujun.ou.1@gmail.com)

#########################################################

Parameters: --genome genome.fa --cds genome.cds.fa --curatedlib ../database/rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --threads 5

Mon Jul 22 11:02:06 CST 2024 Dependency checking: All passed!

    A custom library ../database/rice7.0.0.liban is provided via --curatedlib. Please make sure this is a manually curated library but not machine generated.

    A CDS file genome.cds.fa is provided via --cds. Please make sure this is the DNA sequence of coding regions only.

    A BED file is provided via --exclude. Regions specified by this file will be excluded from TE annotation and masking.

Mon Jul 22 11:02:20 CST 2024 Obtain raw TE libraries using various structure-based programs:
Mon Jul 22 11:02:20 CST 2024 EDTA_raw: Check dependencies, prepare working directories.

Mon Jul 22 11:02:23 CST 2024 Start to find LTR candidates.

Mon Jul 22 11:02:23 CST 2024 Identify LTR retrotransposon candidates from scratch.

Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.
Mon Jul 22 11:04:38 CST 2024 Finish finding LTR candidates.

Mon Jul 22 11:04:38 CST 2024 Start to find SINE candidates.

Mon Jul 22 11:07:34 CST 2024 Warning: The SINE result file has 0 bp!

Mon Jul 22 11:07:34 CST 2024 Start to find LINE candidates.

Mon Jul 22 11:07:34 CST 2024 Identify LINE retrotransposon candidates from scratch.

cp: cannot stat 'genome.fa.mod.RM2.raw.fa': No such file or directory
Mon Jul 22 11:11:10 CST 2024 Warning: The LINE result file has 0 bp!

Mon Jul 22 11:11:10 CST 2024 Start to find TIR candidates.

Mon Jul 22 11:11:10 CST 2024 Identify TIR candidates from scratch.

Species: others
/usr/local/lib/python3.10/site-packages/dask/dataframe/_pyarrow_compat.py:17: FutureWarning: Minimal version of pyarrow will soon be increased to 14.0.1. You are using 13.0.0. Please consider upgrading.
  warnings.warn(
Traceback (most recent call last):
  File "/usr/local/share/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
    TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,
  File "/usr/local/share/EDTA/bin/TIR-Learner3.0/bin/main.py", line 72, in __init__
    self.execute()
  File "/usr/local/share/EDTA/bin/TIR-Learner3.0/bin/main.py", line 110, in execute
    self.execute_M4()
  File "/usr/local/share/EDTA/bin/TIR-Learner3.0/bin/main.py", line 634, in execute_M4
    self["base"] = CNN_predict.execute(self)
  File "/usr/local/share/EDTA/bin/TIR-Learner3.0/bin/CNN_predict.py", line 108, in execute
    df = predict(df, TIRLearner_instance.genome_file_path,
  File "/usr/local/share/EDTA/bin/TIR-Learner3.0/bin/CNN_predict.py", line 59, in predict
    model = load_model(path_to_model)
  File "/usr/local/lib/python3.10/site-packages/keras/src/saving/saving_api.py", line 262, in load_model
    return legacy_sm_saving_lib.load_model(
  File "/usr/local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/framework/function_def_to_graph.py", line 278, in function_def_to_graph_def
    input_shape = input_shape.as_proto()
AttributeError: as_proto
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /usr/local/share/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list genome.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
        Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
        Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019

mv: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
ERROR: No such file or directory at /usr/local/share/EDTA/util/output_by_list.pl line 39.
Warning: The TIR result file has 0 bp!

Mon Jul 22 11:15:13 CST 2024 Start to find Helitron candidates.

Mon Jul 22 11:15:13 CST 2024 Identify Helitron candidates from scratch.

Mon Jul 22 11:16:27 CST 2024 Finish finding Helitron candidates.

Mon Jul 22 11:16:27 CST 2024 Execution of EDTA_raw.pl is finished!

ERROR: Raw TIR results not found in genome.fa.mod.EDTA.raw/genome.fa.mod.TIR.intact.raw.fa
        If you believe the program is working properly, this may be caused by the lack of intact TIRs in your genome. Consider to use the --force 1 parameter to overwrite this check `

qjiangzhao commented 1 month ago

I got the same error.

FayeFang17 commented 3 weeks ago

Hi,

currently it is better to install the released version (master branch) with:

mamba create -n EDTA -c conda-forge -c bioconda -c r annosine2 biopython blast cd-hit coreutils genericrepeatfinder genometools-genometools glob2 h5py==3.9 keras==2.11 ltr_finder ltr_retriever mdust multiprocess muscle openjdk pandas perl perl-text-soundex pyarrow python r-base r-dplyr regex repeatmodeler r-ggplot2 r-here r-tidyr scikit-learn swifter tensorflow==2.11 tesorter

Or you could try out the EDTA2 branch (under active development):

mamba create -n EDTA2.2 -c conda-forge -c bioconda -c r annosine2 biopython cd-hit coreutils genericrepeatfinder genometools-genometools glob2 tir-learner ltr_finder_parallel ltr_retriever mdust multiprocess muscle openjdk perl perl-text-soundex r-base r-dplyr regex repeatmodeler r-ggplot2 r-here r-tidyr tesorter samtools bedtools LTR_HARVEST_parallel HelitronScanner

Then just git clone and git checkout EDTA2.

For this branch, it's better to install fresh, because we recently updated the TIR-Learner recipe and it may conflict with previous dependencies. Please let me know if you encounter further problems!

Faye
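
Since the released-version command above pins h5py==3.9, keras==2.11 and tensorflow==2.11, one quick sanity check is whether the environment that actually runs EDTA has those versions. The following is only a rough sketch: the helper names (`installed_version`, `matches_pin`, `find_mismatches`) are hypothetical, and treating a pin as a version prefix (so 2.11 accepts 2.11.0) is an assumption, not how conda itself resolves `==`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the released-version install command in this thread.
PINS = {"h5py": "3.9", "keras": "2.11", "tensorflow": "2.11"}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def matches_pin(installed, pin):
    """Assumption: a pin matches itself or any version extending it with more
    components (2.11 matches 2.11.0 but not 2.110.0)."""
    if installed is None:
        return False
    return installed == pin or installed.startswith(pin + ".")


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version does not match its pin
    to the version that was found (None if not installed)."""
    mismatches = {}
    for name, pin in pins.items():
        found = lookup(name)
        if not matches_pin(found, pin):
            mismatches[name] = found
    return mismatches
```

Running `find_mismatches(PINS)` inside the activated environment (or inside the container) returns an empty dict when all three pins are satisfied. A non-empty result does not prove that it causes the `AttributeError: as_proto` failure, but it is a cheap thing to rule out first.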

qizhengyang2017 commented 3 weeks ago

Hello, I installed the released version using the command

micromamba create -n EDTA -c conda-forge -c bioconda -c r annosine2 biopython blast cd-hit coreutils genericrepeatfinder genometools-genometools glob2 h5py==3.9 keras==2.11 ltr_finder ltr_retriever mdust multiprocess muscle openjdk pandas perl perl-text-soundex pyarrow python r-base r-dplyr regex repeatmodeler r-ggplot2 r-here r-tidyr scikit-learn swifter tensorflow==2.11 tesorter

but it still failed.


#########################################################
##### Extensive de-novo TE Annotator (EDTA) v2.2.1  #####
##### Shujun Ou (shujun.ou.1@gmail.com)             #####
#########################################################

Parameters: --genome genome.fa --cds genome.cds.fa --curatedlib ../database/rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --threads 10

Fri Aug 16 21:24:40 CST 2024    Dependency checking:
        All passed!

        A custom library ../database/rice7.0.0.liban is provided via --curatedlib. Please make sure this is a manually curated library but not machine generated.

        A CDS file genome.cds.fa is provided via --cds. Please make sure this is the DNA sequence of coding regions only.

        A BED file is provided via --exclude. Regions specified by this file will be excluded from TE annotation and masking.

Fri Aug 16 21:24:42 CST 2024    Obtain raw TE libraries using various structure-based programs:
Fri Aug 16 21:24:42 CST 2024    EDTA_raw: Check dependencies, prepare working directories.

Fri Aug 16 21:24:45 CST 2024    Start to find LTR candidates.

Fri Aug 16 21:24:45 CST 2024    Identify LTR retrotransposon candidates from scratch.

Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.
Fri Aug 16 21:25:22 CST 2024    Finish finding LTR candidates.

Fri Aug 16 21:25:22 CST 2024    Start to find SINE candidates.

Fri Aug 16 21:26:33 CST 2024    Warning: The SINE result file has 0 bp!

Fri Aug 16 21:26:33 CST 2024    Start to find LINE candidates.

Fri Aug 16 21:26:33 CST 2024    Identify LINE retrotransposon candidates from scratch.

Fri Aug 16 21:28:46 CST 2024    Warning: The LINE result file has 0 bp!

Fri Aug 16 21:28:46 CST 2024    Start to find TIR candidates.

Fri Aug 16 21:28:46 CST 2024    Identify TIR candidates from scratch.

Species: others
Traceback (most recent call last):
  File "/home/zhengqingyou/GraffiTE/GraffiTE/melon/EDTA/EDTA/bin/TIR-Learner3.0/bin/main.py", line 260, in load_checkpoint_file
    execution_progress_info = json.loads(checkpoint_info_file.readline().rstrip())
  File "/home/zhengqingyou/micromamba/envs/EDTA2.2/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/zhengqingyou/micromamba/envs/EDTA2.2/lib/python3.10/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 2 (char 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhengqingyou/GraffiTE/GraffiTE/melon/EDTA/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
    TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,
  File "/home/zhengqingyou/GraffiTE/GraffiTE/melon/EDTA/EDTA/bin/TIR-Learner3.0/bin/main.py", line 81, in __init__
    self.execute()
  File "/home/zhengqingyou/GraffiTE/GraffiTE/melon/EDTA/EDTA/bin/TIR-Learner3.0/bin/main.py", line 111, in execute
    self.load_checkpoint_file()
  File "/home/zhengqingyou/GraffiTE/GraffiTE/melon/EDTA/EDTA/bin/TIR-Learner3.0/bin/main.py", line 311, in load_checkpoint_file
    self.reset_checkpoint_load_state(f"WARN: Checkpoint file invalid, \"{df_file_name}.csv\" is empty. "
UnboundLocalError: local variable 'df_file_name' referenced before assignment
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /home/zhengqingyou/GraffiTE/GraffiTE/melon/EDTA/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list genome.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
        Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
        Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019

mv: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!

Fri Aug 16 21:28:54 CST 2024    Start to find Helitron candidates.

Fri Aug 16 21:28:54 CST 2024    Identify Helitron candidates from scratch.

Fri Aug 16 21:29:46 CST 2024    Finish finding Helitron candidates.

Fri Aug 16 21:29:46 CST 2024    Execution of EDTA_raw.pl is finished!

ERROR: Raw TIR results not found in genome.fa.mod.EDTA.raw/genome.fa.mod.TIR.intact.raw.fa
        If you believe the program is working properly, this may be caused by the lack of intact TIRs in your genome. Consider to use the --force 1 parameter to overwrite this check
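
For anyone reading the second traceback: its shape (a `JSONDecodeError` followed by an `UnboundLocalError` raised "during handling of the above exception") is what you get when an error handler refers to a variable that is only assigned after the call that failed. The sketch below reproduces that pattern in isolation. It is not the TIR-Learner code; the function names and the checkpoint line `"1,abc"` are made up for illustration.

```python
import json


def load_checkpoint_buggy(first_line):
    """Hypothetical loader whose error handler uses a name that may be unset."""
    try:
        info = json.loads(first_line)   # fails first on a malformed line
        df_file_name = info["df"]       # never reached in that case
    except json.JSONDecodeError:
        # df_file_name was never assigned, so building this message raises
        # UnboundLocalError while the JSONDecodeError is being handled.
        return f'WARN: Checkpoint file invalid, "{df_file_name}.csv" is empty.'
    return df_file_name


def load_checkpoint_safe(first_line):
    """Same logic, but the handler does not depend on unassigned names."""
    try:
        info = json.loads(first_line)
    except json.JSONDecodeError:
        return None                     # caller treats None as "no usable checkpoint"
    return info["df"]


def decode_error_message(text):
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None
```

With a line such as `"1,abc"`, `decode_error_message` gives `Extra data: line 1 column 2 (char 1)`, the same message as in the log, which suggests the checkpoint file's first line begins with a valid JSON value followed by extra characters. If that is what happened here, a stale or partially written checkpoint left by an earlier run would be one plausible cause, though the log alone does not confirm it.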