oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

TIR not found? #451

Closed CongLiu37 closed 1 month ago

CongLiu37 commented 3 months ago

Hello,

I am running EDTA_raw.pl for TIR identification. The command looks like this:

EDTA_raw.pl --genome ${genome} \
        --species others \
        --type tir \
        --overwrite 1 \
        --threads 15

Two genomes from the same genus fail with an error. The log for the first genome:

Wed 20 Mar 03:37:44 JST 2024    EDTA_raw: Check dependencies, prepare working directories.

Wed 20 Mar 03:37:51 JST 2024    Start to find TIR candidates.

Wed 20 Mar 03:37:51 JST 2024    Identify TIR candidates from scratch.

Species: others
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /bucket/.mabuya/BourguignonU/Cong/Softwares/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list Aban.genome.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
    Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
    Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019

mv: cannot stat 'Aban.genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'Aban.genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'Aban.genome.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!

Thu 21 Mar 01:28:44 JST 2024    Execution of EDTA_raw.pl is finished!

The error from the second genome:

Wed 20 Mar 07:03:51 JST 2024    EDTA_raw: Check dependencies, prepare working directories.

Wed 20 Mar 07:03:55 JST 2024    Start to find TIR candidates.

Wed 20 Mar 07:03:55 JST 2024    Identify TIR candidates from scratch.

Species: others
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /bucket/.mabuya/BourguignonU/Cong/Softwares/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list Apac.genome.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
    Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
    Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019

mv: cannot stat 'Apac.genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'Apac.genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'Apac.genome.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!

Thu 21 Mar 02:17:51 JST 2024    Execution of EDTA_raw.pl is finished!

Does this mean that TIR-Learner simply found no TIRs in these genomes, or did something go wrong in EDTA (for example, memory or parameters)?

Sincerely,

Cong
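
One way to separate "no TIRs found" from "TIR-Learner did not finish" is to scan the saved log for failure signatures. The sketch below is only an illustration under stated assumptions: `classify_tir_log` and its labels are hypothetical names, and the marker strings are copied from the logs in this thread, not from any documented EDTA or TIR-Learner interface.

```python
# Marker strings copied from the logs in this thread. Treating them as
# diagnostic signatures is an assumption of this sketch.
CRASH_MARKERS = ("Traceback (most recent call last)", "MemoryError")
MISSING_OUTPUT = "TIR-Learner_FinalAnn.fa: No such file or directory"


def classify_tir_log(log_text: str) -> str:
    """Return a coarse label for the text of an EDTA_raw.pl --type tir log."""
    # A Python traceback or memory error means TIR-Learner itself aborted.
    if any(marker in log_text for marker in CRASH_MARKERS):
        return "tir-learner-crashed"
    # The final annotation file was never written, but the log does not say why.
    if MISSING_OUTPUT in log_text:
        return "output-missing-cause-unknown"
    return "no-known-failure-marker"


if __name__ == "__main__":
    import sys

    # Usage: python classify_log.py edta_raw.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(classify_tir_log(handle.read()))
```

For the two logs above, this returns `output-missing-cause-unknown`: the final annotation file is missing, yet no traceback was captured, so the log alone does not show whether TIR-Learner found nothing or stopped early.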

CongLiu37 commented 3 months ago

For a third genome, I got an error that looks like insufficient memory:

Thu 21 Mar 01:29:07 JST 2024    EDTA_raw: Check dependencies, prepare working directories.

Thu 21 Mar 01:29:11 JST 2024    Start to find TIR candidates.

Thu 21 Mar 01:29:11 JST 2024    Identify TIR candidates from scratch.

Species: others
Traceback (most recent call last):
  File "/bucket/.mabuya/BourguignonU/Cong/Softwares/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
    TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,
  File "/bucket/.mabuya/BourguignonU/Cong/Softwares/EDTA/bin/TIR-Learner3.0/bin/main.py", line 72, in __init__
    self.execute()
  File "/bucket/.mabuya/BourguignonU/Cong/Softwares/EDTA/bin/TIR-Learner3.0/bin/main.py", line 110, in execute
    self.execute_M4()
  File "/bucket/.mabuya/BourguignonU/Cong/Softwares/EDTA/bin/TIR-Learner3.0/bin/main.py", line 634, in execute_M4
    self["base"] = CNN_predict.execute(self)
  File "/bucket/.mabuya/BourguignonU/Cong/Softwares/EDTA/bin/TIR-Learner3.0/bin/CNN_predict.py", line 108, in execute
    df = predict(df, TIRLearner_instance.genome_file_path,
  File "/bucket/.mabuya/BourguignonU/Cong/Softwares/EDTA/bin/TIR-Learner3.0/bin/CNN_predict.py", line 75, in predict
    pre_feature_tensor = tf.convert_to_tensor(np.stack(pre_feature), np.float32)
  File "/bucket/BourguignonU/Cong/Softwares/mamba/lib/python3.10/site-packages/numpy/core/shape_base.py", line 456, in stack
    return _nx.concatenate(expanded_arrays, axis=axis, out=out,
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 215. GiB for an array with shape (28869774, 400, 5) and data type float32
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /bucket/.mabuya/BourguignonU/Cong/Softwares/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list Lunk.genome.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
    Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
    Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019

mv: cannot stat 'Lunk.genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'Lunk.genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'Lunk.genome.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!

Fri 22 Mar 12:25:43 JST 2024    Execution of EDTA_raw.pl is finished!

However, I have already reached the memory limit of the HPC. Do you think reducing the number of threads would help?

Sincerely,

Cong
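
As a sanity check on the numbers in the traceback, the requested allocation can be reproduced from the reported shape and data type. This is a minimal sketch: `array_gib` is a hypothetical helper, and the 100,000-row chunk is an arbitrary size chosen for comparison, not a TIR-Learner setting.

```python
import math

BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def array_gib(shape, bytes_per_item=BYTES_PER_FLOAT32):
    """Size in GiB of a dense array with the given shape and item size."""
    return math.prod(shape) * bytes_per_item / GIB


# Shape taken from the ArrayMemoryError in the third log.
full = array_gib((28869774, 400, 5))

# Hypothetical chunk of 100,000 rows, for comparison only.
chunk = array_gib((100_000, 400, 5))

print(f"full array: {full:.1f} GiB")
print(f"100k-row chunk: {chunk:.2f} GiB")
```

The full array comes to about 215 GiB, matching the error message. The shape has no obvious per-thread dimension, so the size appears to depend on the number of candidate sequences (the first dimension) and not on `--threads`. That is an inference from the traceback, not something confirmed against the TIR-Learner code.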