oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

TIR-Learner does not run properly in EDTA v2.2.0 #418

Closed cowriegump closed 5 months ago

cowriegump commented 5 months ago

Hi Shujun,

I created an independent conda environment and installed EDTA v2.2.0 using mamba, but ran into problems at the TIR-Learner step when using the test FASTA file; see the job log below. I would appreciate your help figuring out where the problem is.

Bei

Parameters: -genome genome.fa -t 64 --force 1

Wed Jan 17 18:06:54 CST 2024 Dependency checking: All passed!

Wed Jan 17 18:07:00 CST 2024 Obtain raw TE libraries using various structure-based programs:

Wed Jan 17 18:07:00 CST 2024 EDTA_raw: Check dependencies, prepare working directories.

Wed Jan 17 18:07:01 CST 2024 Start to find LTR candidates.

Wed Jan 17 18:07:01 CST 2024 Identify LTR retrotransposon candidates from scratch.

Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.

Wed Jan 17 18:07:39 CST 2024 Finish finding LTR candidates.

Wed Jan 17 18:07:39 CST 2024 Start to find SINE candidates.

Wed Jan 17 18:09:08 CST 2024 Warning: The SINE result file has 0 bp!

Wed Jan 17 18:09:08 CST 2024 Start to find LINE candidates.

Wed Jan 17 18:09:08 CST 2024 Identify LINE retrotransposon candidates from scratch.

cp: cannot stat 'genome.fa.mod.RM2.raw.fa': No such file or directory

Wed Jan 17 18:09:09 CST 2024 Warning: The LINE result file has 0 bp!

Wed Jan 17 18:09:09 CST 2024 Start to find TIR candidates.

Wed Jan 17 18:09:09 CST 2024 Identify TIR candidates from scratch.

Species: others

Traceback (most recent call last):
File "/public3/home/scg5735/edta/EDTA-2.2.0/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,
File "/public3/home/scg5735/edta/EDTA-2.2.0/bin/TIR-Learner3.0/bin/main.py", line 72, in init self.execute()
File "/public3/home/scg5735/edta/EDTA-2.2.0/bin/TIR-Learner3.0/bin/main.py", line 110, in execute self.execute_M4()
File "/public3/home/scg5735/edta/EDTA-2.2.0/bin/TIR-Learner3.0/bin/main.py", line 634, in execute_M4 self["base"] = CNN_predict.execute(self)
File "/public3/home/scg5735/edta/EDTA-2.2.0/bin/TIR-Learner3.0/bin/CNN_predict.py", line 108, in execute df = predict(df, TIRLearner_instance.genome_file_path,
File "/public3/home/scg5735/edta/EDTA-2.2.0/bin/TIR-Learner3.0/bin/CNN_predict.py", line 59, in predict model = load_model(path_to_model)
File "/public3/home/scg5735/mambaforge-pypy3/envs/edta2/lib/python3.10/site-packages/keras/src/saving/saving_api.py", line 262, in load_model return legacy_sm_saving_lib.load_model(
File "/public3/home/scg5735/mambaforge-pypy3/envs/edta2/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler raise e.with_traceback(filtered_tb) from None
File "/public3/home/scg5735/mambaforge-pypy3/envs/edta2/lib/python3.10/site-packages/tensorflow/python/framework/function_def_to_graph.py", line 278, in function_def_to_graph_def input_shape = input_shape.as_proto()
AttributeError: as_proto

Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /public3/home/scg5735/edta/EDTA-2.2.0/util/rename_tirlearner.pl line 19.

Warning: LOC list genome.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019

mv: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
ERROR: No such file or directory at /public3/home/scg5735/edta/EDTA-2.2.0/util/output_by_list.pl line 39.
Warning: The TIR result file has 0 bp!

lutianyu2001 commented 5 months ago

Hello cowriegump, I'm Sky, one of the developers of EDTA. Could you share the command you executed, along with the versions of all software in the conda environment? To get them, activate your conda environment, run conda env export -f test.yml, and paste the full contents of test.yml here. That will help us figure out where the problem is.
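
A full export can run to several hundred lines, so a quick look at the handful of packages TIR-Learner depends on may help before pasting everything. The following is a minimal sketch in Python, not part of EDTA; the function name `extract_versions` and the list of packages to report are assumptions made for illustration.

```python
import re

# Packages worth checking first (an assumption based on the pins discussed
# in this thread, not an official EDTA list).
PACKAGES_OF_INTEREST = ("tensorflow", "keras", "h5py", "python")


def extract_versions(yml_text, names=PACKAGES_OF_INTEREST):
    """Return {package: version} for the named packages in exported env text."""
    found = {}
    for line in yml_text.splitlines():
        # Conda entries look like "- name=version=build";
        # entries in the pip subsection look like "- name==version".
        match = re.match(r"\s*-\s*([A-Za-z0-9_.\-]+)={1,2}([^=\s]+)", line)
        if match and match.group(1) in names:
            found[match.group(1)] = match.group(2)
    return found


# Hypothetical usage, assuming test.yml was written by `conda env export`:
#     with open("test.yml") as handle:
#         print(extract_versions(handle.read()))
```

This only reads the exported file; it does not query the environment itself, so it reports whatever the export recorded.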

oushujun commented 5 months ago

Please install EDTA with the yml file and test again, thanks!

Shujun

baozg commented 5 months ago

I get the same error with the conda version. I will try with the yaml. I should use https://github.com/oushujun/EDTA/blob/master/EDTA_2.2.x.yml, right?

baozg commented 5 months ago

I cannot install tensorflow from the yaml file. Log from mamba:

Encountered problems while solving:
  - nothing provides __cuda needed by tensorflow-2.11.0-cuda112py39h01bd6f0_0
  - nothing provides __cuda needed by tensorflow-base-2.11.0-cuda112py39h1c230a5_0
  - nothing provides __cuda needed by tensorflow-base-2.11.0-cuda112py39h1c230a5_0

This is the install command I used. The mamba version is 0.16.0, installed by root:

mamba env create -p ~/software/EDTA2 -f EDTA_2.2.x.ym

oushujun commented 5 months ago

Try this two-step approach:

mamba create -n EDTA2.2 -c bioconda -c conda-forge -c r perl cd-hit repeatmodeler muscle mdust openjdk perl-text-soundex multiprocess regex tensorflow keras scikit-learn biopython pandas glob2 h5py python tesorter genericrepeatfinder genometools-genometools ltr_retriever ltr_finder coreutils blast==2.10.1 swifter bedtools r-base r-ggplot2 r-dplyr r-tidyr r-here annosine2

mamba install 'h5py>3' -c bioconda -c conda-forge

@Juke34 any ideas why this is the case?

baozg commented 5 months ago

Where is the yaml or EDTA in these two steps?

oushujun commented 5 months ago

Actually, three steps. You need to pin keras and tensorflow for TIR-Learner3, pin h5py for tensorflow, and pin blast for RepeatMasker. The EDTA conda and yml approaches do not seem to be working currently:

mamba create -n EDTA2 -c bioconda -c conda-forge -c r perl cd-hit repeatmodeler muscle mdust openjdk perl-text-soundex multiprocess regex tensorflow keras scikit-learn biopython pandas glob2 h5py python tesorter genericrepeatfinder genometools-genometools ltr_retriever ltr_finder coreutils blast==2.10.1 swifter bedtools r-base r-ggplot2 r-dplyr r-tidyr r-here annosine2

mamba install 'keras=2.11' 'tensorflow=2.11' -c bioconda -c conda-forge

mamba install 'h5py>3' -c bioconda -c conda-forge

Juke34 commented 5 months ago

Sorry, I have no idea why it behaves like that, but results can vary a lot depending on the machine you are working on. That is why conda/mamba is very limited. To be sure the conda recipe is correct, we should only look at the container built from the recipe; at least that environment will not change when used on different machines or by different users. So please run your analysis with docker or singularity using the latest release (quay.io/biocontainers/edta:2.2.0--hdfd78af_1). If this does not work either, we should definitely update the recipe. If it works, we could still look for a way to make a better conda recipe for people who want to use conda and avoid containers, but that is less urgent.

cowriegump commented 5 months ago

Thanks Shujun and Sky. I installed mamba/miniforge on a new server node, created a new conda environment, and used the yml file to install EDTA; it is working well now. No GPU is available on the server node, so the non-CUDA version of tensorflow needs to be installed by slightly modifying the yml file, which I think is trivial and straightforward. Thanks for your support!

baozg commented 5 months ago

@oushujun Thanks for your command, but I tried the approach that @cowriegump mentioned. I replaced all the lines starting with tensorflow in the yml file from the GitHub repo with a single line, tensorflow=2.11.0=cpu_py39h4655687_0. The installation then worked well, and I could run EDTA successfully on the Col-0 reference.

original:

  - tensorflow=2.11.0=cuda112py39h01bd6f0_0
  - tensorflow-base=2.11.0=cuda112py39h1c230a5_0
  - tensorflow-estimator=2.11.0=cuda112py39hd320b7a_0

changed:

  - tensorflow=2.11.0=cpu_py39h4655687_0
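
The edit described above can also be scripted. This is a minimal sketch in Python that works on the yml text as plain lines, without a YAML parser. The function name `use_cpu_tensorflow` is hypothetical, and the CPU build string is the one quoted in this comment; whether that build is available for a given platform is an assumption to check.

```python
# CPU build string quoted in this thread; availability on other platforms
# is an assumption.
CPU_PIN = "tensorflow=2.11.0=cpu_py39h4655687_0"


def use_cpu_tensorflow(yml_text, cpu_pin=CPU_PIN):
    """Replace every tensorflow* dependency line with a single CPU pin."""
    out = []
    replaced = False
    for line in yml_text.splitlines():
        if line.strip().startswith("- tensorflow"):
            if not replaced:
                # Reuse the indentation of the first tensorflow line.
                indent = line[: len(line) - len(line.lstrip())]
                out.append(f"{indent}- {cpu_pin}")
                replaced = True
            # Drop the remaining tensorflow-base / tensorflow-estimator pins.
            continue
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical usage: write a CPU-only copy of the environment file.
#     with open("EDTA_2.2.x.yml") as src:
#         text = use_cpu_tensorflow(src.read())
#     with open("EDTA_2.2.x.cpu.yml", "w") as dst:
#         dst.write(text)
```

Writing to a separate file keeps the original yml untouched, so the CUDA pins can still be used on a machine that has a GPU.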

oushujun commented 5 months ago

Please use this command to install all dependencies until we update the conda recipe:

mamba create -n EDTA2.2 -c conda-forge -c bioconda -c r annosine2 biopython blast cd-hit coreutils genericrepeatfinder genometools-genometools glob2 h5py==3.9 keras==2.11 ltr_finder ltr_retriever mdust multiprocess muscle openjdk pandas perl perl-text-soundex pyarrow python r-base r-dplyr regex repeatmodeler r-ggplot2 r-here r-tidyr scikit-learn swifter tensorflow==2.11 tesorter
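
After creating the environment, it may be worth confirming that the three pinned packages actually resolved to the requested versions before rerunning EDTA. The following is a minimal sketch using Python's standard `importlib.metadata`, run inside the activated environment. The helper names are hypothetical, and the check treats any 2.11.x release as satisfying a 2.11 pin, which is looser than the exact match requested in the command.

```python
from importlib import metadata

# Pins taken from the install command above.
PINS = {"tensorflow": "2.11", "keras": "2.11", "h5py": "3.9"}


def matches_pin(installed, pin):
    """True if the installed version equals the pin or extends it (2.11 -> 2.11.0)."""
    return installed == pin or installed.startswith(pin + ".")


def check_pins(pins=PINS, get_version=metadata.version):
    """Return {package: (installed_version_or_None, ok)} for each pinned package."""
    report = {}
    for name, pin in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not visible to this interpreter
        ok = installed is not None and matches_pin(installed, pin)
        report[name] = (installed, ok)
    return report


# Hypothetical usage inside the activated environment:
#     for name, (installed, ok) in check_pins().items():
#         print(name, installed, "OK" if ok else "MISMATCH")
```

This assumes the conda packages expose standard Python package metadata; if one of them does not, it will be reported as missing even though it is installed.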