Closed cowriegump closed 5 months ago
Hello cowriegump, I'm Sky, one of the developers of EDTA. Could you share the command you executed, as well as the versions of all the software in the conda environment? To retrieve them, activate your conda environment, run conda env export -f test.yml, then copy and paste the full contents of test.yml here. That will help us figure out where the problem is.
Please install EDTA with the yml file and test again, thanks!
Shujun
On Wed, Jan 17, 2024 at 2:28 AM cowriegump @.***> wrote:
Hi Shujun,
I have created an independent conda environment and installed EDTA v2.2.0 using mamba, but I encountered some issues at the TIR-Learner step using the test fasta file; see the job running record below. I would appreciate it if you could help figure out where the problem is.
Bei
Parameters: -genome genome.fa -t 64 --force 1
Wed Jan 17 18:06:54 CST 2024 Dependency checking: All passed!
Wed Jan 17 18:07:00 CST 2024 Obtain raw TE libraries using various structure-based programs:
Wed Jan 17 18:07:00 CST 2024 EDTA_raw: Check dependencies, prepare working directories.
Wed Jan 17 18:07:01 CST 2024 Start to find LTR candidates.
Wed Jan 17 18:07:01 CST 2024 Identify LTR retrotransposon candidates from scratch.
Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.
Wed Jan 17 18:07:39 CST 2024 Finish finding LTR candidates.
Wed Jan 17 18:07:39 CST 2024 Start to find SINE candidates.
Wed Jan 17 18:09:08 CST 2024 Warning: The SINE result file has 0 bp!
Wed Jan 17 18:09:08 CST 2024 Start to find LINE candidates.
Wed Jan 17 18:09:08 CST 2024 Identify LINE retrotransposon candidates from scratch.
cp: cannot stat 'genome.fa.mod.RM2.raw.fa': No such file or directory
Wed Jan 17 18:09:09 CST 2024 Warning: The LINE result file has 0 bp!
Wed Jan 17 18:09:09 CST 2024 Start to find TIR candidates.
Wed Jan 17 18:09:09 CST 2024 Identify TIR candidates from scratch.
Species: others
Traceback (most recent call last):
  File "/public3/home/scg5735/edta/EDTA-2.2.0/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
    TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,
  File "/public3/home/scg5735/edta/EDTA-2.2.0/bin/TIR-Learner3.0/bin/main.py", line 72, in __init__
    self.execute()
  File "/public3/home/scg5735/edta/EDTA-2.2.0/bin/TIR-Learner3.0/bin/main.py", line 110, in execute
    self.execute_M4()
  File "/public3/home/scg5735/edta/EDTA-2.2.0/bin/TIR-Learner3.0/bin/main.py", line 634, in execute_M4
    self["base"] = CNN_predict.execute(self)
  File "/public3/home/scg5735/edta/EDTA-2.2.0/bin/TIR-Learner3.0/bin/CNN_predict.py", line 108, in execute
    df = predict(df, TIRLearner_instance.genome_file_path,
  File "/public3/home/scg5735/edta/EDTA-2.2.0/bin/TIR-Learner3.0/bin/CNN_predict.py", line 59, in predict
    model = load_model(path_to_model)
  File "/public3/home/scg5735/mambaforge-pypy3/envs/edta2/lib/python3.10/site-packages/keras/src/saving/saving_api.py", line 262, in load_model
    return legacy_sm_saving_lib.load_model(
  File "/public3/home/scg5735/mambaforge-pypy3/envs/edta2/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/public3/home/scg5735/mambaforge-pypy3/envs/edta2/lib/python3.10/site-packages/tensorflow/python/framework/function_def_to_graph.py", line 278, in function_def_to_graph_def
    input_shape = input_shape.as_proto()
AttributeError: as_proto
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /public3/home/scg5735/edta/EDTA-2.2.0/util/rename_tirlearner.pl line 19.
Warning: LOC list genome.fa.mod.TIR.ext30.list is empty.
Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
Author: Shujun Ou (@.***) 10/11/2019
mv: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
ERROR: No such file or directory at /public3/home/scg5735/edta/EDTA-2.2.0/util/output_by_list.pl line 39.
Warning: The TIR result file has 0 bp!
Same error with the conda version; I will try with the yaml. I should use https://github.com/oushujun/EDTA/blob/master/EDTA_2.2.x.yml, right?
I cannot install tensorflow from the yaml file. Log from mamba:
Encountered problems while solving:
- nothing provides __cuda needed by tensorflow-2.11.0-cuda112py39h01bd6f0_0
- nothing provides __cuda needed by tensorflow-base-2.11.0-cuda112py39h1c230a5_0
- nothing provides __cuda needed by tensorflow-base-2.11.0-cuda112py39h1c230a5_0
The install command I used is below; the mamba version is 0.16.0, installed by root:
mamba env create -p ~/software/EDTA2 -f EDTA_2.2.x.ym
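For readers hitting the same solver failure: each message in the log above names a CUDA build of tensorflow that requires `__cuda`. As far as I understand, `__cuda` is a virtual package that conda/mamba only provide when they detect an NVIDIA driver, which would explain the failure on a machine without a GPU. Below is a minimal Python sketch that groups such messages by the missing requirement. The names `SOLVER_LINE` and `missing_requirements` are made up for illustration and are not part of mamba or EDTA.

```python
import re

# Matches solver lines such as:
#   - nothing provides __cuda needed by tensorflow-2.11.0-cuda112py39h01bd6f0_0
SOLVER_LINE = re.compile(r"nothing provides (\S+) needed by (\S+)")


def missing_requirements(solver_log: str) -> dict[str, set[str]]:
    """Map each unsatisfied requirement to the package builds that need it."""
    missing: dict[str, set[str]] = {}
    for line in solver_log.splitlines():
        match = SOLVER_LINE.search(line)
        if match:
            requirement, package = match.groups()
            missing.setdefault(requirement, set()).add(package)
    return missing


# The solver output quoted in this thread.
log = """\
Encountered problems while solving:
- nothing provides __cuda needed by tensorflow-2.11.0-cuda112py39h01bd6f0_0
- nothing provides __cuda needed by tensorflow-base-2.11.0-cuda112py39h1c230a5_0
- nothing provides __cuda needed by tensorflow-base-2.11.0-cuda112py39h1c230a5_0
"""

for requirement, packages in missing_requirements(log).items():
    print(requirement, "is required by", sorted(packages))
```

If the only missing requirement is `__cuda`, the environment file is probably asking for GPU builds on a CPU-only machine.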
Try this two-step approach:
mamba create -n EDTA2.2 -c bioconda -c conda-forge -c r perl cd-hit repeatmodeler muscle mdust openjdk perl-text-soundex multiprocess regex tensorflow keras scikit-learn biopython pandas glob2 h5py python tesorter genericrepeatfinder genometools-genometools ltr_retriever ltr_finder coreutils blast==2.10.1 swifter bedtools r-base r-ggplot2 r-dplyr r-tidyr r-here annosine2
mamba install 'h5py>3' -c bioconda -c conda-forge
@Juke34 any ideas why this is the case?
Where is the yaml or EDTA in these two steps?
Actually, three steps. You need to pin keras and tensorflow for TIR-Learner3, pin h5py for tensorflow, and pin blast for RepeatMasker. The EDTA conda and yml approaches do not seem to be working currently:
mamba create -n EDTA2 -c bioconda -c conda-forge -c r perl cd-hit repeatmodeler muscle mdust openjdk perl-text-soundex multiprocess regex tensorflow keras scikit-learn biopython pandas glob2 h5py python tesorter genericrepeatfinder genometools-genometools ltr_retriever ltr_finder coreutils blast==2.10.1 swifter bedtools r-base r-ggplot2 r-dplyr r-tidyr r-here annosine2
mamba install 'keras=2.11' 'tensorflow=2.11' -c bioconda -c conda-forge
mamba install 'h5py>3' -c bioconda -c conda-forge
Sorry, no idea why it is behaving like that, but it can vary a lot depending on the machine you are working on. This is why conda/mamba is very limited. To be sure the conda recipe is correct, we should only look at the container built from the recipe. At least that environment will not change between machines or users. So please run your analysis with docker or singularity using the latest release: quay.io/biocontainers/edta:2.2.0--hdfd78af_1
If this does not work either, we should definitely update the recipe.
If it works, we could still find a way to make a better conda recipe for people who want to use conda and avoid containers, but that is less urgent.
Thanks Shujun and Sky. I installed mamba/miniforge on a new server node, created a new conda environment, and used the yml file to install EDTA; it's working well now. No GPU is available on the server node, so the non-CUDA version of tensorflow needs to be installed by slightly modifying the yml file, which I think is trivial and straightforward. Thanks for your support!
@oushujun Thanks for your command, but I tried the same approach that @cowriegump mentioned. I just changed all the entries starting with tensorflow in the yml file in the GitHub repo to a single line, - tensorflow=2.11.0=cpu_py39h4655687_0. This installation then worked well, and I could run successfully with the Col-0 reference.
original:
- tensorflow=2.11.0=cuda112py39h01bd6f0_0
- tensorflow-base=2.11.0=cuda112py39h1c230a5_0
- tensorflow-estimator=2.11.0=cuda112py39hd320b7a_0
changed:
- tensorflow=2.11.0=cpu_py39h4655687_0
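The edit described above can also be scripted. This is a minimal Python sketch, assuming the yml lists each dependency on its own `- name=version=build` line. The function name `use_cpu_tensorflow` and the abbreviated `original` text are illustrative only, and the real EDTA_2.2.x.yml contains many more entries. Note that, as in the manual edit, the `tensorflow-base` and `tensorflow-estimator` pins are dropped and left to the solver.

```python
# CPU build pin reported to work in this thread.
CPU_PIN = "tensorflow=2.11.0=cpu_py39h4655687_0"


def use_cpu_tensorflow(yml_text: str) -> str:
    """Replace every '- tensorflow*' dependency entry with one CPU build pin."""
    out = []
    replaced = False
    for line in yml_text.splitlines():
        if line.strip().startswith("- tensorflow"):
            if not replaced:
                # Reuse the indentation of the first entry being replaced.
                indent = line[: len(line) - len(line.lstrip())]
                out.append(f"{indent}- {CPU_PIN}")
                replaced = True
            continue  # drop the remaining tensorflow* entries
        out.append(line)
    return "\n".join(out) + "\n"


# Abbreviated, hypothetical excerpt of an environment file.
original = """\
dependencies:
  - tensorflow=2.11.0=cuda112py39h01bd6f0_0
  - tensorflow-base=2.11.0=cuda112py39h1c230a5_0
  - tensorflow-estimator=2.11.0=cuda112py39hd320b7a_0
  - tesorter
"""

print(use_cpu_tensorflow(original), end="")
```

Writing the result to a new file and passing that file to mamba env create keeps the original yml untouched for comparison.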
Please use this command to install all dependencies before we update the conda recipe:
mamba create -n EDTA2.2 -c conda-forge -c bioconda -c r annosine2 biopython blast cd-hit coreutils genericrepeatfinder genometools-genometools glob2 h5py==3.9 keras==2.11 ltr_finder ltr_retriever mdust multiprocess muscle openjdk pandas perl perl-text-soundex pyarrow python r-base r-dplyr regex repeatmodeler r-ggplot2 r-here r-tidyr scikit-learn swifter tensorflow==2.11 tesorter
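After creating the environment, it may be worth confirming that the pinned versions were actually installed, since the traceback earlier in this thread came from an environment that did not have them. The following is a minimal Python sketch using only the standard library. `PINS`, `matches_pin`, and `check_pins` are hypothetical names, the pins are copied from the command above, and the comparison is a simple prefix match (so 2.11.0 satisfies 2.11), which only approximates conda's own version matching. It also assumes the conda packages ship Python package metadata, which is usually but not always the case.

```python
from importlib import metadata

# Pins taken from the install command above.
PINS = {"tensorflow": "2.11", "keras": "2.11", "h5py": "3.9"}


def matches_pin(installed: str, pin: str) -> bool:
    """True if `installed` equals `pin` or extends it (2.11.0 matches 2.11)."""
    return installed == pin or installed.startswith(pin + ".")


def check_pins(pins=PINS, get_version=metadata.version):
    """Return {package: (installed_version_or_None, ok)} for each pinned package."""
    report = {}
    for name, pin in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package metadata not found in this environment
        ok = installed is not None and matches_pin(installed, pin)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} pinned={PINS[name]} {status}")
```

Run it with the environment's own Python interpreter (after activating the environment) so that the versions reported are the ones TIR-Learner will see.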