Closed: till-bornemann closed this issue 3 years ago
Hi Till,
Thanks so much for the detailed issue. I will look into the database issue as well as the cPickle version. Once I finish, I will get back to you and also update the GitHub README to reflect these potential issues.
Best regards, Joao
Hello developers,
I am trying to run PredicTF and am running into some issues. The installation (on Ubuntu 16.04.6 LTS, 40 cores, 1.5 TB RAM, local installation, no GPUs on this system, and hence I don't want to do any model training) ran without any problems according to your instructions, but when trying to use your script predictf_in_genome.sh to apply your model to my genomes, I ran into the following issues:
1. The diamond version specified in this repository (installed via conda install -c bioconda diamond==0.9.24) is incompatible with the supplied databases used in the predictf_in_genome.sh wrapper script (Error: Database was built with a different version of Diamond and is incompatible). I checked that the specified version of diamond was used with the diamond version command. I can likely fix this by recompiling the features.fasta file in the same directory into the features.dmnd file (I assume from the shared prefix that this is the original .fasta file and not the deeparg.fasta file in the same folder).
2. Quite a few of the paths do not work for me. For example, diamond was searched for in '/deeparg//bin/diamond', which is likely never the correct directory (fixed by removing the path specifications in the deeparg.py file so that it could pick up the version itself). One of the model files is looked for in '/home/till/03bioinformaticstools/102predictTF/PredicTF/BacTFDB/BacTFDB/model/v2/metadata_LS.pkl', with a duplicated 'BacTFDB' folder in the path (the correct path would have only a single instance of BacTFDB/; I can modify this one as well).
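The rebuild proposed in item 1 could look roughly like the sketch below. It assumes DIAMOND's makedb subcommand with its --in and --db options; the helper name makedb_command and the example directory are hypothetical, and the code only assembles the command so it can be inspected before being run.

```python
import os


def makedb_command(db_dir, prefix="features", diamond="diamond"):
    """Build the argument list that rebuilds <prefix>.dmnd from <prefix>.fasta.

    db_dir is the folder holding features.fasta (an assumption taken from the
    issue text); diamond is whatever executable name or path is in use locally.
    """
    fasta = os.path.join(db_dir, prefix + ".fasta")
    db = os.path.join(db_dir, prefix)  # DIAMOND adds the .dmnd extension itself
    return [diamond, "makedb", "--in", fasta, "--db", db]


if __name__ == "__main__":
    # Hypothetical location; print the command instead of running it.
    print(" ".join(makedb_command("BacTFDB/database/v2")))
```

Passing the returned list to subprocess.check_call would run the rebuild with the same DIAMOND binary that later performs the alignment, which is the point of the fix.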
The command i ran was:
sh predictf_in_genome.sh ~/03bioinformaticstools/102predictTF/PredicTF ../../genome/WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11.genes.faa ../../genome/WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11_out
and the console output I got was (I had run the same command previously, hence the mkdir warning; there seem to be additional calls to diamond that also use the wrong path, which I haven't yet changed to the relative path; hence only the first db call of diamond works and the later one complains about the /deeparg//bin/diamond path):
mkdir: cannot create directory '../../genome/WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11_out': File exists
path
/home/till/anaconda3/envs/deeparg_env/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
"downsample module has been moved to the theano.tensor.signal.pool module.")
DIAMOND blastp alignment
diamond blastp -q ../../genome/WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11.genes.faa -d /home/till/03bioinformaticstools/102predictTF/PredicTF/BacTFDB//database/v2/features -k 1000 --id 30 --sensitive -e 1e-10 -a ../../genome/WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11_out/file.out.align
diamond v0.9.24.125 | by Benjamin Buchfink <buchfink@gmail.com>
Licensed under the GNU GPL <https://www.gnu.org/licenses/gpl.txt>
Check http://github.com/bbuchfink/diamond for updates.
#CPU threads: 40
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: ../../genome/WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11_out
Opening the database... [8.3e-05s]
Error: Database was built with a different version of Diamond and is incompatible.
parsing output file
sh: 1: /deeparg//bin/diamond: not found
Loading deep learning model ...
Traceback (most recent call last):
  File "/home/till/03bioinformaticstools/102predictTF/PredicTF/deeparg-largerepo/deepARG.py", line 157, in <module>
    '.mapping', iden, mdl, evalue, prob, minCoverage, pipeline, version)
  File "/home/till/03bioinformaticstools/102predictTF/PredicTF/deeparg-largerepo/predict/bin/deepARG.py", line 149, in process
    open(path+"BacTFDB/model/"+version_m+"/metadata"+version+".pkl"))
IOError: [Errno 2] No such file or directory: '/home/till/03bioinformaticstools/102predictTF/PredicTF/BacTFDB/BacTFDB/model/v2/metadata_LS.pkl'
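The doubled folder in the IOError above is what plain string concatenation produces when the path variable already ends in BacTFDB/. Below is a minimal sketch of that effect and of one way to avoid it. The variable values are inferred from the traceback, and metadata_path is a hypothetical helper, not part of deepARG.

```python
import os


def metadata_path(db_dir, version_m, version):
    """Join the pieces with os.path.join, treating db_dir as the BacTFDB folder."""
    return os.path.join(db_dir, "model", version_m, "metadata" + version + ".pkl")


# Values inferred from the traceback (assumptions, not read from the code base).
path = "/home/till/03bioinformaticstools/102predictTF/PredicTF/BacTFDB/"
version_m, version = "v2", "_LS"

# The expression from the traceback repeats the folder name.
concatenated = path + "BacTFDB/model/" + version_m + "/metadata" + version + ".pkl"

# Joining relative to the database folder keeps a single BacTFDB component.
joined = metadata_path(path, version_m, version)

if __name__ == "__main__":
    print(concatenated)
    print(joined)
```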
If I remove the duplicated BacTFDB, I now get a cPickle error (using the same command). This suggests to me that you may be using a different pickle version, although cPickle was installed into the conda env, so you at least indirectly install it with the installation commands:
mkdir: cannot create directory '../../genome/WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11_out': File exists
path
/home/till/anaconda3/envs/deeparg_env/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
"downsample module has been moved to the theano.tensor.signal.pool module.")
DIAMOND blastp alignment
diamond blastp -q ../../genome/WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11.genes.faa -d /home/till/03bioinformaticstools/102predictTF/PredicTF/BacTFDB//database/v2/features -k 1000 --id 30 --sensitive -e 1e-10 -a ../../genome/WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11_out/file.out.align
diamond v0.9.24.125 | by Benjamin Buchfink <buchfink@gmail.com>
Licensed under the GNU GPL <https://www.gnu.org/licenses/gpl.txt>
Check http://github.com/bbuchfink/diamond for updates.
#CPU threads: 40
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: ../../genome/WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11_out
Opening the database... [0.00017s]
Error: Database was built with a different version of Diamond and is incompatible.
parsing output file
sh: 1: /deeparg//bin/diamond: not found
Loading deep learning model ...
Traceback (most recent call last):
  File "/home/till/03bioinformaticstools/102predictTF/PredicTF/deeparg-largerepo/deepARG.py", line 157, in <module>
    '.mapping', iden, mdl, evalue, prob, minCoverage, pipeline, version)
  File "/home/till/03bioinformaticstools/102predictTF/PredicTF/deeparg-largerepo/predict/bin/deepARG.py", line 149, in process
    open(path+"/model/"+version_m+"/metadata"+version+".pkl"))
cPickle.UnpicklingError: invalid load key, 'v'.
Is there anything you can suggest to fix these errors?
Best regards, Till
PS: Maybe adding a little note to the README on how to limit core/thread usage would be nice, as users might not want to allocate all 40 threads for this (I can modify deeparg.py to that effect, but it might be nice info to share).
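On the thread question in the PS: DIAMOND accepts a --threads (-p) option, so one possible change is to pass it explicitly. The sketch below rebuilds the blastp call from the console output with such a cap. The function blastp_command and its threads parameter are hypothetical and not part of deeparg.py.

```python
def blastp_command(query, db, out, threads=4, diamond="diamond"):
    """Assemble the blastp call seen in the console output, plus a thread cap."""
    return [
        diamond, "blastp",
        "-q", query,
        "-d", db,
        "-k", "1000",
        "--id", "30",
        "--sensitive",
        "-e", "1e-10",
        "-a", out,
        "--threads", str(threads),  # the log above reports 40 threads without it
    ]


if __name__ == "__main__":
    # File names here are placeholders for illustration.
    cmd = blastp_command("genes.faa", "BacTFDB/database/v2/features",
                         "out/file.out.align", threads=8)
    print(" ".join(cmd))
```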
Hi Till,
So, a quick update on the issues you encountered. The Diamond path was indeed wrong. The solution I found was to ask users to modify the config.py file in the deeparg-largerepo folder. This resolved the database issue completely for me: I no longer see an incompatibility problem with the database version.
Unfortunately, I am now also getting the cPickle problem. I am still debugging it, so please bear with me. Lastly, the thread assignment is set in deeparg-largerepo, as you mention. If you don't mind sharing the modifications you made in deepARG.py to this effect, I would gladly incorporate them.
Best regards, Joao
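One way to make the DIAMOND location less fragile than a hard-coded entry is to try a configured path first and fall back to whatever is on the search path. This is a sketch only: resolve_diamond is a hypothetical helper, the thread does not show what config.py actually contains, and shutil.which needs Python 3.3 or later, whereas the environment in this issue runs Python 2.7.

```python
import os
import shutil


def resolve_diamond(configured=None):
    """Return a usable diamond executable path, or None if none is found."""
    # Prefer an explicitly configured binary when it exists and is executable.
    if configured and os.path.isfile(configured) and os.access(configured, os.X_OK):
        return configured
    # Otherwise search the directories listed in the PATH environment variable.
    return shutil.which("diamond")


if __name__ == "__main__":
    # '/deeparg//bin/diamond' is the broken location from the console output.
    print(resolve_diamond("/deeparg//bin/diamond"))
```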
Hi Till,
So the problem is fixed. Here is a quick description of the issues and solutions. From what I saw, you were using Anaconda 3 when Anaconda 2 is required. Also, the paths to the BacTFDB were fixed. Lastly, the model files (.pkl) were not uploaded properly and only hashes were provided. I have created a download link to these files. Please have a look at the updated GitHub repository and installation instructions. Should you encounter another issue, please don't hesitate to contact us.
Best regards, Joao
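The cPickle message about an invalid load key 'v' is what one would expect if the .pkl file is a small text placeholder rather than pickle data. Assuming the placeholders are Git LFS pointer files (the thread only says that hashes were uploaded, so this is a guess), they would begin with the text "version https://git-lfs.github.com/spec/v1", and a check like the following could flag them before unpickling. The function looks_like_lfs_pointer is a hypothetical helper.

```python
# Header that opens a Git LFS pointer file (assumed cause, not confirmed in the thread).
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(filename):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(filename, "rb") as handle:
        return handle.read(len(LFS_PREFIX)) == LFS_PREFIX
```

If this returns True for metadata_LS.pkl, the real model file still has to be fetched from the download link before the script can load it.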
Hi Joao,
Thanks for working on the problems (and sorry for my tardy reply)! I've reinstalled PredicTF according to your GitHub repo. I did not really want to install a second conda distribution (that defeats the purpose of anaconda/miniconda in my opinion, as a lot of junk would be downloaded that way), so I just made an environment in my anaconda3. I don't think that should be a problem, as you can specify which Python version is the default in the environment during setup. I did get some version conflicts between numpy and sklearn/scipy/pandas as specified in requirements.txt (only warnings, which did not abort the installation) and got some numpy compilation errors when first executing your script as before. Uninstalling numpy (python -m pip uninstall numpy) and reinstalling it (python -m pip install numpy) removed the errors (an sklearn/scipy version incompatibility warning remained but did not cause issues).
One more error I encountered while running the command below: for me at least, the BacTFDB path was not fixed, on two fronts (I recloned your repo and thus assume that I'm using the up-to-date repo structure). I still had to modify a few paths in deeparg.py to make the command below run without the error:
$ bash scripts/predictf_in_genome.sh ~/03bioinformaticstools/102predictTF/PredicTF ../genome/WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11.genes.faa WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11.out
mkdir: cannot create directory 'WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11.out': File exists
path
/home/till/anaconda3/envs/deeparg_env/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
"downsample module has been moved to the theano.tensor.signal.pool module.")
DIAMOND blastp alignment
/home/till/03bioinformaticstools/102predictTF/PredicTF/deeparg-largerepo//bin/diamond blastp -q ../genome/WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11.genes.faa -d /home/till/03bioinformaticstools/102predictTF/PredicTF/BacTFDB//database/v2/features -k 1000 --id 30 --sensitive -e 1e-10 -a WB01no11_MG_WBTS_GAME_concat_Anaerolineae_51_11.out/file.out.align
parsing output file
Loading deep learning model ...
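Traceback (most recent call last):
  File "/home/till/03bioinformaticstools/102predictTF/PredicTF/deeparg-largerepo/deepARG.py", line 157, in <module>
    '.mapping', iden, mdl, evalue, prob, minCoverage, pipeline, version)
  File "/home/till/03bioinformaticstools/102predictTF/PredicTF/deeparg-largerepo/predict/bin/deepARG.py", line 149, in process
    open(path+"BacTFDB/model/"+version_m+"/metadata"+version+".pkl"))
IOError: [Errno 2] No such file or directory: '/home/till/03bioinformaticstools/102predictTF/PredicTF/BacTFDB/BacTFDB/model/v2/metadata_LS.pkl'
-> fixed by removing one instance of /BacTFDB from the path wherever this path was used. I also later removed /model/ from the path, as the current GitHub repo does not have a ./BacTFDB/model/v2/ (it has only ./BacTFDB/v2/), and I had thus automatically put the downloaded files into the ./v2 subdirectory. I guess the compiled Python (.pyc) file next to deeparg.py isn't used, as I didn't have to recompile it?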
But now your script is working for me. Thanks for your help!
Best, Till
University Duisburg-Essen Environmental Microbiology and Biotechnology Group for Aquatic Microbial Ecology (GAME) Till Bornemann +49 17681529205 Universitaetsstrasse 5 45141 Essen Germany Office: S05 T02 B16
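Since the model files may sit in either BacTFDB/model/v2/ or BacTFDB/v2/ depending on where the download was unpacked, a lookup that tries both layouts would avoid editing paths by hand. A minimal sketch, with find_metadata as a hypothetical helper and the two layouts taken from the message above:

```python
import os


def find_metadata(db_dir, version_m="v2", version="_LS"):
    """Return the first existing metadata file among the two layouts, else None."""
    name = "metadata" + version + ".pkl"
    candidates = [
        os.path.join(db_dir, "model", version_m, name),  # layout the script expects
        os.path.join(db_dir, version_m, name),           # layout reported for the recloned repo
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

Returning None rather than raising leaves it to the caller to report which locations were searched.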
Hi Till,
Glad you got it to work. Apparently I have to check the GitHub repo again. I thought I had edited the scripts to remove those pesky BacTFDB path problems. One of the warnings only reports that the output folder already exists.
Thanks again for the info.
Best regards, Joao
--
Best regards,
Joao Saraiva
Department of Environmental Microbiology Group of Microbial Systems Bioinformatics Helmholtz Centre for Environmental Research - UFZ Permoserstraße 15, 04318 Leipzig, Germany Phone: +49 341 235 - 1374 Email: @.*** WWW: http://www.ufz.de