Hi Mike,
Regarding Prodigal, bbtools, the .h5 model files, the Pfam file Pfam-A.TMP2.hmm, and the several sketch files: they need to be present in the build directory when the Docker image is built. This is mentioned in the README.md, which cites the Supplementary Information of the paper. That is why those steps weren't included in the Dockerfile: the .h5 model and Pfam files would need to be downloaded anyway if someone rebuilt the image (these files are under https://portal.nersc.gov/dna/microbial/assembly/deeplasmid/DATA/TRAIN/MODELS/). I didn't expect many people to rebuild the Docker image, and I only tested it on my own GPU node, so it is expected that other GPU users might run into new issues.
Please see the Supplementary Information from the publication for what to consider when building the Docker image: Prodigal and bbtools/sketch need to be built, and the trained model .h5 files are needed, as well as the several sketch files and Pfam-A.TMP2.hmm, which can be downloaded from https://portal.nersc.gov/dna/microbial/assembly/deeplasmid/.
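If it helps, staging those files next to the Dockerfile before building could look roughly like the following sketch (the wget flags and file locations are only illustrative; the portal directory listing and the Supplementary Information have the authoritative names):
wget -r -np -nd -A '*.h5' https://portal.nersc.gov/dna/microbial/assembly/deeplasmid/DATA/TRAIN/MODELS/
# also fetch Pfam-A.TMP2.hmm and the sketch files from the same portal (see the README for the exact paths)
docker build -t deeplasmid-gpu -f Dockerfile.GPU .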
Regarding the directory error you are getting, it seems the feature extraction failed because the hmm file is missing. I wonder if you can log in interactively to the deeplasmid-gpu container to check where those files are located inside it. Once you include these files it should work.
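For example, something along these lines should let you look around inside the container (the image name is a placeholder for whatever you tagged your build with):
docker run -it --rm --entrypoint /bin/bash <your-image-name>
# then, inside the container:
find / -name 'Pfam-A.TMP2.hmm' 2>/dev/null
find / -name '*.h5' 2>/dev/null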
Could you please also post the exact command you used for running your new deeplasmid image? Thanks, Bill
Hi Bill,
Thanks for the quick response, and thanks for the explanation about Prodigal, etc!
This is the command I ran on my new deeplasmid image (which I called 'deeplasmid-gpu-build'):
docker run -it --gpus all -v /mnt/data/pWCP_predictions/649989979.fna:/srv/jgi-ml/classifier/dl/in.fasta -v /mnt/data/pWCP_predictions/deeplasmid_predictions:/srv/jgi-ml/classifier/dl/outdir deeplasmid-gpu-build feature_DL_plasmid_predict.sh in.fasta outdir
Note that I replaced the parameters $(ls /dev/nvidia* | xargs -I{} echo '--device={}') $(ls /usr/lib/*-linux-gnu/{libcuda,libnvidia}* with --gpus all. I assumed --gpus all was sufficient because I could successfully run the command docker run --rm --gpus all nvidia/cuda nvidia-smi (see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html).
But in case this assumption is wrong, I also tried to follow your parameters (with a small modification to the base paths). I ran the command below, but it gave a similar error to the one I got with --gpus all.
docker run -it --rm --device=/dev/nvidia0 --device=/dev/nvidiactl --device=/dev/nvidia-modeset --device=/dev/nvidia-uvm --device=/dev/nvidia-uvm-tools -v /usr/lib/libcuda.so:/usr/lib/libcuda.so:ro -v /usr/lib/libcuda.so.1:/usr/lib/libcuda.so.1:ro -v /usr/lib/libcuda.so.450.80.02:/usr/lib/libcuda.so.450.80.02:ro -v /usr/lib/libnvidia-allocator.so:/usr/lib/libnvidia-allocator.so:ro -v /usr/lib/libnvidia-allocator.so.1:/usr/lib/libnvidia-allocator.so.1:ro -v /usr/lib/libnvidia-allocator.so.450.80.02:/usr/lib/libnvidia-allocator.so.450.80.02:ro -v /usr/lib/libnvidia-compiler.so.450.80.02:/usr/lib/libnvidia-compiler.so.450.80.02:ro -v /usr/lib/libnvidia-eglcore.so.450.80.02:/usr/lib/libnvidia-eglcore.so.450.80.02:ro -v /usr/lib/libnvidia-encode.so:/usr/lib/libnvidia-encode.so:ro -v /usr/lib/libnvidia-encode.so.1:/usr/lib/libnvidia-encode.so.1:ro -v /usr/lib/libnvidia-encode.so.450.80.02:/usr/lib/libnvidia-encode.so.450.80.02:ro -v /usr/lib/libnvidia-fbc.so:/usr/lib/libnvidia-fbc.so:ro -v /usr/lib/libnvidia-fbc.so.1:/usr/lib/libnvidia-fbc.so.1:ro -v /usr/lib/libnvidia-fbc.so.450.80.02:/usr/lib/libnvidia-fbc.so.450.80.02:ro -v /usr/lib/libnvidia-glcore.so.450.80.02:/usr/lib/libnvidia-glcore.so.450.80.02:ro -v /usr/lib/libnvidia-glsi.so.450.80.02:/usr/lib/libnvidia-glsi.so.450.80.02:ro -v /usr/lib/libnvidia-glvkspirv.so.450.80.02:/usr/lib/libnvidia-glvkspirv.so.450.80.02:ro -v /usr/lib/libnvidia-ifr.so:/usr/lib/libnvidia-ifr.so:ro -v /usr/lib/libnvidia-ifr.so.1:/usr/lib/libnvidia-ifr.so.1:ro -v /usr/lib/libnvidia-ifr.so.450.80.02:/usr/lib/libnvidia-ifr.so.450.80.02:ro -v /usr/lib/libnvidia-ml.so:/usr/lib/libnvidia-ml.so:ro -v /usr/lib/libnvidia-ml.so.1:/usr/lib/libnvidia-ml.so.1:ro -v /usr/lib/libnvidia-ml.so.450.80.02:/usr/lib/libnvidia-ml.so.450.80.02:ro -v /usr/lib/libnvidia-opencl.so.1:/usr/lib/libnvidia-opencl.so.1:ro -v /usr/lib/libnvidia-opencl.so.450.80.02:/usr/lib/libnvidia-opencl.so.450.80.02:ro -v /usr/lib/libnvidia-opticalflow.so:/usr/lib/libnvidia-opticalflow.so:ro -v /usr/lib/libnvidia-opticalflow.so.1:/usr/lib/libnvidia-opticalflow.so.1:ro -v /usr/lib/libnvidia-opticalflow.so.450.80.02:/usr/lib/libnvidia-opticalflow.so.450.80.02:ro -v /usr/lib/libnvidia-ptxjitcompiler.so:/usr/lib/libnvidia-ptxjitcompiler.so:ro -v /usr/lib/libnvidia-ptxjitcompiler.so.1:/usr/lib/libnvidia-ptxjitcompiler.so.1:ro -v /usr/lib/libnvidia-ptxjitcompiler.so.450.80.02:/usr/lib/libnvidia-ptxjitcompiler.so.450.80.02:ro -v /usr/lib/libnvidia-tls.so.450.80.02:/usr/lib/libnvidia-tls.so.450.80.02:ro -v /mnt/data/pWCP_predictions/649989979.fna:/srv/jgi-ml/classifier/dl/in.fasta -v /mnt/data/pWCP_predictions/deeplasmid_predictions:/srv/jgi-ml/classifier/dl/outdir deeplasmid-gpu-ubuntu-build feature_DL_plasmid_predict.sh in.fasta outdir
Best, Mike
Hello: I tried the deeplasmid-gpu image on my workstation after updating and upgrading Ubuntu, and it seems some of the nvidia-cuda packages' paths changed in the latest upgrade, so the container can no longer find them and fails. I will need some time to troubleshoot deeplasmid-gpu and rebuild the image for the new nvidia-cuda paths.
In the meantime, I can confirm the deeplasmid image still works on CPU. I tested it on a MacBook with an Intel chip. It is about half as fast as the GPU image, but that isn't a show-stopper if you only have a few hundred contigs.
In case you are on Ubuntu with a Ryzen processor, you can try the deeplasmid-cpu-ubuntu image I provide on Docker Hub, where I downgraded Keras because of some incompatibilities with Ryzen processors (also mentioned in the README.md). It gives the same results as the deeplasmid and deeplasmid-gpu images. This is the command I used on the testing file under the master branch:
testing/649989979$ sudo /usr/bin/docker run -it -v `pwd`/649989979.fna:/srv/jgi-ml/classifier/dl/in.fasta -v `pwd`/649989979.fna.OUT:/srv/jgi-ml/classifier/dl/outdir billandreo/deeplasmid-cpu-ubuntu feature_DL_plasmid_predict.sh in.fasta outdir
If you prefer I could also run your analysis locally and send you back the results, if you are willing to email me your contigs.
Thanks, Bill
Hello Michael @michaelkyu:
I just released a new Docker image for GPUs, built on tensorflow/tensorflow:latest-gpu. I left the CNTK stuff behind and moved to TensorFlow.
As described in the README, the usage is much simpler. For example:
docker pull billandreo/deeplasmid.tf.gpu2
sudo /usr/bin/docker run -it -v /path/to/fasta:/srv/jgi-ml/classifier/dl/in.fasta -v /path/to/OUT/dir:/srv/jgi-ml/classifier/dl/outdir billandreo/deeplasmid.tf.gpu2 deeplasmid.sh in.fasta outdir
For rebuilding I provided a new Dockerfile.GPU2. Note it is still necessary to have prodigal, bbtools/sketch, and the .h5 and Pfam files under the same directory (they can be downloaded from the NERSC portal mentioned above):
sudo docker build -t billandreo/deeplasmid.tf.gpu2 -f Dockerfile.GPU2 .
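As a rough sanity check before building, the build context should contain everything listed above; something like the following (the file name patterns are indicative, not exact):
# run from the directory holding Dockerfile.GPU2
ls Dockerfile.GPU2 Pfam-A.TMP2.hmm *.h5 *.sketch
ls prodigal bbtools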
The main bottleneck is the feature extraction, which runs before the deep learning step and doesn't use the GPU. I will soon have a couple of MS students who will hopefully improve the feature extraction by parallelizing it further (moving away from the Python multiprocessing package), but no guarantees on when it will be finished. This is the main reason I've mostly been running deeplasmid on microbial isolates; for huge metagenomic datasets it still needs some work.
Thanks, Bill
I couldn't get your docker image billandreo/deeplasmid-gpu to run properly :( It does run, but it uses the CPU instead of the GPU. For reference, I'm using an Nvidia Quadro RTX 8000, and I have no problem using TensorFlow and PyTorch in general on my machine. I assume it's a problem with the docker image (perhaps there's some weird incompatibility between the machine you compiled it on and mine).
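A possible sanity check here, assuming the NVIDIA container toolkit is installed on the host, would be to override the image's entrypoint and run nvidia-smi, for example:
docker run --rm --gpus all --entrypoint nvidia-smi billandreo/deeplasmid-gpu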
So, I tried building the GPU docker image from scratch by running the command
docker build -t billandreo/deeplasmid-gpu -f Dockerfile.GPU .
However, I noticed errors related to prodigal and hmmer, and I realized the Dockerfile is incomplete. To solve this, I added the missing lines for prodigal and hmmer. I also had to change RUN apt install zlib1g-dev to RUN apt install -y zlib1g-dev, as the original line would show a prompt that automatically fails during the build. After making these changes, Docker built the image successfully without errors.
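Just to illustrate the kind of additions I mean, an apt-based version could be as simple as the Dockerfile line below, assuming the stock Ubuntu packages for the two tools are acceptable; the exact lines may differ:
RUN apt update && apt install -y prodigal hmmer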
Nonetheless, I'm still running into problems :( I ran deeplasmid on 649989979.fna using the new image I built, but I get the following error. Do you have any thoughts on what's going on? It seems to be some issue in format_predict.py.