wandreopoulos / deeplasmid


Incomplete Dockerfile.GPU #3

Open michaelkyu opened 2 years ago

michaelkyu commented 2 years ago

I couldn't get your Docker image billandreo/deeplasmid-gpu to run properly :( It does run, but it uses the CPU instead of the GPU. For reference, I'm using an Nvidia Quadro RTX 8000, and I have no problem using TensorFlow and PyTorch in general on my machine. I assume it's a problem with the Docker image (perhaps there's some incompatibility between the machine you built it on and mine).

So, I tried building the GPU Docker image from scratch by running docker build -t billandreo/deeplasmid-gpu -f Dockerfile.GPU . (note the trailing dot). However, I ran into errors related to prodigal and hmmer, and I realized the Dockerfile is incomplete. To work around this, I added the following lines

RUN git clone https://github.com/hyattpd/Prodigal.git
RUN wget http://eddylab.org/software/hmmer/hmmer-3.3.2.tar.gz
RUN tar -zxf hmmer-3.3.2.tar.gz

I also had to change RUN apt install zlib1g-dev to RUN apt install -y zlib1g-dev, because the original line triggers an interactive prompt that automatically fails during a non-interactive build.
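Roughly, the added section of the Dockerfile ended up looking like the sketch below. The Prodigal and HMMER steps are just the standard source builds from their documentation, so the exact install prefixes may need adjusting for this image:

RUN apt update && apt install -y build-essential git wget zlib1g-dev

# Prodigal: standard source build, installs prodigal into /usr/local/bin
RUN git clone https://github.com/hyattpd/Prodigal.git && \
    cd Prodigal && make install

# HMMER 3.3.2: standard configure/make build
RUN wget http://eddylab.org/software/hmmer/hmmer-3.3.2.tar.gz && \
    tar -zxf hmmer-3.3.2.tar.gz && \
    cd hmmer-3.3.2 && ./configure && make && make install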

After making the above changes, the image built successfully without errors.

Nonetheless, I'm still running into problems :( I ran deeplasmid on 649989979.fna using the new image I built, but I got the following error:

Deeplasmid - Plasmid finder for microbial genome assemblies.
Running feature_DL_plasmid_predict.sh .
This .sh script is meant for running Deeplasmid from the Docker image.
Usage: please specify 2 arguments - the input fasta file and output directory - as follows:
 docker run -it  -v /path/to/input/fasta:/srv/jgi-ml/classifier/dl/in.fasta  -v  /path/to/output/directory:/srv/jgi-ml/classifier/dl/outdir   billandreo/deeplasmid     feature_DL_plasmid_predict.sh  in.fasta outdir
Contact person: Bill Andreopoulos, wandreopoulos@lbl.gov
Last maintained: January 12, 2020
Using 20220702_045700 for outdir suffix
('oneHot base, size=', 15, ', sample:')
('base:', 'A', '1-hot:', [1.0, 0.0, 0.0, 0.0])
('base:', 'C', '1-hot:', [0.0, 1.0, 0.0, 0.0])
('base:', 'T', '1-hot:', [0.0, 0.0, 1.0, 0.0])
('base:', 'G', '1-hot:', [0.0, 0.0, 0.0, 1.0])
('all bases :', ['A', 'C', 'B', 'D', 'G', 'H', 'K', 'M', 'N', 'S', 'R', 'T', 'W', 'V', 'Y'])
('use seqLenCut=', 300)
Cannot find outdir/dlFeatures.20220702_045700, creating as new
cpu_count() = 255

Creating pool with 16 processes
Job finished with success. (repeated 44 times, once per job)
TOTAL_RUNTIME: 21.8768889904
exiting........
Using CNTK backend
Selected CPU as the process wide default device.
/usr/local/lib/python3.5/dist-packages/keras/backend/cntk_backend.py:21: UserWarning: CNTK backend warning: GPU is not detected. CNTK's CPU version is not fully optimized,please run with GPU to get better performance.
  'CNTK backend warning: GPU is not detected. '
oneHot base, size= 15 , sample:
base: A 1-hot: [1.0, 0.0, 0.0, 0.0]
base: C 1-hot: [0.0, 1.0, 0.0, 0.0]
base: T 1-hot: [0.0, 0.0, 1.0, 0.0]
base: G 1-hot: [0.0, 0.0, 0.0, 1.0]
all bases : ['D', 'B', 'T', 'R', 'G', 'H', 'W', 'A', 'C', 'N', 'V', 'S', 'Y', 'M', 'K']
use seqLenCut= 300
deep-libs1 imported elaT=0.0 sec
myArg: outPath outdir/outPR.20220702_045700
myArg: inputyml outdir/dlFeatures.20220702_045700/yml
myArg: verb 1
myArg: noXterm True
myArg: dataPath outdir/dlDataFormattedPred.20220702_045700
myArg: inputfasta in.fasta
myArg: kfoldOffset 0
myArg: arrIdx 1
disable Xterm
Plotter_Plasmid : Graphics started
DL_Model , prj: assayer4
globFeatureL 17 fixed order: ['gc_content', 'len_sequence', 'plassketch', 'plasORIsketch', 'chromsketch', 'genecount', 'genesperMB', 'aalenavg', 'pfam_vector', 'A_longestHomopol', 'A_totalLongHomopol', 'C_longestHomopol', 'C_totalLongHomopol', 'T_longestHomopol', 'T_totalLongHomopol', 'G_longestHomopol', 'G_totalLongHomopol']
Traceback (most recent call last):
  File "/srv/jgi-ml/classifier/dl/format_predict.py", line 72, in <module>
    plasmGFD=get_glob_info_files(plasmGFDir)
  File "/srv/jgi-ml/classifier/dl/Util_Plasmid.py", line 62, in get_glob_info_files
    allL=os.listdir(dir0)
FileNotFoundError: [Errno 2] No such file or directory: 'outdir/dlFeatures.20220702_045700/yml'

Do you have any thoughts on what's going on? It seems to be some issue in format_predict.py

wandreopoulos commented 2 years ago

Hi Mike,

Regarding Prodigal, bbtools, the .h5 model files, the pfam file Pfam-A.TMP2.hmm, and the several sketch files: they need to be provided in the build directory when the Docker image is built. This is mentioned in the README.md file, which cites the Supplementary Information of the paper. That is why those steps weren't included in the Dockerfile, since the .h5 model and pfam files would need to be downloaded anyway if someone rebuilt the image (these files are under https://portal.nersc.gov/dna/microbial/assembly/deeplasmid/DATA/TRAIN/MODELS/). I didn't expect many people to rebuild the Docker image, and I only tested it on my own GPU node, so it is expected that other GPU users might run into new issues.

Please see the Supplementary Information from the publication for what to consider when building the Docker image: Prodigal and bbtools/sketch need to be built, and the trained .h5 model files are needed, as well as several sketch files and Pfam-A.TMP2.hmm, all of which can be downloaded from https://portal.nersc.gov/dna/microbial/assembly/deeplasmid/.
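For example, a rough way to stage them in your build directory before running docker build is sketched below. The exact filenames and subdirectories on the portal may differ from what I show here (the Pfam path in particular is illustrative), so please check the portal's directory listing first:

# grab the trained .h5 model files into the build context
wget -r -np -nH --reject "index.html*" \
    https://portal.nersc.gov/dna/microbial/assembly/deeplasmid/DATA/TRAIN/MODELS/
# the Pfam and sketch files live elsewhere under the portal; the path below is illustrative
wget https://portal.nersc.gov/dna/microbial/assembly/deeplasmid/Pfam-A.TMP2.hmm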

Regarding the directory error you are getting, it seems the feature extraction failed because the hmm file is missing. Could you log in to the deeplasmid-gpu container interactively and check where those files are located inside the container? Once you include these files it should work.
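For example, overriding the entrypoint should give you a shell inside the image (assuming bash is present in it):

docker run -it --rm --entrypoint /bin/bash billandreo/deeplasmid-gpu
# then, inside the container, check for the expected files, e.g.
ls /srv/jgi-ml/classifier/dl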

Could you please also post the exact command you used for running your new deeplasmid image? Thanks, Bill

michaelkyu commented 2 years ago

Hi Bill,

Thanks for the quick response, and thanks for the explanation about Prodigal, etc!

This is the command I ran on my new deeplasmid image (which I called 'deeplasmid-gpu-build')

docker run -it --gpus all -v /mnt/data/pWCP_predictions/649989979.fna:/srv/jgi-ml/classifier/dl/in.fasta -v /mnt/data/pWCP_predictions/deeplasmid_predictions:/srv/jgi-ml/classifier/dl/outdir deeplasmid-gpu-build feature_DL_plasmid_predict.sh in.fasta outdir

Note that I substituted --gpus all for the parameters $(ls /dev/nvidia* | xargs -I{} echo '--device={}') $(ls /usr/lib/*-linux-gnu/{libcuda,libnvidia}*. I assumed --gpus all was sufficient because I can successfully run the command docker run --rm --gpus all nvidia/cuda nvidia-smi (see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html).

But in case this assumption is wrong, I also tried to follow your parameters (with a small modification to the base paths). I ran the command below, but it gave a similar error to the one I got with --gpus all.

docker run -it --rm --device=/dev/nvidia0 --device=/dev/nvidiactl --device=/dev/nvidia-modeset --device=/dev/nvidia-uvm --device=/dev/nvidia-uvm-tools -v /usr/lib/libcuda.so:/usr/lib/libcuda.so:ro -v /usr/lib/libcuda.so.1:/usr/lib/libcuda.so.1:ro -v /usr/lib/libcuda.so.450.80.02:/usr/lib/libcuda.so.450.80.02:ro -v /usr/lib/libnvidia-allocator.so:/usr/lib/libnvidia-allocator.so:ro -v /usr/lib/libnvidia-allocator.so.1:/usr/lib/libnvidia-allocator.so.1:ro -v /usr/lib/libnvidia-allocator.so.450.80.02:/usr/lib/libnvidia-allocator.so.450.80.02:ro -v /usr/lib/libnvidia-compiler.so.450.80.02:/usr/lib/libnvidia-compiler.so.450.80.02:ro -v /usr/lib/libnvidia-eglcore.so.450.80.02:/usr/lib/libnvidia-eglcore.so.450.80.02:ro -v /usr/lib/libnvidia-encode.so:/usr/lib/libnvidia-encode.so:ro -v /usr/lib/libnvidia-encode.so.1:/usr/lib/libnvidia-encode.so.1:ro -v /usr/lib/libnvidia-encode.so.450.80.02:/usr/lib/libnvidia-encode.so.450.80.02:ro -v /usr/lib/libnvidia-fbc.so:/usr/lib/libnvidia-fbc.so:ro -v /usr/lib/libnvidia-fbc.so.1:/usr/lib/libnvidia-fbc.so.1:ro -v /usr/lib/libnvidia-fbc.so.450.80.02:/usr/lib/libnvidia-fbc.so.450.80.02:ro -v /usr/lib/libnvidia-glcore.so.450.80.02:/usr/lib/libnvidia-glcore.so.450.80.02:ro -v /usr/lib/libnvidia-glsi.so.450.80.02:/usr/lib/libnvidia-glsi.so.450.80.02:ro -v /usr/lib/libnvidia-glvkspirv.so.450.80.02:/usr/lib/libnvidia-glvkspirv.so.450.80.02:ro -v /usr/lib/libnvidia-ifr.so:/usr/lib/libnvidia-ifr.so:ro -v /usr/lib/libnvidia-ifr.so.1:/usr/lib/libnvidia-ifr.so.1:ro -v /usr/lib/libnvidia-ifr.so.450.80.02:/usr/lib/libnvidia-ifr.so.450.80.02:ro -v /usr/lib/libnvidia-ml.so:/usr/lib/libnvidia-ml.so:ro -v /usr/lib/libnvidia-ml.so.1:/usr/lib/libnvidia-ml.so.1:ro -v /usr/lib/libnvidia-ml.so.450.80.02:/usr/lib/libnvidia-ml.so.450.80.02:ro -v /usr/lib/libnvidia-opencl.so.1:/usr/lib/libnvidia-opencl.so.1:ro -v /usr/lib/libnvidia-opencl.so.450.80.02:/usr/lib/libnvidia-opencl.so.450.80.02:ro -v /usr/lib/libnvidia-opticalflow.so:/usr/lib/libnvidia-opticalflow.so:ro -v /usr/lib/libnvidia-opticalflow.so.1:/usr/lib/libnvidia-opticalflow.so.1:ro -v /usr/lib/libnvidia-opticalflow.so.450.80.02:/usr/lib/libnvidia-opticalflow.so.450.80.02:ro -v /usr/lib/libnvidia-ptxjitcompiler.so:/usr/lib/libnvidia-ptxjitcompiler.so:ro -v /usr/lib/libnvidia-ptxjitcompiler.so.1:/usr/lib/libnvidia-ptxjitcompiler.so.1:ro -v /usr/lib/libnvidia-ptxjitcompiler.so.450.80.02:/usr/lib/libnvidia-ptxjitcompiler.so.450.80.02:ro -v /usr/lib/libnvidia-tls.so.450.80.02:/usr/lib/libnvidia-tls.so.450.80.02:ro -v /mnt/data/pWCP_predictions/649989979.fna:/srv/jgi-ml/classifier/dl/in.fasta -v /mnt/data/pWCP_predictions/deeplasmid_predictions:/srv/jgi-ml/classifier/dl/outdir deeplasmid-gpu-ubuntu-build feature_DL_plasmid_predict.sh in.fasta outdir

Best, Mike

wandreopoulos commented 2 years ago

Hello: I tried the deeplasmid-gpu image on my workstation after updating and upgrading Ubuntu, and it seems some of the nvidia-cuda package paths changed in the latest upgrade, so the container can't find them and doesn't work. I will need some time to troubleshoot deeplasmid-gpu and rebuild the image for the new nvidia-cuda paths.

In the meantime, I can confirm the deeplasmid image still works on CPU; I tested it on a MacBook with an Intel chip. It is slower than the GPU image by about half, but that isn't a show-stopper if you only have a few hundred contigs.

If you have Ubuntu on a Ryzen processor, you can try the deeplasmid-cpu-ubuntu image I provide on Docker Hub, where I downgraded Keras because of some incompatibilities with Ryzen processors (also mentioned in the README.md file). It gives the same results as the deeplasmid and deeplasmid-gpu images. This is the command I used on the test file under the master branch:

testing/649989979$ sudo /usr/bin/docker run -it -v $(pwd)/649989979.fna:/srv/jgi-ml/classifier/dl/in.fasta -v $(pwd)/649989979.fna.OUT:/srv/jgi-ml/classifier/dl/outdir billandreo/deeplasmid-cpu-ubuntu feature_DL_plasmid_predict.sh in.fasta outdir

If you prefer I could also run your analysis locally and send you back the results, if you are willing to email me your contigs.

Thanks, Bill

wandreopoulos commented 1 year ago

Hello Michael @michaelkyu :

I just released a new Docker image for GPUs that is built on tensorflow/tensorflow:latest-gpu. I left CNTK behind and moved to TensorFlow.

As described in the README, the usage is much simpler. For example:

docker pull billandreo/deeplasmid.tf.gpu2
sudo /usr/bin/docker run -it -v /path/to/fasta:/srv/jgi-ml/classifier/dl/in.fasta -v /path/to/OUT/dir:/srv/jgi-ml/classifier/dl/outdir billandreo/deeplasmid.tf.gpu2 deeplasmid.sh in.fasta outdir
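As a quick sanity check that TensorFlow inside the container actually sees the GPU, something along these lines should print at least one GPU device (this assumes python3 is on the image's PATH and the command can be overridden; otherwise add --entrypoint python3):

docker run --rm --gpus all billandreo/deeplasmid.tf.gpu2 \
    python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"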

For rebuilding, I provided a new Dockerfile.GPU2. Note it is still necessary to have prodigal, bbtools/sketch, and the .h5 and pfam files under the same directory (they can be downloaded from the NERSC portal mentioned above):

sudo docker build -t billandreo/deeplasmid.tf.gpu2 -f Dockerfile.GPU2 .

The main bottleneck is the feature extraction, which runs before the deep learning step and doesn't run on GPUs. I will have a couple of MS students soon who will hopefully improve the feature extraction by parallelizing it further (moving away from the Python multiprocessing package), but no guarantees on when that will be finished. This is the main reason I've mostly been running deeplasmid on microbial isolates; for huge metagenomic datasets it still needs some work.

Thanks, Bill