Open Hoda1394 opened 2 years ago
@Hoda1394 - what command are you using to run the container? i don't have any experience running a docker image with a gpu; i've only used apptainer/singularity with gpu.
a few potential problems come to mind (not saying that any of these are present here):
the docker run command is not correct. i'm assuming there are some extra flags that need to be added to use the gpu.
another point -- you can test whether a gpu is available with tf.test.is_gpu_available(). tensorflow 2.x also has tf.config.list_physical_devices("GPU"), but not sure if 1.x has it.
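A minimal sketch of such a check that runs on both TF 1.x and 2.x. It does not assume which device-listing API exists. Instead it probes with getattr for tf.config.list_physical_devices, then tf.config.experimental.list_physical_devices, and falls back to tf.test.is_gpu_available() on older 1.x releases. The function name tf_gpu_report is hypothetical.

```python
import importlib


def tf_gpu_report():
    """Summarize what the installed TensorFlow reports about GPU support.

    Hypothetical helper: probes for each API with getattr instead of assuming
    it exists; returns {"installed": False} when TensorFlow cannot be imported.
    """
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return {"installed": False}

    report = {"installed": True, "version": tf.__version__}
    # Compile-time flag: True only if this TensorFlow build has CUDA support.
    report["built_with_cuda"] = tf.test.is_built_with_cuda()

    # Newer releases expose a device lister under tf.config or tf.config.experimental.
    config = getattr(tf, "config", None)
    lister = getattr(config, "list_physical_devices", None)
    if lister is None:
        experimental = getattr(config, "experimental", None)
        lister = getattr(experimental, "list_physical_devices", None)

    if lister is not None:
        report["gpus"] = [device.name for device in lister("GPU")]
    else:
        # Older 1.x releases (e.g. 1.12) may have neither lister.
        report["gpu_available"] = tf.test.is_gpu_available()
    return report


if __name__ == "__main__":
    print(tf_gpu_report())
```

If built_with_cuda is False, no runtime flag will help: the installed TensorFlow build itself is CPU-only.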
another thought -- try validating that the official tensorflow image can use the gpu. so run the tensorflow/tensorflow:1.12.3-gpu-py3 image in a way that should use the gpu and test that it actually sees the gpu. if the container sees the gpu, the problem is somewhere in the dockerfile.
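One way to run the official image "in a way that should use the gpu", sketched under the assumption of Docker 19.03+ with the NVIDIA Container Toolkit (which provides `--gpus all`; older setups used nvidia-docker instead) or Singularity/Apptainer, where `--nv` binds the host NVIDIA driver into the container. The helper gpu_run_cmd is hypothetical and only builds the argument list.

```python
import shlex


def gpu_run_cmd(image, check, runtime="docker"):
    """Build an argv list that runs a Python one-liner in `image` with GPU access.

    Hypothetical helper. docker: --gpus all needs Docker >= 19.03 plus the
    NVIDIA Container Toolkit. singularity: --nv binds host NVIDIA libraries.
    """
    if runtime == "docker":
        return ["docker", "run", "--rm", "--gpus", "all", image, "python", "-c", check]
    if runtime == "singularity":
        # Singularity/Apptainer can pull directly from Docker Hub via docker:// URIs.
        return ["singularity", "exec", "--nv", "docker://" + image, "python", "-c", check]
    raise ValueError("unknown runtime: " + runtime)


CHECK = "import tensorflow as tf; print(tf.test.is_gpu_available())"
for rt in ("docker", "singularity"):
    # shlex.join (Python 3.8+) renders the argv as a copy-pasteable shell line.
    print(shlex.join(gpu_run_cmd("tensorflow/tensorflow:1.12.3-gpu-py3", CHECK, rt)))
```

The printed lines can be pasted into a shell, or the lists passed straight to subprocess.run.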
Actually, I was running the Singularity conversion of this image with the GPU. The GPU is visible inside the container, but TensorFlow can't see it. I tested with the official image, and tf.test.is_gpu_available() returns False. So it seems that the issue is related to the base image!
As additional info, when I run pip list | grep tensorflow inside the container, I get:
tensorflow 1.12.3
tensorflow-gpu 1.12.0
There are two versions of TensorFlow installed. Not sure if this can cause this issue...
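A small sketch for spotting this conflict from pip list output. The helper names are hypothetical. It assumes what we believe about these wheels: tensorflow and tensorflow-gpu both ship the same top-level tensorflow import package, so with both installed, whichever was installed last most likely owns the files that import tensorflow loads.

```python
import re


def tensorflow_dists(pip_list_output):
    """Map each installed tensorflow / tensorflow-gpu distribution to its version."""
    dists = {}
    for line in pip_list_output.splitlines():
        parts = line.split()
        # Match only the two conflicting names, not tensorboard/tensorflow-estimator.
        if len(parts) >= 2 and re.fullmatch(r"tensorflow(-gpu)?", parts[0], re.IGNORECASE):
            dists[parts[0].lower()] = parts[1]
    return dists


def has_cpu_gpu_conflict(dists):
    # Both present: they overwrite each other's files in site-packages/tensorflow.
    return "tensorflow" in dists and "tensorflow-gpu" in dists


output = """tensorflow             1.12.3
tensorflow-gpu         1.12.0"""
found = tensorflow_dists(output)
print(found)                        # {'tensorflow': '1.12.3', 'tensorflow-gpu': '1.12.0'}
print(has_cpu_gpu_conflict(found))  # True
```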
this is probably the problem (or one of them!). can you try pip list with the base image? see which one is present, and see if the base image can see the gpu.
I tried this with the base image and saw both. When I run python and import tensorflow, tf.__version__ returns 1.12.3. So it seems that tensorflow is being imported rather than tensorflow-gpu. I tried to uninstall it inside the container but was not successful.
i can reproduce this... it could be a problem with the 1.12.3-gpu-py3 docker image. why are we using such an old image anyway?
docker run --rm tensorflow/tensorflow:1.12.3-gpu-py3 python -c 'import tensorflow as tf; print(tf.test.is_built_with_cuda())'
False
the 1.14.0-gpu-py3 image works.
docker run --rm tensorflow/tensorflow:1.14.0-gpu-py3 python -c 'import tensorflow as tf; print(tf.test.is_built_with_cuda())'
True
we should probably use a newer image. i realize we used 1.12 in the project, but we can test if everything works correctly with 1.15 (the last release of the 1.x series).
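To test candidate base images quickly, the is_built_with_cuda() one-liner above can be looped over several tags. This is a sketch with hypothetical helper names. It passes no GPU flags, because is_built_with_cuda() reports how TensorFlow was compiled rather than what hardware is present, so it should also work on a machine without a GPU. The run parameter defaults to subprocess.run and can be swapped for a stub.

```python
import subprocess

# One-liner from the thread: prints True only if the TF build has CUDA support.
CHECK = "import tensorflow as tf; print(tf.test.is_built_with_cuda())"


def built_with_cuda(tag, run=subprocess.run):
    """Run CHECK inside tensorflow/tensorflow:<tag> and parse the printed bool."""
    result = run(
        ["docker", "run", "--rm", "tensorflow/tensorflow:" + tag, "python", "-c", CHECK],
        capture_output=True, text=True, check=True,
    )
    # Any extra stdout lines come first; the print() result is the last line.
    return result.stdout.strip().splitlines()[-1] == "True"


def compare_tags(tags, run=subprocess.run):
    """Map each image tag to whether its TensorFlow was built with CUDA."""
    return {tag: built_with_cuda(tag, run) for tag in tags}
```

For example, compare_tags(["1.12.3-gpu-py3", "1.14.0-gpu-py3"]) should reproduce the False/True results above, and other tags can be appended to the list.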
I tried removing tensorflow during the build, and tensorflow-gpu doesn't work properly without it. I already tested TensorFlow 1.15 and got other errors due to the version mismatch, so if we want to use TensorFlow 1.15 we may need to update the code.
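A likely explanation for tensorflow-gpu breaking once tensorflow is removed: if both wheels install the same files, pip uninstall tensorflow deletes files that tensorflow-gpu also relies on. The hypothetical sketch below lists paths recorded by both distributions using importlib.metadata. It needs Python 3.8 or newer, which the Python inside an old TF 1.x image may not be.

```python
from importlib import metadata


def shared_files(dist_a, dist_b):
    """Return relative paths listed in the RECORD of both distributions.

    Hypothetical helper: files owned by both are removed when either package
    is uninstalled. Returns an empty set if either distribution is missing.
    """
    try:
        files_a = metadata.files(dist_a) or []
        files_b = metadata.files(dist_b) or []
    except metadata.PackageNotFoundError:
        return set()
    return {str(p) for p in files_a} & {str(p) for p in files_b}


overlap = shared_files("tensorflow", "tensorflow-gpu")
print(len(overlap), "files shared, e.g.", sorted(overlap)[:3])
```

A large overlap would support installing only one of the two packages, not removing one after both are installed.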
I will try version 1.14.0-gpu-py3 as well.
feel free to post any errors you get when trying newer versions. paste the entire traceback and i can take a look
I have tried many different things to address issue #33. Among them, this Dockerfile builds without errors, but when I run the container, TensorFlow does not see the GPU and runs on the CPU! This container image is available on Docker Hub as hodadock/kwyk:gpu_test.
@satra, @kaczmarj - any idea how we can fix it?