Docker gpu mode not working

Kamranaway commented 3 weeks ago

I tested on a server with an A30 GPU and a laptop with an RTX 3060. I believe I followed all steps in the setup guide.

docker run -t --rm --gpus all -v  /home/koris/BERTax/in:/in/ fkre/bertax:latest /in/fungi1000.fa
2024-10-28 02:51:58.592252: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2024-10-28 02:52:01.872815: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2024-10-28 02:52:01.874382: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2024-10-28 02:52:02.129207: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:927] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-10-28 02:52:02.129421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.425GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2024-10-28 02:52:02.129502: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2024-10-28 02:52:02.198718: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2024-10-28 02:52:02.198855: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2024-10-28 02:52:02.233217: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2024-10-28 02:52:02.240395: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2024-10-28 02:52:02.291158: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2024-10-28 02:52:02.307438: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2024-10-28 02:52:02.400190: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2024-10-28 02:52:02.400936: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:927] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-10-28 02:52:02.401359: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:927] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-10-28 02:52:02.401504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2024-10-28 02:52:02.402187: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-28 02:52:02.410838: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:927] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-10-28 02:52:02.411007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.425GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2024-10-28 02:52:02.411152: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2024-10-28 02:52:02.411276: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2024-10-28 02:52:02.411343: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2024-10-28 02:52:02.411423: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2024-10-28 02:52:02.411487: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2024-10-28 02:52:02.411523: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2024-10-28 02:52:02.411581: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2024-10-28 02:52:02.411616: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2024-10-28 02:52:02.412251: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:927] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-10-28 02:52:02.412716: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:927] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-10-28 02:52:02.412760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2024-10-28 02:52:02.413368: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2024-10-28 03:00:04.387965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-10-28 03:00:04.388080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2024-10-28 03:00:04.388143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2024-10-28 03:00:04.389542: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:927] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-10-28 03:00:04.389596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1489] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2024-10-28 03:00:04.390013: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:927] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-10-28 03:00:04.390372: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:927] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-10-28 03:00:04.390625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4678 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
2024-10-28 03:00:04.392831: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
WARNING:tensorflow:AutoGraph could not transform <bound method PositionEmbedding.call of <keras_pos_embd.pos_embd.PositionEmbedding object at 0x7f4f0816b5b0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <bound method MultiHeadAttention.call of <keras_multi_head.multi_head_attention.MultiHeadAttention object at 0x7f4f0816b760>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <bound method ScaledDotProductAttention.call of <keras_self_attention.scaled_dot_attention.ScaledDotProductAttention object at 0x7f4ea8136f10>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <bound method Extract.call of <keras_bert.layers.extract.Extract object at 0x7f4f08093820>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
2024-10-28 03:00:09.215262: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2024-10-28 03:00:09.219918: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3193910000 Hz
2024-10-28 03:00:12.537181: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2024-10-28 03:02:12.883371: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
Traceback (most recent call last):
  File "/opt/conda/bin/bertax", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.9/site-packages/bertax/bertax.py", line 112, in main
    preds = model.predict(x, verbose=int(args.verbose), batch_size=args.batch_size)
  File "/opt/conda/lib/python3.9/site-packages/tensorflow/python/keras/engine/training.py", line 1629, in predict
    tmp_batch_outputs = self.predict_function(iterator)
  File "/opt/conda/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/opt/conda/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 894, in _call
    return self._concrete_stateful_fn._call_flat(
  File "/opt/conda/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 1918, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/opt/conda/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 555, in call
    outputs = execute.execute(
  File "/opt/conda/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError:  Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
         [[node model/Encoder-1-FeedForward/Tanh (defined at /lib/python3.9/site-packages/keras_transformer/gelu.py:11) ]] [Op:__inference_predict_function_12441]

Function call stack:
predict_function

Kamranaway commented 3 weeks ago

Of note, the server's output doesn't contain the NUMA messages, but it was otherwise identical (this log is from a WSL instance).

flomock commented 3 weeks ago

Hello, sorry for the inconvenience. I guess you have a cuDNN and/or CUDA version that is incompatible with the tensorflow_gpu version installed in the container. See the list on the following Stack Overflow page for more details: https://stackoverflow.com/questions/75789104/cubin-cuda-error-no-binary-for-gpu-error-while-running-attention-layer-with-bid Please try to identify the tensorflow_gpu version in the container, then find and install the compatible versions, and let us know if this fixes the issue. :)

PS: As far as I remember (at least while training) we needed a GPU with at least 11GB VRAM to run bertax. So I would try the changes discussed above on the A30 first if possible. :)
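
One way to identify what the container actually loads is to read it off the log posted above. The following is a minimal sketch and not part of BERTax: the function names `loaded_cuda_libs` and `compute_capabilities` are made up for illustration, and the regular expressions assume the TensorFlow 2.4 log format shown in this thread.

```python
import re

# Patterns assume the TensorFlow 2.4 log format shown in this thread.
LIB_RE = re.compile(r"Successfully opened dynamic library (lib\w+)\.so\.([\d.]+)")
CC_RE = re.compile(r"computeCapability: (\d+\.\d+)")


def loaded_cuda_libs(log_text):
    """Map each loaded CUDA library name to the version suffix of its soname."""
    return {name: version for name, version in LIB_RE.findall(log_text)}


def compute_capabilities(log_text):
    """Return the set of GPU compute capabilities reported in the log."""
    return set(CC_RE.findall(log_text))
```

Applied to the log above, this would report libcudart 10.1 and libcudnn 7 as the loaded libraries, and 8.6 as the compute capability of the RTX 3060.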

Kamranaway commented 2 weeks ago

Hi! Thank you for the response.

First, I am spinning up the Docker container and setting the entrypoint to bash:

docker run --gpus all -it --rm --name bertyfix --entrypoint bash fkre/bertax:latest

From the container I confirmed I was on Debian 11 x86_64.

Then I checked the TensorFlow version:

(base) root@66819c1d89d9:/# python3 -c "import tensorflow as tf; print(tf.__version__)"
2024-11-05 04:08:13.850100: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2.4.1

This confirms that I need cuDNN 8.0 and CUDA 11.0.
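
As a cross-check, the log earlier in this thread shows `libcudart.so.10.1` being loaded on a GPU with compute capability 8.6. The sketch below compares a compute capability against the first CUDA toolkit release that supports its major architecture. The table is deliberately partial, the helper names are hypothetical, and the check ignores PTX JIT compilation by the driver, so treat it as a rough sanity check rather than a definitive compatibility matrix.

```python
# First CUDA toolkit release with native support for each major GPU
# architecture (partial table, for illustration only).
MIN_CUDA_FOR_MAJOR = {
    7: (9, 0),   # Volta / Turing (compute capability 7.x)
    8: (11, 0),  # Ampere (compute capability 8.x, e.g. A30 and RTX 3060)
}


def parse_version(text):
    """Turn a string such as '10.1' into a comparable tuple (10, 1)."""
    return tuple(int(part) for part in text.split("."))


def toolkit_supports(cuda_version, capability):
    """True if the toolkit version is new enough for the GPU's major architecture."""
    major = parse_version(capability)[0]
    return parse_version(cuda_version) >= MIN_CUDA_FOR_MAJOR[major]
```

For the values in the log, `toolkit_supports("10.1", "8.6")` returns `False`, which would be consistent with the CUDA_ERROR_NO_BINARY_FOR_GPU error, while `toolkit_supports("11.0", "8.0")` returns `True`.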

I have tried to install both manually and using Conda, but I keep getting output similar to the logs above (with some variation depending on the CUDA version; I tested up to 11.3). I could not successfully install CUDA manually.

Here are things I've tried:

Conda:

conda install cuda -c nvidia/label/cuda-11.3.0 -c nvidia/label/cuda-11.3.1

conda install https://anaconda.org/nvidia/cudatoolkit/11.0.221/download/linux-64/cudatoolkit-11.0.221-h6bb024c_0.tar.bz2
conda install https://anaconda.org/conda-forge/cudnn/8.0.5.39/download/linux-64/cudnn-8.0.5.39-hc0a50b0_1.tar.bz2
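
After each install attempt, a short check inside the container can show which CUDA and cuDNN versions the TensorFlow build was compiled for and whether a simple GPU op runs. This is only a sketch: `gpu_smoke_test` is a hypothetical helper, `tf.sysconfig.get_build_info()` requires TensorFlow 2.3 or newer, and an eager `tanh` is merely assumed to exercise the same kind of kernel that failed in the traceback.

```python
def gpu_smoke_test():
    """Return a small report dict, or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    build = tf.sysconfig.get_build_info()  # TensorFlow 2.3 or newer
    gpus = tf.config.list_physical_devices("GPU")
    report = {
        "tensorflow": tf.__version__,
        "built_for_cuda": build.get("cuda_version"),
        "built_for_cudnn": build.get("cudnn_version"),
        "visible_gpus": [gpu.name for gpu in gpus],
        "tanh_on_gpu": "skipped (no GPU visible)",
    }
    if gpus:
        try:
            # Run one small element-wise op on the first GPU.
            with tf.device("/GPU:0"):
                tf.math.tanh(tf.constant([0.0, 1.0])).numpy()
            report["tanh_on_gpu"] = "ok"
        except tf.errors.OpError as exc:
            report["tanh_on_gpu"] = f"failed: {exc.message}"
    return report


if __name__ == "__main__":
    print(gpu_smoke_test())
```

The build info describes how the TensorFlow binary was compiled, so it does not change when another CUDA toolkit is installed alongside it.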

Manual:

The CUDA install page doesn't offer a Debian 11 installer before CUDA 11.5, so I attempted to install CUDA 11.5.

Before installing CUDA, I set up add-apt-repository:

apt-get install software-properties-common
apt update

Then I installed gnupg2:

apt-get install gnupg2

Next, I followed the network install instructions for my platform and architecture here. There was no public key available, so I edited the sources list with apt edit-sources to manually mark the NVIDIA URL as trusted.

There's a snag at this point:

Errors were encountered while processing:
 /tmp/apt-dpkg-install-Mcd51K/076-nvidia-persistenced_560.35.03-1_amd64.deb
 /tmp/apt-dpkg-install-Mcd51K/211-nvidia-cuda-mps_560.35.03-1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

So I switched to the local runfile:

wget https://developer.download.nvidia.com/compute/cuda/11.5.0/local_installers/cuda_11.5.0_495.29.05_linux.run
sh cuda_11.5.0_495.29.05_linux.run

This also fails.

I have hit a wall and am unsure how to proceed. In the meantime I'll keep trying configurations.

Thank you and kind regards.

rnajena / bertax

Docker gpu mode not working #19
