singularityhub / shpc-registry

A remote registry for Singularity Registry HPC 🖊️
https://singularityhub.github.io/shpc-registry/
Mozilla Public License 2.0

The latest version of TensorFlow did not work @ tensorflow-notebook #200

Open y-vectorfield opened 7 months ago

y-vectorfield commented 7 months ago

I tried the latest version of TensorFlow with tensorflow-notebook:latest. My computer has an NVIDIA GPU (GeForce RTX 2080SP).

import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))

The following output appeared, and the returned GPU list was empty.

2024-02-19 01:52:01.144475: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-19 01:52:01.166977: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-19 01:52:01.167001: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-19 01:52:01.167019: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-19 01:52:01.171088: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-19 01:52:01.171360: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-19 01:52:01.778157: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[]
2024-02-19 01:52:02.377608: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-02-19 01:52:02.378022: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2211] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

vsoch commented 7 months ago

I'm afraid I can't help debug individual containers for specific environments, but I'd suggest you talk to your research computing staff to ask for some help!

vsoch commented 7 months ago

I'd also suggest double-checking the Singularity flags required to make the GPU devices visible inside the container.
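
As a rough illustration of the flag check suggested here, the sketch below assembles a `singularity exec` command that passes `--nv`, the Singularity option for exposing the host's NVIDIA driver libraries and devices to a container. The image file name and the helper `build_gpu_check_command` are hypothetical, chosen only for this example. The snippet prints the command rather than running it.

```python
import shlex


def build_gpu_check_command(image, use_nv=True):
    """Return the argv list that runs the TensorFlow GPU check in a container."""
    check = "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
    cmd = ["singularity", "exec"]
    if use_nv:
        # --nv asks Singularity to make the host NVIDIA driver stack visible
        # inside the container; without it the GPU is usually not seen.
        cmd.append("--nv")
    cmd += [image, "python", "-c", check]
    return cmd


# Hypothetical image path; substitute the .sif file actually in use.
command = build_gpu_check_command("tensorflow-notebook_latest.sif")
print(shlex.join(command))
```

Comparing the output of the printed command with and without `--nv` can help tell a missing flag apart from a problem inside the image itself.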

y-vectorfield commented 7 months ago

I found the cause of this issue. I inspected the base image of this container image, and some of the GPU libraries and settings were not enabled in it.
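
One way to confirm this kind of finding from inside a container is to test whether the GPU libraries can be loaded at all, which is what the "Cannot dlopen some GPU libraries" warning in the log points at. The following is a minimal sketch, not the method used above. The library names in `CANDIDATE_LIBS` are assumptions, since the exact file names depend on the CUDA and cuDNN versions the TensorFlow build expects.

```python
import ctypes

# Assumed library names; the exact sonames vary by CUDA/cuDNN version.
CANDIDATE_LIBS = ["libcuda.so.1", "libcudart.so", "libcudnn.so"]


def probe_libraries(names):
    """Try to load each shared library and map its name to True or False."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the loader cannot find it
            results[name] = True
        except OSError:
            results[name] = False
    return results


status = probe_libraries(CANDIDATE_LIBS)
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If `libcuda.so.1` is reported missing, the host driver is probably not being passed into the container. If only the CUDA runtime or cuDNN entries are missing, the image itself likely lacks them.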