tensorflow / compression

Data compression in TensorFlow
Apache License 2.0

tfci.py recognizing, but not using GPUs #159

Closed Malocch1o closed 1 year ago

Malocch1o commented 2 years ago

Describe the bug
I am attempting to utilize multiple Nvidia GPUs (Tesla M10s) when running compression/decompression with tfci.py. The program recognizes that the GPUs are present, but still uses the CPU at runtime to compress the image.

"tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory" may be the issue here. I've included the full error log at the end of this post.

To Reproduce
$ sudo time python3 tfci.py compress hific-lo kodim01.png

Expected behavior
GPUs should be used to compress and decompress images instead of the CPU.

System (please complete the following information):

Additional context
As you can see in the error log, TensorFlow detects 8 Tesla M10 GPUs and loads cuDNN. At the bottom, "Start cannot spawn child process: No such file or directory" may be the issue here.

Here is the full error log:

$ sudo time python3 tfci.py compress hific-lo kodim01.png

2022-11-01 15:46:31.922013: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-01 15:46:36.237757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7471 MB memory: -> device: 0, name: Tesla M10, pci bus id: 0000:b1:00.0, compute capability: 5.0
2022-11-01 15:46:36.239223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 7471 MB memory: -> device: 1, name: Tesla M10, pci bus id: 0000:b2:00.0, compute capability: 5.0
2022-11-01 15:46:36.240502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 7471 MB memory: -> device: 2, name: Tesla M10, pci bus id: 0000:b3:00.0, compute capability: 5.0
2022-11-01 15:46:36.241818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 7471 MB memory: -> device: 3, name: Tesla M10, pci bus id: 0000:b4:00.0, compute capability: 5.0
2022-11-01 15:46:36.243048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:4 with 7471 MB memory: -> device: 4, name: Tesla M10, pci bus id: 0000:da:00.0, compute capability: 5.0
2022-11-01 15:46:36.244274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:5 with 7471 MB memory: -> device: 5, name: Tesla M10, pci bus id: 0000:db:00.0, compute capability: 5.0
2022-11-01 15:46:36.245463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:6 with 7471 MB memory: -> device: 6, name: Tesla M10, pci bus id: 0000:dc:00.0, compute capability: 5.0
2022-11-01 15:46:36.246612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:7 with 7471 MB memory: -> device: 7, name: Tesla M10, pci bus id: 0000:dd:00.0, compute capability: 5.0
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
I1101 15:46:42.459285 140275184080704 saver.py:1634] Saver not created because there are no variables in the graph to restore
2022-11-01 15:46:52.574958: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8600
2022-11-01 15:46:53.230356: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
21.84user 17.95system 0:27.92elapsed 142%CPU (0avgtext+0avgdata 7646376maxresident)k
0inputs+48outputs (1major+4246028minor)pagefaults 0swaps
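For context (not part of the original report), here is a minimal device-placement check, assuming TensorFlow 2.x, that can confirm whether this environment can actually run an op on a GPU at all:

import tensorflow as tf

tf.debugging.set_log_device_placement(True)  # log which device each op is placed on

a = tf.random.uniform((2048, 2048))
b = tf.linalg.matmul(a, a)  # expect a log line placing MatMul on /device:GPU:0 if a GPU is usable
print(b.device)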

jonarchist commented 1 year ago

Hi, does TF see the GPU? What's the output of doing

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
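A couple of additional checks (not from the original comment) that can help distinguish "the GPUs are visible" from "this TensorFlow build can use them":

import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))  # expect eight Tesla M10 entries here
print(tf.test.is_built_with_cuda())            # True only if this TF build includes CUDA support
print(tf.test.gpu_device_name())               # e.g. '/device:GPU:0' when a GPU is usable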

MahmoudAshraf97 commented 1 year ago

The mentioned warning has nothing to do with CUDA. I get the same warning and it uses the GPU just fine; this should be your case too, because I can see that cuDNN is loaded, which would not have happened if the CPU were used exclusively.
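One way to double-check this while tfci.py runs (a hedged suggestion, not part of the original reply) is to watch the GPU-Util column of nvidia-smi in another terminal; from within a Python session, TensorFlow 2.5+ can also report how much GPU memory it has allocated:

import tensorflow as tf

# Reports the memory TensorFlow has currently/ever allocated on the first GPU;
# a non-trivial 'peak' value during a compression run points to the GPU being used.
print(tf.config.experimental.get_memory_info('GPU:0'))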