Closed JiayuGaoMax closed 4 years ago
The error message is exactly the same for both demos:
Image Classification: A Deep Generalized Convolutional Sum-Product Network (DGC-SPN) with libspn-keras in Colab
and
Image Completion: A Deep Generalized Convolutional Sum-Product Network (DGC-SPN) with libspn-keras in Colab.
The TensorFlow backend does not support dilations > 1 on CPU. It seems that your GPU is not actually being used, which is why you get: Current libxsmm and customized CPU implementations do not yet support dilation rates larger than 1.
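For anyone unsure what a dilation rate larger than 1 means here, the following is a minimal pure-Python sketch of a 1-D dilated convolution. It is only an illustration of the concept, not code from libspn-keras or TensorFlow, and `dilated_conv1d` is a hypothetical helper name.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1-D cross-correlation with `dilation - 1` skipped samples between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1  # input positions covered by one output
    return [
        sum(k * signal[start + i * dilation] for i, k in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 1]

dense = dilated_conv1d(signal, kernel, dilation=1)    # adds neighbours: x[t] + x[t+1]
dilated = dilated_conv1d(signal, kernel, dilation=2)  # skips one sample: x[t] + x[t+2]

print(dense)    # [3, 5, 7, 9, 11]
print(dilated)  # [4, 6, 8, 10]
```

With `dilation=2` the kernel taps are spread out, so each output looks at a wider window of the input without adding weights. This spread-out variant is what the error message says the CPU implementation does not support.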
When running the Colab notebook with a GPU or when running it locally with GPU enabled, the settings used in those example notebooks should no longer be a problem.
If your GPU is detected correctly by the TensorFlow backend you should see something like:
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14968 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
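Instead of scanning the startup log, you can also ask TensorFlow directly which GPUs it sees. This is a small sketch assuming TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available; `visible_gpus` is a hypothetical helper name, and the `ImportError` guard is only there so the snippet also runs on a machine without TensorFlow.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can use, or [] if there are none."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("No GPU visible to TensorFlow; ops will run on CPU.")
```

If this prints the "No GPU" message on a machine that has an Nvidia card, the card was most likely not registered and the dilated convolutions will hit the CPU code path.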
If you want to use these architectures with a CPU, I will have to implement 'workarounds' that would allow usage through CPU. Let me know if this is the case. Also, let me know if you need more help setting up GPU support for TensorFlow.
Funny thing is, my card is detected; it is just missing the CUDA libraries. I tried installing CUDA, but it still doesn't work. I found that my card, a 1660 Super, is not in the CUDA supported list. It's one of the newer cards.
https://developer.nvidia.com/cuda-gpus#compute
The only option I have is probably to change to CPU or buy a new card =.=...
I very much appreciate your help though!
2020-10-05 18:18:08.886398: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-10-05 18:18:08.891335: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-10-05 18:18:13.940900: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-10-05 18:18:14.003757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: GeForce GTX 1660 SUPER computeCapability: 7.5
coreClock: 1.83GHz coreCount: 22 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2020-10-05 18:18:14.009949: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-10-05 18:18:14.014535: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
2020-10-05 18:18:14.019944: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2020-10-05 18:18:14.025296: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2020-10-05 18:18:14.030216: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2020-10-05 18:18:14.035028: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
2020-10-05 18:18:14.039697: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
2020-10-05 18:18:14.044395: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
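Note that the log above does find the card and reports compute capability 7.5 for it; what it complains about are DLLs that could not be loaded. A quick way to list exactly which libraries are missing is to pull them out of the log. The sketch below uses only the standard library; `missing_libraries` is a hypothetical helper name, and the sample text is abbreviated from the log above. The file names (`cudart64_101.dll`, `cudnn64_7.dll`) suggest, as far as I can tell, that this TensorFlow build is looking for the CUDA 10.1 runtime and cuDNN 7.

```python
import re

# Matches the library name in TensorFlow's dso_loader warnings.
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the distinct library names TensorFlow failed to load, in log order."""
    seen = []
    for name in MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample_log = (
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:59] "
    "Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found\n"
    "I tensorflow/stream_executor/platform/default/dso_loader.cc:48] "
    "Successfully opened dynamic library nvcuda.dll\n"
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:59] "
    "Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found\n"
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:59] "
    "Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found\n"
)

missing = missing_libraries(sample_log)
print(missing)  # ['cudart64_101.dll', 'cudnn64_7.dll']
```

Each name in the result is a file that has to be findable on the `PATH` before TensorFlow will register the GPU.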
CUDA-compatible GPU list:
GPU | Compute Capability |
---|---|
GeForce RTX 3090 | 8.6 |
GeForce RTX 3080 | 8.6 |
GeForce RTX 3070 | 8.6 |
NVIDIA TITAN RTX | 7.5 |
GeForce RTX 2080 Ti | 7.5 |
GeForce RTX 2080 | 7.5 |
GeForce RTX 2070 | 7.5 |
GeForce RTX 2060 | 7.5 |
NVIDIA TITAN V | 7.0 |
NVIDIA TITAN Xp | 6.1 |
NVIDIA TITAN X | 6.1 |
GeForce GTX 1080 Ti | 6.1 |
GeForce GTX 1080 | 6.1 |
GeForce GTX 1070 | 6.1 |
GeForce GTX 1060 | 6.1 |
GeForce GTX 1050 | 6.1 |
GeForce GTX TITAN X | 5.2 |
GeForce GTX TITAN Z | 3.5 |
GeForce GTX TITAN Black | 3.5 |
GeForce GTX TITAN | 3.5 |
GeForce GTX 980 Ti | 5.2 |
GeForce GTX 980 | 5.2 |
GeForce GTX 970 | 5.2 |
GeForce GTX 960 | 5.2 |
GeForce GTX 950 | 5.2 |
GeForce GTX 780 Ti | 3.5 |
GeForce GTX 780 | 3.5 |
GeForce GTX 770 | 3.0 |
GeForce GTX 760 | 3.0 |
GeForce GTX 750 Ti | 5.0 |
GeForce GTX 750 | 5.0 |
GeForce GTX 690 | 3.0 |
GeForce GTX 680 | 3.0 |
GeForce GTX 670 | 3.0 |
GeForce GTX 660 Ti | 3.0 |
GeForce GTX 660 | 3.0 |
GeForce GTX 650 Ti BOOST | 3.0 |
GeForce GTX 650 Ti | 3.0 |
GeForce GTX 650 | 3.0 |
GeForce GTX 560 Ti | 2.1 |
GeForce GTX 550 Ti | 2.1 |
GeForce GTX 460 | 2.1 |
GeForce GTS 450 | 2.1 |
GeForce GTS 450* | 2.1 |
GeForce GTX 590 | 2.0 |
GeForce GTX 580 | 2.0 |
GeForce GTX 570 | 2.0 |
GeForce GTX 480 | 2.0 |
GeForce GTX 470 | 2.0 |
GeForce GTX 465 | 2.0 |
GeForce GT 740 | 3.0 |
GeForce GT 730 | 3.5 |
GeForce GT 730 DDR3,128bit | 2.1 |
GeForce GT 720 | 3.5 |
GeForce GT 705* | 3.5 |
GeForce GT 640 (GDDR5) | 3.5 |
GeForce GT 640 (GDDR3) | 2.1 |
GeForce GT 630 | 2.1 |
GeForce GT 620 | 2.1 |
GeForce GT 610 | 2.1 |
GeForce GT 520 | 2.1 |
GeForce GT 440 | 2.1 |
GeForce GT 440* | 2.1 |
GeForce GT 430 | 2.1 |
GeForce GT 430* | 2.1 |
It seems more people have had similar problems; maybe you can check there.
They seem to suggest explicitly allowing memory growth on the GPU. You can try adding this snippet to the beginning of your script:
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
Of course, you also need to set up cuDNN correctly before that snippet is of any help. The link mentions a CUDA/cuDNN setup that has worked for some people.
I will close this issue for now, but feel free to re-open it if you want more help.
Appreciate it, Sir.
I ran into some issues just running the test/demo programs:
Image Classification: A Deep Generalized Convolutional Sum-Product Network (DGC-SPN) with libspn-keras in Colab
and
Image Completion: A Deep Generalized Convolutional Sum-Product Network (DGC-SPN) with libspn-keras in Colab.
However, the third program, Randomly structured SPNs for image classification, runs fine. I am currently using Windows 10, an Nvidia GPU, and TensorFlow 2.3. I tried lower versions of Python and TF, but that doesn't work either. The error message I get is:
Epoch 1/10
Traceback (most recent call last):
File "Demo.py", line 136, in <module>
sum_product_network.fit(train_data, epochs=10)
File "C:\Users\o\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\keras\engine\training.py", line 108, in _method_wrapper
return method(self, *args, **kwargs)
File "C:\Users\ao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\keras\engine\training.py", line 1098, in fit
tmp_logs = train_function(iterator)
File "C:\Users\Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "C:\Users\ Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\def_function.py", line 840, in _call
return self._stateless_fn(*args, **kwds)
File "C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
return self._call_flat(
File "C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\function.py", line 545, in call
outputs = execute.execute(
File "C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Current libxsmm and customized CPU implementations do not yet support dilation rates larger than 1.
[[node gradient_tape/sequential_sum_product_network/conv2d_product_6/OneHotConv/Conv2DBackpropInput (defined at C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\libspn_keras\models\sequential_spn.py:174) ]] [Op:__inference_train_function_3335]
Errors may have originated from an input operation. Input Source operations connected to node gradient_tape/sequential_sum_product_network/conv2d_product_6/OneHotConv/Conv2DBackpropInput: sequential_sum_product_network/conv2d_product_6/OneHotConv/Conv2D/ReadVariableOp (defined at C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\libspn_keras\layers\conv2d_product.py:140)