pronobis / libspn-keras

Library for learning and inference with Sum-product Networks utilizing TensorFlow 2.x and Keras

Demo Program Issue #18

Closed JiayuGaoMax closed 4 years ago

JiayuGaoMax commented 4 years ago

I ran into some issues just running the test/demo programs:

Image Classification: A Deep Generalized Convolutional Sum-Product Network (DGC-SPN) with libspn-keras in Colab

and

Image Completion: A Deep Generalized Convolutional Sum-Product Network (DGC-SPN) with libspn-keras in Colab.

However, the third program, Randomly structured SPNs for image classification, runs fine. I am currently using Windows 10 with an Nvidia GPU and TensorFlow 2.3. I tried lower versions of Python and TF, but that doesn't work either. For the two demos above I get an error message such as:

    Epoch 1/10
    Traceback (most recent call last):
      File "Demo.py", line 136, in <module>
        sum_product_network.fit(train_data, epochs=10)
      File "C:\Users\o\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\keras\engine\training.py", line 108, in _method_wrapper
        return method(self, *args, **kwargs)
      File "C:\Users\ao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\keras\engine\training.py", line 1098, in fit
        tmp_logs = train_function(iterator)
      File "C:\Users\Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
        result = self._call(*args, **kwds)
      File "C:\Users\ Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\def_function.py", line 840, in _call
        return self._stateless_fn(*args, **kwds)
      File "C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\function.py", line 2829, in __call__
        return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
      File "C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
        return self._call_flat(
      File "C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
        return self._build_call_outputs(self._inference_function.call(
      File "C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\function.py", line 545, in call
        outputs = execute.execute(
      File "C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
        tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Current libxsmm and customized CPU implementations do not yet support dilation rates larger than 1.
      [[node gradient_tape/sequential_sum_product_network/conv2d_product_6/OneHotConv/Conv2DBackpropInput (defined at C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\libspn_keras\models\sequential_spn.py:174) ]] [Op:__inference_train_function_3335]

    Errors may have originated from an input operation.
    Input Source operations connected to node gradient_tape/sequential_sum_product_network/conv2d_product_6/OneHotConv/Conv2DBackpropInput:
      sequential_sum_product_network/conv2d_product_6/OneHotConv/Conv2D/ReadVariableOp (defined at C:\Users\Jiayu Gao\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\libspn_keras\layers\conv2d_product.py:140)

JiayuGaoMax commented 4 years ago

The error message is exactly the same for both demos:

Image Classification: A Deep Generalized Convolutional Sum-Product Network (DGC-SPN) with libspn-keras in Colab

and

Image Completion: A Deep Generalized Convolutional Sum-Product Network (DGC-SPN) with libspn-keras in Colab.

jostosh commented 4 years ago

The TensorFlow backend does not support dilations > 1 for CPU.

It seems that your GPU is not actually being used, given the message "Current libxsmm and customized CPU implementations do not yet support dilation rates larger than 1."

When running the Colab notebook with a GPU or when running it locally with GPU enabled, the settings used in those example notebooks should no longer be a problem.

If your GPU is detected correctly by the TensorFlow backend you should see something like:

Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14968 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
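A quick way to check this from Python, rather than by scanning the startup log, is to ask TensorFlow which GPUs it can see. The sketch below is only an illustration: the helper names `visible_gpu_names` and `summarize` are made up for this example, and it assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available.

```python
def visible_gpu_names():
    """Return the names of GPUs TensorFlow can use, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow will place every op on the CPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def summarize(gpu_names):
    """Turn the result of visible_gpu_names() into a one-line report (hypothetical helper)."""
    if gpu_names is None:
        return "TensorFlow is not installed in this environment."
    if not gpu_names:
        return "No GPU visible to TensorFlow; ops will run on the CPU."
    return "GPUs visible to TensorFlow: " + ", ".join(gpu_names)


print(summarize(visible_gpu_names()))
```

If this reports no GPU, the dilated convolutions in the two DGC-SPN demos would be expected to hit the CPU error quoted above.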

If you want to use these architectures with a CPU, I will have to implement 'workarounds' that would allow usage through CPU. Let me know if this is the case. Also, let me know if you need more help setting up GPU support for TensorFlow.
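For illustration only, one conceivable kind of CPU workaround is to replace a dilated convolution by an ordinary one whose kernel has zeros inserted between its taps. This is an assumption about what such a workaround could look like, not something confirmed in this thread or taken from libspn-keras. The pure-Python 1-D sketch below (with the made-up helpers `correlate_valid` and `inflate_kernel`) only demonstrates that the two computations agree:

```python
def correlate_valid(signal, kernel, dilation=1):
    """Valid cross-correlation of two 1-D lists with the given dilation rate."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of the dilated kernel
    return [
        sum(kernel[k] * signal[start + k * dilation] for k in range(len(kernel)))
        for start in range(len(signal) - span + 1)
    ]


def inflate_kernel(kernel, dilation):
    """Insert (dilation - 1) zeros between neighbouring kernel taps."""
    inflated = [0.0] * ((len(kernel) - 1) * dilation + 1)
    inflated[::dilation] = kernel
    return inflated


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, -1.0, 2.0]

dilated = correlate_valid(signal, kernel, dilation=2)
undilated = correlate_valid(signal, inflate_kernel(kernel, 2), dilation=1)
print(dilated, undilated)  # both are [8.0, 10.0, 12.0]
```

The zero-filled kernel performs more multiplications than a true dilated convolution, so this trades speed for compatibility.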

JiayuGaoMax commented 4 years ago

Funny thing is, my card is detected, but the CUDA libraries are missing. I tried installing CUDA, but it still doesn't work. I found that my card, a GTX 1660 Super, is not in the CUDA supported list. It's one of the newer cards.

https://developer.nvidia.com/cuda-gpus#compute

The only option I have is probably to change to CPU or buy a new card =.=...

I very much appreciate your help though!

    2020-10-05 18:18:08.886398: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
    2020-10-05 18:18:08.891335: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
    2020-10-05 18:18:13.940900: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
    2020-10-05 18:18:14.003757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
    pciBusID: 0000:08:00.0 name: GeForce GTX 1660 SUPER computeCapability: 7.5
    coreClock: 1.83GHz coreCount: 22 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
    2020-10-05 18:18:14.009949: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
    2020-10-05 18:18:14.014535: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
    2020-10-05 18:18:14.019944: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
    2020-10-05 18:18:14.025296: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
    2020-10-05 18:18:14.030216: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
    2020-10-05 18:18:14.035028: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
    2020-10-05 18:18:14.039697: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
    2020-10-05 18:18:14.044395: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
    Skipping registering GPU devices...
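The log above names each library TensorFlow failed to load, and the file names carry version hints: `cudart64_101.dll` belongs to CUDA 10.1 and `cudnn64_7.dll` to cuDNN 7. The same log also reports `computeCapability: 7.5` for the card, so the missing libraries, rather than the card itself, may be what blocks GPU use. A minimal sketch for pulling the distinct library names out of such a log follows; the helper `missing_libraries` and the shortened `sample_log` are made up for this example.

```python
import re

# Matches the warning dso_loader.cc prints for each library it cannot open.
MISSING_LIBRARY = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the distinct library names that failed to load, in first-seen order."""
    seen = []
    for name in MISSING_LIBRARY.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# A shortened excerpt of the log above (timestamps and source locations dropped).
sample_log = (
    "Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found\n"
    "Successfully opened dynamic library nvcuda.dll\n"
    "Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found\n"
    "Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found\n"
    "Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found\n"
)

print(missing_libraries(sample_log))
# ['cudart64_101.dll', 'cublas64_10.dll', 'cudnn64_7.dll']
```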

JiayuGaoMax commented 4 years ago

CUDA compatible list:

GPU Compute Capability
GeForce RTX 3090 8.6
GeForce RTX 3080 8.6
GeForce RTX 3070 8.6
NVIDIA TITAN RTX 7.5
Geforce RTX 2080 Ti 7.5
Geforce RTX 2080 7.5
Geforce RTX 2070 7.5
Geforce RTX 2060 7.5
NVIDIA TITAN V 7.0
NVIDIA TITAN Xp 6.1
NVIDIA TITAN X 6.1
GeForce GTX 1080 Ti 6.1
GeForce GTX 1080 6.1
GeForce GTX 1070 6.1
GeForce GTX 1060 6.1
GeForce GTX 1050 6.1
GeForce GTX TITAN X 5.2
GeForce GTX TITAN Z 3.5
GeForce GTX TITAN Black 3.5
GeForce GTX TITAN 3.5
GeForce GTX 980 Ti 5.2
GeForce GTX 980 5.2
GeForce GTX 970 5.2
GeForce GTX 960 5.2
GeForce GTX 950 5.2
GeForce GTX 780 Ti 3.5
GeForce GTX 780 3.5
GeForce GTX 770 3.0
GeForce GTX 760 3.0
GeForce GTX 750 Ti 5.0
GeForce GTX 750 5.0
GeForce GTX 690 3.0
GeForce GTX 680 3.0
GeForce GTX 670 3.0
GeForce GTX 660 Ti 3.0
GeForce GTX 660 3.0
GeForce GTX 650 Ti BOOST 3.0
GeForce GTX 650 Ti 3.0
GeForce GTX 650 3.0
GeForce GTX 560 Ti 2.1
GeForce GTX 550 Ti 2.1
GeForce GTX 460 2.1
GeForce GTS 450 2.1
GeForce GTS 450* 2.1
GeForce GTX 590 2.0
GeForce GTX 580 2.0
GeForce GTX 570 2.0
GeForce GTX 480 2.0
GeForce GTX 470 2.0
GeForce GTX 465 2.0
GeForce GT 740 3.0
GeForce GT 730 3.5
GeForce GT 730 DDR3,128bit 2.1
GeForce GT 720 3.5
GeForce GT 705* 3.5
GeForce GT 640 (GDDR5) 3.5
GeForce GT 640 (GDDR3) 2.1
GeForce GT 630 2.1
GeForce GT 620 2.1
GeForce GT 610 2.1
GeForce GT 520 2.1
GeForce GT 440 2.1
GeForce GT 440* 2.1
GeForce GT 430 2.1
GeForce GT 430* 2.1

jostosh commented 4 years ago

It seems more people have had similar problems; maybe you can check there.

They seem to suggest explicitly allowing memory growth on the GPU. You can try adding this snippet to the beginning of your script:

    import tensorflow as tf

    # List the GPUs TensorFlow can see; the list is empty when none are usable
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        try:
            # Currently, memory growth needs to be the same across GPUs
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            print(e)

jostosh commented 4 years ago

Of course, you also need to set up cuDNN correctly before that snippet is of any help. The link mentions a CUDA/cuDNN setup that has worked for some people.

jostosh commented 4 years ago

I will close this issue for now, but feel free to re-open it if you want help.

JiayuGaoMax commented 4 years ago

Appreciate it, Sir.