CUDA support does not work

icookycom commented 1 year ago

Hi, I have noticed that CUDA is not working. Does DCT-Net support CUDA calculations?

I first had to install:

conda install cudatoolkit=10.1
conda install cudnn

It uses CUDA, but fails with an error:

2022-12-04 09:31:23.527110: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA RTX A4500, Compute Capability 8.6
2022-12-04 09:31:23.527239: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-04 09:31:23.527364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: NVIDIA RTX A4500 computeCapability: 8.6
coreClock: 1.65GHz coreCount: 56 deviceMemorySize: 19.70GiB deviceMemoryBandwidth: 596.12GiB/s
2022-12-04 09:31:23.527408: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2022-12-04 09:31:23.528422: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2022-12-04 09:31:23.529403: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2022-12-04 09:31:23.529550: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2022-12-04 09:31:23.530466: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2022-12-04 09:31:23.530985: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2022-12-04 09:31:23.532948: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2022-12-04 09:31:23.533015: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-04 09:31:23.533156: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-04 09:31:23.533241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2022-12-04 09:31:23.533267: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2022-12-04 09:31:23.552689: E tensorflow/core/common_runtime/session.cc:91] Failed to create session: Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
2022-12-04 09:31:23.552708: E tensorflow/c/c_api.cc:2184] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
Traceback (most recent call last):
  File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/utils/registry.py", line 211, in build_from_cfg
    return obj_cls(**args)
  File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/pipelines/cv/image_cartoon_pipeline.py", line 42, in __init__
    self.facer = FaceAna(self.model)
  File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/models/cv/cartoon/facelib/facer.py", line 20, in __init__
    self.face_detector = FaceDetector(model_dir)
  File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/models/cv/cartoon/facelib/face_detector.py", line 26, in __init__
    self._graph, self._sess = self.init_model(self.model_path)
  File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/models/cv/cartoon/facelib/face_detector.py", line 113, in init_model
    model = init_pb(pb_path)
  File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/models/cv/cartoon/facelib/face_detector.py", line 105, in init_pb
    sess = tf.Session(config=config)
  File "/home/alexandr/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1586, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/alexandr/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 701, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid

During handling of the above exception, another exception occurred:

menyifang commented 1 year ago

It supports both GPU and CPU. Please make sure your tensorflow-gpu version is compatible with your CUDA version.
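One likely reading of the log in the first comment: the GPU reports compute capability 8.6, while the libraries that were loaded come from CUDA 10.1 (libcudart.so.10.1). CUDA toolkits only gained support for compute capability 8.6 in release 11.1, which would be consistent with "device kernel image is invalid". The sketch below writes that lookup down as a small table. The table and function names are made up for illustration, and the table covers only a few common architectures.

```python
# Minimum CUDA toolkit release that can target a given compute capability.
# Illustrative subset only; names here are hypothetical, not part of any library.
MIN_CUDA_FOR_CC = {
    (6, 0): (8, 0),    # Pascal
    (6, 1): (8, 0),
    (7, 0): (9, 0),    # Volta
    (7, 5): (10, 0),   # Turing
    (8, 0): (11, 0),   # Ampere (A100)
    (8, 6): (11, 1),   # Ampere (e.g. RTX A4500, RTX 30 series)
    (8, 9): (11, 8),   # Ada
    (9, 0): (11, 8),   # Hopper
}


def parse_version(text):
    """Turn '10.1' or '8.6' into a (major, minor) tuple of ints."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def toolkit_supports_gpu(cuda_version, compute_capability):
    """Return True if the CUDA toolkit version is new enough for the GPU."""
    cc = parse_version(compute_capability)
    if cc not in MIN_CUDA_FOR_CC:
        raise KeyError("compute capability %s is not in the table" % compute_capability)
    return parse_version(cuda_version) >= MIN_CUDA_FOR_CC[cc]


if __name__ == "__main__":
    # Values taken from the log: CUDA 10.1 libraries, compute capability 8.6.
    print(toolkit_supports_gpu("10.1", "8.6"))
    print(toolkit_supports_gpu("11.1", "8.6"))
```

If this reading is right, the TensorFlow build also has to be one compiled against a CUDA 11.x toolkit, which the official tensorflow-gpu 1.15 wheels are not.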

onefish51 commented 1 year ago

Something is wrong! You said:

pip install --upgrade tensorflow-gpu==1.15

and the official TensorFlow documentation shows that tensorflow-gpu 1.15 is built against CUDA 10.0 and cuDNN 7.4.

then

pip install "modelscope[cv]==1.3.2" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

The install log showed:

cuDNN 8.5.0 is required

and then

Loaded runtime CuDNN library: 8.5.0 but source was compiled with: 7.6.4.  CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
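The rule stated in this message can be written out as a small check. This is only a sketch of the rule as the message words it, not TensorFlow's own code, and the function name is hypothetical.

```python
def cudnn_compatible(compiled, loaded):
    """Check a loaded cuDNN version against the one TensorFlow was compiled with.

    Rule as worded in the error message: major versions must match; from
    cuDNN 7.0 on, the loaded minor version may be equal or higher; before
    7.0, the minor version must match exactly.
    """
    c_major, c_minor = (int(p) for p in compiled.split(".")[:2])
    l_major, l_minor = (int(p) for p in loaded.split(".")[:2])
    if l_major != c_major:
        return False
    if c_major >= 7:
        return l_minor >= c_minor
    return l_minor == c_minor


if __name__ == "__main__":
    # Versions from the message: compiled with 7.6.4, loaded 8.5.0.
    print(cudnn_compatible("7.6.4", "8.5.0"))
    # A 7.6.x runtime would satisfy the rule.
    print(cudnn_compatible("7.6.4", "7.6.5"))
```

Under this rule, cuDNN 8.5.0 can never satisfy a binary compiled against 7.6.4, because the major versions differ.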

So it failed!

I tested tensorflow-gpu on its own:

python 
Python 3.7.16 (default, Jan 17 2023, 22:20:44) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
...
True
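A True result here may not exercise cuDNN at all, since the cuDNN version message above typically appears only once a convolution actually runs. A minimal smoke test, assuming a TensorFlow build that provides tf.compat.v1 (the function name is made up for illustration), could run one small convolution:

```python
def conv_smoke_test():
    """Run one tiny convolution; return the centre output value.

    Returns None when TensorFlow or NumPy is not installed. On a GPU build
    this is expected to go through cuDNN, so a version mismatch should
    surface here rather than in a plain device check.
    """
    try:
        import numpy as np
        import tensorflow as tf
    except ImportError:
        return None

    tf1 = tf.compat.v1  # graph-mode API, the style DCT-Net's code uses
    graph = tf1.Graph()
    with graph.as_default():
        x = tf1.placeholder(tf.float32, [1, 8, 8, 1])
        kernel = tf1.constant(np.ones([3, 3, 1, 1], dtype=np.float32))
        y = tf1.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding="SAME")

    with tf1.Session(graph=graph) as sess:
        out = sess.run(y, feed_dict={x: np.ones([1, 8, 8, 1], dtype=np.float32)})

    # A 3x3 all-ones kernel over an all-ones image sums nine ones at the centre.
    return float(out[0, 4, 4, 0])


if __name__ == "__main__":
    print(conv_smoke_test())
```

If the convolution raises an error while tf.test.is_gpu_available() returns True, the problem is more likely the cuDNN install than the CUDA driver.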

h3clikejava commented 9 months ago

I tested on:

Ubuntu 18.04
Python 3.7
CUDA 10 / 10.1 / 11.2
cuDNN 7.4 / 7.6.0 / 7.6.1
tensorflow-gpu 1.14 / 1.15
torch 1.7.1+cu101
numpy 1.18.5

I always get an all-black result image.

I have been trying to set up this training environment for three days, but ultimately failed. The documentation for this project is really terrible. I don't know whether it is because of changes in the company's business, but the developers seem to have stopped maintaining it. The documentation is inconsistent in several places about the runtime environment. If anyone kind enough has managed to train successfully, please share your environment and training scripts. Thank you.
