Error: could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR in colab file

YukiSakuma commented 4 years ago

I have reached this step in the Colab notebook:

# Make sure videos are in the videos folder inside hent-AI
!python samples/hentai/hentai.py inference --weights=weights.h5 --sources=/content/drive/My\ Drive/hent-AI/videos/ --dtype=esrgan

When I run it, I get a TensorFlow error. My video is only 7 seconds long.

/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Using TensorFlow backend.
Weights:  weights.h5
Dataset:  None
Logs:  /logs
Starting inference
2020-04-26 18:52:04.701074: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-04-26 18:52:04.858325: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-26 18:52:04.858892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
totalMemory: 14.73GiB freeMemory: 14.62GiB
2020-04-26 18:52:04.858943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2020-04-26 18:52:05.266501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-26 18:52:05.266561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2020-04-26 18:52:05.266574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2020-04-26 18:52:05.266685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/device:GPU:0 with 14147 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
CUDA-compatible GPU located!
Model warmup complete
2020-04-26 18:52:10.653999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2020-04-26 18:52:10.654084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-26 18:52:10.654101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2020-04-26 18:52:10.654120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2020-04-26 18:52:10.654225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14147 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
Loading weights...  Weights loaded
Detected fps: 23.976023976023978
Video read complete, starting video detection. NOTE: frame 0 may take up to 1 minute
frame:  0
2020-04-26 18:52:16.429973: E tensorflow/stream_executor/cuda/cuda_dnn.cc:332] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

natethegreate commented 4 years ago

There is an issue with some of the GPUs that Colab uses, such as the Nvidia Tesla T4. The best solution is to terminate your session using Runtime -> Manage sessions -> TERMINATE, or just close the tab, and then start a new session. The first command should show the name of the card in the second row, first column. More commonly you will get a K80 or P100, which will work. Unfortunately, after restarting the session you will need to run the steps again.
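
To avoid reading the card name by eye after every restart, the check can be scripted. This is a minimal sketch, not part of hent-AI: the helper names `is_known_good_gpu` and `query_gpu_name` are hypothetical, and the K80/P100 list only reflects the cards mentioned in this thread, not an official compatibility list. It assumes `nvidia-smi` is on the PATH, as it normally is on a Colab GPU runtime.

```python
import subprocess

# Cards reported as working in this thread; an assumption, not an official list.
KNOWN_GOOD = ("K80", "P100")


def is_known_good_gpu(gpu_name, known_good=KNOWN_GOOD):
    """Return True if the reported GPU name contains one of the known-good models."""
    name = gpu_name.upper()
    return any(model in name for model in known_good)


def query_gpu_name():
    """Ask nvidia-smi for the name of the first GPU; return None if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing or failed, e.g. on a CPU-only runtime.
        return None
    lines = [line.strip() for line in out.splitlines() if line.strip()]
    return lines[0] if lines else None


if __name__ == "__main__":
    name = query_gpu_name()
    if name is None:
        print("No GPU detected (is the runtime type set to GPU?)")
    elif is_known_good_gpu(name):
        print("Allocated GPU:", name, "(reported to work in this thread)")
    else:
        print("Allocated GPU:", name, "(consider terminating and restarting the session)")
```

Running this in a notebook cell before the setup steps shows whether the session is worth continuing or should be terminated first.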

YukiSakuma commented 4 years ago

Thank you, that worked. It took me multiple tries to get one of the GPUs you mentioned, though. By the way, I see that the output has no audio. May I suggest adding an option to keep the audio in the output?

I could also add the audio manually (after decensoring), but it wouldn't hurt to do it automatically...

natethegreate commented 4 years ago

Audio is not yet supported, and cv2 does not handle video and audio simultaneously. Supporting it will likely require ffmpeg or some additional library in the future.
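
Until then, the audio can be copied over manually with ffmpeg. The following is a rough sketch under stated assumptions, not a hent-AI feature: the function names and all file paths are hypothetical, ffmpeg must be installed, and stream copy (`-c copy`) only works when the output container accepts both codecs (re-encoding may be needed otherwise). It takes the video stream from the processed file and the first audio stream, if any, from the original file.

```python
import subprocess


def build_mux_command(processed_video, original_video, output_path):
    """Build an ffmpeg command that combines processed video with original audio."""
    return [
        "ffmpeg", "-y",
        "-i", processed_video,   # input 0: processed, silent video
        "-i", original_video,    # input 1: original file that still has audio
        "-map", "0:v:0",         # take the video stream from input 0
        "-map", "1:a:0?",        # take the first audio stream from input 1, if present
        "-c", "copy",            # copy streams without re-encoding
        "-shortest",             # stop at the end of the shorter stream
        output_path,
    ]


def mux_audio(processed_video, original_video, output_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(
        build_mux_command(processed_video, original_video, output_path),
        check=True,
    )
```

For example, `mux_audio("decensored.avi", "original.mp4", "final.mkv")` would write `final.mkv` with the processed frames and the original soundtrack; the three file names here are placeholders.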

natethegreate / hent-AI
