failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED

ouening commented 5 years ago

When I train on the VOC data, the following error occurs. My GPUs are two RTX 2080 (8 GB each), with tensorflow-gpu 1.12 and Keras 2.2.4.

Epoch 1/50
2019-01-28 00:16:00.441512: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "train.py", line 192, in <module>
    _main(annotation_path=anno)
  File "train.py", line 65, in _main
    callbacks=[logging, checkpoint])
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=346112, n=32, k=64
    [[{{node conv2d_3/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@batch_normalization_3/cond/FusedBatchNorm/Switch"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](leaky_re_lu_2/LeakyRelu, conv2d_3/kernel/read)]]
    [[{{node yolo_loss/while_1/LoopCond/_2963}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6607_yolo_loss/while_1/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_1/strided_slice_1/stack_2/_2805)]]
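The m, n, k values in that message can be matched against the failing layer. In TF 1.x, a 1×1, stride-1, NHWC Conv2D such as conv2d_3 is run as a single cuBLAS SGEMM with m = batch × height × width, n = output channels and k = input channels. A minimal sketch, assuming keras-yolo3's default 416×416 input, so that conv2d_3 sits on a 208×208 feature map; the helper names are hypothetical:

```python
import re

# Matches the "m=..., n=..., k=..." part of TensorFlow's
# "Blas SGEMM launch failed" message.
_SGEMM_RE = re.compile(r"m=(\d+),\s*n=(\d+),\s*k=(\d+)")


def parse_sgemm_dims(message):
    """Return (m, n, k) from a 'Blas SGEMM launch failed' message, or None."""
    match = _SGEMM_RE.search(message)
    if match is None:
        return None
    m, n, k = (int(g) for g in match.groups())
    return m, n, k


def implied_batch_size(m, height, width):
    """For a 1x1 stride-1 conv lowered to GEMM, m = batch * height * width."""
    if m % (height * width) != 0:
        return None  # m is not a whole number of feature maps of that size
    return m // (height * width)


if __name__ == "__main__":
    for msg in (
        "Blas SGEMM launch failed : m=346112, n=32, k=64",
        "Blas SGEMM launch failed : m=1384448, n=32, k=64",
    ):
        m, n, k = parse_sgemm_dims(msg)
        # Assumption: conv2d_3 runs on a 208x208 map (416x416 input, one stride-2 conv).
        batch = implied_batch_size(m, 208, 208)
        print(f"m={m} n={n} (out channels) k={k} (in channels) -> batch {batch}")
```

Under that assumption both reports in this thread decode cleanly: 346112 = 8 × 208 × 208 and 1384448 = 32 × 208 × 208, and the latter matches the "batch size 32" printed in a later log. That suggests the failure is in the cuBLAS GEMM kernel on the new GPU rather than in the model definition.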

ShuteLee commented 5 years ago

Hey bro, have you figured it out? I ran into the same issue.

bingoxumo commented 5 years ago

I also hit the same issue when running YOLOv3. Did you solve this problem?

ouening commented 5 years ago

I get the same error! My GPUs are two RTX 2080 (8 GB each), with tensorflow-gpu 1.12, Keras 2.2.4, and Ubuntu 18.04. Can somebody solve it?

tak-s commented 5 years ago

Try the following statements at the beginning of the code:

import keras.backend as K
# Let TensorFlow allocate GPU memory on demand instead of reserving it all at startup
cfg = K.tf.ConfigProto(gpu_options={'allow_growth': True})
K.set_session(K.tf.Session(config=cfg))

ouening commented 5 years ago

> Try the following statements at the beginning of the code:
> import keras.backend as K
> cfg = K.tf.ConfigProto(gpu_options={'allow_growth': True})
> K.set_session(K.tf.Session(config=cfg))

Hi, I still get an error:

Load weights model_data/yolo_weights.h5.
Freeze the first 249 layers of total 252 layers.
Train on 3439 samples, val on 382 samples, with batch size 32.
Epoch 1/50
2019-03-24 10:53:58.419070: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "train.py", line 206, in <module>
    _main()
  File "train.py", line 81, in _main
    verbose=1)
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=1384448, n=32, k=64
    [[{{node conv2d_3/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@batch_normalization_3/cond/FusedBatchNorm/Switch"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](leaky_re_lu_2/LeakyRelu, conv2d_3/kernel/read)]]
    [[{{node yolo_loss/while_1/LoopCond/_2963}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6607_yolo_loss/while_1/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_1/strided_slice_1/stack_2/_2805)]]

Any solution for it?

HanGaaaaa commented 5 years ago

Hello, I hit the same error. My environment is CUDA 9.0, cuDNN 7.4, and tensorflow-gpu 1.12.0, and the GPU is an RTX 2080 (this is my work computer). My own computer has the same environment except that its GPU is a 940, and it runs the same project fine. How can I fix this error? Can someone help me?

ShuteLee commented 5 years ago

I think it is a bug with the RTX 2080, and I have not figured it out. If you make any progress on this issue, please get in touch with me. Thanks a lot.

On March 29, 2019, at 10:49 AM, HanGaaaaa notifications@github.com wrote:

> Hello, I hit the same error. My environment is CUDA 9.0, cuDNN 7.4, and tensorflow-gpu 1.12.0, and the GPU is an RTX 2080 (this is my work computer). My own computer has the same environment except that its GPU is a 940, and it runs the same project fine. How can I fix this error? Can someone help me?

S0soo commented 5 years ago

I also get the same error. My GPU is an RTX 2080 Ti with tensorflow-gpu 1.8.0 and CUDA 9.0, but on a GTX 1080 Ti with tensorflow-gpu 1.4.0 and CUDA 8.0 the program runs normally. Can someone give some advice? Thanks.

ouening commented 5 years ago

I have solved this problem by installing the patches for CUDA 9.0. There are 4 patches, which can be downloaded from the website: cuda9 patches

guolihong commented 5 years ago

Hello! Did you solve it? How?

ShuteLee commented 5 years ago

> Hello! Did you solve it? How?

I fixed this issue just by installing the CUDA Toolkit patch: https://developer.nvidia.com/cuda-90-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exenetwork (choose the page for your CUDA version).

zhixuanli commented 4 years ago

I have installed the CUDA Toolkit patch but am still having this problem.

yuanzhedong commented 4 years ago

I have the same issue: the same code runs on a K80 but not on an RTX 2080.

checko commented 4 years ago

Same issue on my Titan RTX.

yuanzhedong commented 4 years ago

It works after I updated TensorFlow from 1.13.1 to 1.14.

My CUDA version is 10.0, cuDNN version is 7.6.3, and the GPU is an RTX 2080.

xiaohai-AI commented 4 years ago

> It works after I updated TensorFlow from 1.13.1 to 1.14.
>
> My CUDA version is 10.0, cuDNN version is 7.6.3, and the GPU is an RTX 2080.

After making the change above, I still get the following error:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

jinmingteo commented 4 years ago

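@xiaohai-AI, try this:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all up front
sess = tf.Session(config=config)
# With Keras, also register this session: keras.backend.set_session(sess)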
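If you cannot easily edit the script that creates the session, recent TensorFlow releases also read the environment variable TF_FORCE_GPU_ALLOW_GROWTH, which has the same effect as allow_growth. A minimal sketch, assuming your TensorFlow version honors that variable (older 1.x releases such as 1.12 may ignore it); the helper name is hypothetical, and the variable must be set before TensorFlow initializes the GPU:

```python
import os


def enable_gpu_memory_growth(environ=os.environ):
    """Ask TensorFlow to grow GPU memory on demand instead of reserving it all.

    Must run before TensorFlow creates its first GPU device or session.
    Assumption: the installed TensorFlow reads TF_FORCE_GPU_ALLOW_GROWTH.
    """
    environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return environ


if __name__ == "__main__":
    enable_gpu_memory_growth()
    print("TF_FORCE_GPU_ALLOW_GROWTH =", os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
    # Import TensorFlow (and Keras) only after this point.
```

The same thing can be done from the shell by exporting the variable before launching train.py.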

kartikwar commented 4 years ago

> It works after I updated TensorFlow from 1.13.1 to 1.14.
>
> My CUDA version is 10.0, cuDNN version is 7.6.3, and the GPU is an RTX 2080.

I was also getting the same error with tensorflow-gpu 1.6.0 and CUDA 9.0. Upgrading to CUDA 10.0 and tensorflow-gpu 1.14.0 solved the issue for me. Thanks @xiaohai-AI. I'm not sure why you are getting the internal error, though. It is probably because you have two CUDA versions installed, or because TensorFlow is picking up the wrong version of cuDNN.
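One way to check for mismatched versions is to compare the output of `nvcc --version` with the version macros in `cudnn.h` (commonly under /usr/local/cuda/include, an assumption about your install layout). A minimal sketch; the parsing helpers are hypothetical, and it assumes cuDNN 7.x, which defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL in cudnn.h (cuDNN 8 moved them to cudnn_version.h). Note that nvcc reports the toolkit on your PATH, which may differ from the libraries TensorFlow actually loads:

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract the (major, minor) release from `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_cudnn_header(header_text):
    """Extract (major, minor, patch) from the #define lines of cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            return None  # macro missing, e.g. cuDNN 8 keeps it in cudnn_version.h
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Sample inputs; in practice feed the real `nvcc --version` output and cudnn.h text.
    sample_nvcc = "Cuda compilation tools, release 10.0, V10.0.130"
    sample_header = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 6\n#define CUDNN_PATCHLEVEL 3\n"
    print(parse_nvcc_release(sample_nvcc))    # (10, 0)
    print(parse_cudnn_header(sample_header))  # (7, 6, 3)
```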

mfshiu commented 3 years ago

> It works after I updated TensorFlow from 1.13.1 to 1.14.
>
> My CUDA version is 10.0, cuDNN version is 7.6.3, and the GPU is an RTX 2080.

But with TensorFlow 1.15, CUDA 10.0, and an RTX 3080, I still have the same issue.

kartikwar commented 3 years ago

Hey @mfshiu, maybe you can try CUDA 10.0 with tensorflow-gpu 1.14.

allenyllee commented 3 years ago

Hi @mfshiu, NVIDIA maintains its own version of TensorFlow 1.15 at https://github.com/NVIDIA/tensorflow#install, which supports the latest GPUs.

So you need to remove the official TensorFlow installed through pip or conda and install NVIDIA's version, as its README.md says:

Install the NVIDIA wheel index:

$ pip install --user nvidia-pyindex

Install the current NVIDIA TensorFlow release:

$ pip install --user nvidia-tensorflow[horovod]

After installation, just use it as regular TensorFlow:

import tensorflow as tf
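Because NVIDIA's build provides the same `tensorflow` import package as the official wheels, it helps to confirm which distributions are actually installed. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); the list of official distribution names checked is an assumption, so adjust it to what you installed:

```python
from importlib import metadata

# Assumption: official TF 1.x wheels are published as "tensorflow" or "tensorflow-gpu".
OFFICIAL_DISTS = ("tensorflow", "tensorflow-gpu")
NVIDIA_DIST = "nvidia-tensorflow"


def installed_versions(names):
    """Map each distribution name to its installed version, skipping missing ones."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


def official_dists_present(installed):
    """Official TF distributions that should be uninstalled before using NVIDIA's."""
    return sorted(name for name in installed if name in OFFICIAL_DISTS)


if __name__ == "__main__":
    installed = installed_versions(OFFICIAL_DISTS + (NVIDIA_DIST,))
    print("installed:", installed)
    for name in official_dists_present(installed):
        print(f"remove the official wheel first: pip uninstall {name}")
```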

drscotthawley commented 3 years ago

Hey @allenyllee, I wonder if you might be able to clarify or help: when I follow those install instructions for NVIDIA TensorFlow, I get a long error that tells me... to redo what I just did?

$ pip install --user nvidia-pyindex
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting nvidia-pyindex
  Downloading nvidia-pyindex-1.0.6.tar.gz (6.7 kB)
Building wheels for collected packages: nvidia-pyindex
  Building wheel for nvidia-pyindex (setup.py) ... done
  Created wheel for nvidia-pyindex: filename=nvidia_pyindex-1.0.6-py3-none-any.whl size=4171 sha256=692df4078194418f4812516403399f2e96373ad780b93c98ce944b5f02efb35d
  Stored in directory: /tmp/pip-ephem-wheel-cache-kpx26e3z/wheels/52/31/c8/db9f8939a8bb1f3500ce81b630604cbfa6e31f82c8f1bd914d
Successfully built nvidia-pyindex
Installing collected packages: nvidia-pyindex
Successfully installed nvidia-pyindex-1.0.6

$ pip install --user nvidia-tensorflow[horovod]
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting nvidia-tensorflow[horovod]
  Downloading nvidia-tensorflow-0.0.1.dev4.tar.gz (3.8 kB)
    ERROR: Command errored out with exit status 1:
     command: /home/shawley/anaconda3/envs/spnet/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-yv_vnm57/nvidia-tensorflow/setup.py'"'"'; __file__='"'"'/tmp/pip-install-yv_vnm57/nvidia-tensorflow/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-1hvhhg4h
         cwd: /tmp/pip-install-yv_vnm57/nvidia-tensorflow/
    Complete output (17 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-yv_vnm57/nvidia-tensorflow/setup.py", line 150, in <module>
        raise RuntimeError(open("ERROR.txt", "r").read())
    RuntimeError:
    ###########################################################################################
    The package you are trying to install is only a placeholder project on PyPI.org repository.
    This package is hosted on NVIDIA Python Package Index.

    This package can be installed as:

    $ pip install nvidia-pyindex
    $ pip install nvidia-tensorflow

    Please refer to NVIDIA instructions: https://github.com/NVIDIA/tensorflow#install.
    ###########################################################################################
    ----------------------------------------

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Re-running those "This package can be installed as:" commands just results in the same error message again.

drscotthawley commented 3 years ago

Resolved this issue for myself: be sure you're running Python 3.8 and pip 20 or later.
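Those requirements can be checked before running the install. A minimal sketch, assuming exactly the thresholds reported in this comment (Python 3.8, pip 20 or later); the helper name is hypothetical, and pip's version is read with the standard-library `importlib.metadata`:

```python
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 8)  # as reported in this thread for nvidia-tensorflow
MIN_PIP_MAJOR = 20        # "pip 20 or later"


def check_environment(python_version, pip_version):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        problems.append(f"Python {python_version[0]}.{python_version[1]} found, need 3.8")
    pip_major = int(pip_version.split(".")[0])
    if pip_major < MIN_PIP_MAJOR:
        problems.append(f"pip {pip_version} found, need 20 or later")
    return problems


if __name__ == "__main__":
    try:
        pip_version = metadata.version("pip")
    except metadata.PackageNotFoundError:
        pip_version = "0"  # treat a missing pip as too old
    issues = check_environment(sys.version_info, pip_version)
    print("\n".join(issues) or "Python and pip look OK")
```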

GuillaumeMougeot commented 3 years ago

I had the same problem with an RTX 3090 + TF 1.15. I resolved it by using the official NVIDIA TF1 NGC Docker container, available here: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

seongyeop-jeong-poey commented 3 years ago

> I had the same problem with an RTX 3090 + TF 1.15. I resolved it by using the official NVIDIA TF1 NGC Docker container, available here: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

It works very well for me. In my case, with an RTX 3090 + TF 1.15, the NGC TF1 Docker container version '21.05-tf1-py3' works very well! Thanks a lot.
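NGC's TensorFlow 1 images follow the tag pattern `<release>-tf1-py3` under `nvcr.io/nvidia/tensorflow`, as in the `21.05-tf1-py3` tag mentioned here. A minimal sketch that builds the `docker run` command for such a tag, assuming Docker 19.03+ with the NVIDIA Container Toolkit installed (needed for `--gpus all`); the helper name and the /workspace/project mount point are hypothetical:

```python
import shlex


def ngc_tf1_run_command(release, workdir):
    """Build a `docker run` argument list for an NGC TensorFlow 1 container.

    release: NGC release such as "21.05" (pick one whose CUDA supports your GPU).
    workdir: host directory to mount into the container (hypothetical layout).
    """
    image = f"nvcr.io/nvidia/tensorflow:{release}-tf1-py3"
    return [
        "docker", "run", "--gpus", "all", "-it", "--rm",
        "-v", f"{workdir}:/workspace/project",
        image,
    ]


if __name__ == "__main__":
    cmd = ngc_tf1_run_command("21.05", "/home/me/keras-yolo3")
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```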

bing-0906 commented 3 years ago

> It works after I updated TensorFlow from 1.13.1 to 1.14. My CUDA version is 10.0, cuDNN version is 7.6.3, and the GPU is an RTX 2080.
>
> But with TensorFlow 1.15, CUDA 10.0, and an RTX 3080, I still have the same issue.

Me too!!! Have you solved this problem?

seongyeop-jeong-poey commented 3 years ago

> It works after I updated TensorFlow from 1.13.1 to 1.14. My CUDA version is 10.0, cuDNN version is 7.6.3, and the GPU is an RTX 2080.
>
> But with TensorFlow 1.15, CUDA 10.0, and an RTX 3080, I still have the same issue.
>
> Me too!!! Have you solved this problem?

Please find a container version that matches your GPU in the NVIDIA Docker registry (NGC).

kwshh commented 2 years ago

I hit the same problem on an A10 GPU. RTX 30-series, A10, A100, and other GPUs with compute capability 8.0 or higher must use CUDA 11.x, so you can't use official TensorFlow 1.x, which is built against CUDA 10 or lower. One solution is to use nvidia-tensorflow 1.x, which can use CUDA 11.x for acceleration. Download it here: https://github.com/NVIDIA/tensorflow#install. Thanks to @allenyllee.
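This rule can be expressed as a lookup from compute capability to the first CUDA toolkit with native support for it. A minimal sketch; the table is an assumption based on NVIDIA's CUDA release notes, lists only architectures mentioned in this thread, and ignores PTX JIT compilation, so treat a False as a warning rather than proof (it does not, for example, model the patched CUDA 9.0 setups reported earlier):

```python
# Assumed mapping: compute capability -> first CUDA toolkit (major, minor)
# with native support. Verify against NVIDIA's release notes before relying on it.
FIRST_NATIVE_CUDA = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1080 Ti
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080, Titan RTX
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3080/3090, A6000, A10
    (8, 9): (11, 8),  # Ada Lovelace, e.g. RTX 4090
}


def cuda_supports(compute_capability, cuda_version):
    """True/False if the table knows this GPU, None if it does not."""
    required = FIRST_NATIVE_CUDA.get(tuple(compute_capability))
    if required is None:
        return None
    return tuple(cuda_version) >= required


if __name__ == "__main__":
    print(cuda_supports((7, 5), (9, 0)))   # RTX 2080 with CUDA 9.0 -> False
    print(cuda_supports((8, 6), (10, 0)))  # RTX 3090 with CUDA 10.0 -> False
    print(cuda_supports((8, 6), (11, 1)))  # RTX 3090 with CUDA 11.1 -> True
```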

serdarildercaglar commented 2 years ago

Problem fixed after installing:

!pip install nvidia-pyindex
!pip install nvidia-tensorflow

Fay-why commented 2 years ago

> Hi @mfshiu, NVIDIA maintains its own version of TensorFlow 1.15 at https://github.com/NVIDIA/tensorflow#install, which supports the latest GPUs.
>
> So you need to remove the official TensorFlow installed through pip or conda and install NVIDIA's version, as its README.md says:
>
> Install the NVIDIA wheel index:
> $ pip install --user nvidia-pyindex
> Install the current NVIDIA TensorFlow release:
> $ pip install --user nvidia-tensorflow[horovod]
> After installation, just use it as regular TensorFlow:
> import tensorflow as tf

It works for me!!! Thanks a lot~ NVIDIA's TF version is 1.15, but luckily my code runs successfully on tf==1.15. By the way, the environment where I got the error was "tf==1.12.0, 3090, cuda==9.0, ubuntu20.04".

Fay-why commented 2 years ago

> Problem fixed after installing:
>
> !pip install nvidia-pyindex
> !pip install nvidia-tensorflow

Thanks! It works for me~

qingjiesjtu commented 1 year ago

Cool!! It fixes perfectly my issue! Thanks!

Guo986 commented 1 year ago

Yes! Yes!!! Remove the official TensorFlow and use Python 3.8:

pip install nvidia-pyindex
pip install nvidia-tensorflow

I used an A6000 with tf 1.15, CUDA 10.0.130, and cuDNN 7.3.1, and the TF website told me to use Python 3.6 or 3.7, which is what I did before. But!!! To use nvidia-pyindex and nvidia-tensorflow, I needed to switch to Python 3.8. And it worked!!!

wowo68 commented 1 year ago

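This is an automatic vacation reply from QQ Mail. Hello, I am currently on vacation and cannot reply to your email personally. I will get back to you as soon as possible after my vacation.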

zhang159560293 commented 6 months ago

> Hi @mfshiu, NVIDIA maintains its own version of TensorFlow 1.15 at https://github.com/NVIDIA/tensorflow#install, which supports the latest GPUs.
>
> So you need to remove the official TensorFlow installed through pip or conda and install NVIDIA's version, as its README.md says:
>
> Install the NVIDIA wheel index:
> $ pip install --user nvidia-pyindex
> Install the current NVIDIA TensorFlow release:
> $ pip install --user nvidia-tensorflow[horovod]
> After installation, just use it as regular TensorFlow:
> import tensorflow as tf

Thanks! Thanks very much! It solved my problem, which was:

InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[128,3,3], b.shape=[128,3,3], m=3, n=3, k=3, batch_size=128
    [[node rotation/MatMul_1 ...... = BatchMatMul[T=DT_DOUBLE, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rotation/concat_7, rotation/concat_7)]]
    [[{{node gradients/decoder/dgcnn_trans_fc1/MatMul_grad/tuple/control_dependency_1/_171}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge2202...pendency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

By the way, both my A6000 and my 4090 had this problem, and it is now solved on both. My TensorFlow is 1.12.0 and CUDA is 9.0.

qqwweee / keras-yolo3

failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED #332