yeyupiaoling / PPASR

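An end-to-end Chinese speech recognition project implemented with PaddlePaddle, covering everything from getting started to production use: very simple introductory examples and very practical enterprise projects. Supports the currently most popular models: DeepSpeech2, Conformer, and Squeezeformer.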
Apache License 2.0
804 stars 128 forks

train.py fails with an error about missing cuDNN #75

Closed a00147600 closed 2 years ago

a00147600 commented 2 years ago

OSError: (External) CUDNN error(8), CUDNN_STATUS_EXECUTION_FAILED. [Hint: 'CUDNN_STATUS_EXECUTION_FAILED'. The GPU program failed to execute. This is usually caused by a failure to launch some cuDNN kernel on the GPU, which can occur for multiple reasons. To correct, check that the hardware, an appropriate version of the driver, and the cuDNN library are correctly installed. Otherwise, this may indicate an internal error/bug in the library. ] (at ../paddle/phi/kernels/gpudnn/conv_kernel.cu:379) [operator < conv2d > error]

This message seems to be telling me that I don't have the cuDNN module. It also comes with a long run of the following output:

Error: ../paddle/phi/kernels/funcs/elementwise_functor.h:545 Assertion b != 0 failed. InvalidArgumentError: Integer division by zero encountered in (floor) divide. Please check the input value.
Error: ../paddle/phi/kernels/funcs/elementwise_functor.h:545 Assertion b != 0 failed. InvalidArgumentError: Integer division by zero encountered in (floor) divide. Please check the input value.
Error: ../paddle/phi/kernels/funcs/elementwise_functor.h:545 Assertion b != 0 failed. InvalidArgumentError: Integer division by zero encountered in (floor) divide. Please check the input value.
Error: ../paddle/phi/kernels/funcs/elementwise_functor.h:545 Assertion b != 0 failed. InvalidArgumentError: Integer division by zero encountered in (floor) divide. Please check the input value.
Error: ../paddle/phi/kernels/funcs/elementwise_functor.h:545 Assertion b != 0 failed. InvalidArgumentError: Integer division by zero encountered in (floor) divide. Please check the input value.
Error: ../paddle/phi/kernels/funcs/elementwise_functor.h:545 Assertion b != 0 failed. InvalidArgumentError: Integer division by zero encountered in (floor) divide. Please check the input value.

Is this the same problem? I'm on Windows.

yeyupiaoling commented 2 years ago

You probably don't have cuDNN installed, or the installed cuDNN version is wrong. You can install it with conda, using this command:

conda install paddlepaddle-gpu==2.3.0 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
a00147600 commented 2 years ago

Thanks for the reply. After installing with that command, I ran training and got the following:

(about_ppasr) F:\PPASR-master>python train.py

======================================================================
Traceback (most recent call last):
  File "train.py", line 54, in <module>
    trainer.train(batch_size=args.batch_size,
  File "F:\PPASR-master\ppasr\trainer.py", line 371, in train
    c, l = self.test(model, test_loader, test_dataset.vocab_list, ctc_loss)
  File "C:\ProgramData\Anaconda3\envs\about_ppasr\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "C:\ProgramData\Anaconda3\envs\about_ppasr\lib\site-packages\paddle\fluid\dygraph\base.py", line 354, in _decorate_function
    return func(*args, **kwargs)
  File "F:\PPASR-master\ppasr\trainer.py", line 404, in test
    outs, out_lens = model(inputs, input_lens)
  File "C:\ProgramData\Anaconda3\envs\about_ppasr\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\about_ppasr\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "F:\PPASR-master\ppasr\model_utils\deepspeech2\model.py", line 49, in forward
    x, final_chunk_state_h_box = self.rnn(x, x_lens, init_state_h_box)  # [B, T, D] [num_rnn_layers, B, rnn_size]
  File "C:\ProgramData\Anaconda3\envs\about_ppasr\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\about_ppasr\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "F:\PPASR-master\ppasr\model_utils\deepspeech2\rnn.py", line 51, in forward
    x, final_state = self.rnn[i](x, x_lens, init_state_list[i])
  File "C:\ProgramData\Anaconda3\envs\about_ppasr\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\about_ppasr\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "F:\PPASR-master\ppasr\model_utils\deepspeech2\rnn.py", line 16, in forward
    x, final_state = self.rnn(x, init_state, x_lens)  # [B, T, D]
  File "C:\ProgramData\Anaconda3\envs\about_ppasr\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\about_ppasr\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\about_ppasr\lib\site-packages\paddle\nn\layer\rnn.py", line 1077, in forward
    return self._cudnn_impl(inputs, initial_states, sequence_length)
  File "C:\ProgramData\Anaconda3\envs\about_ppasr\lib\site-packages\paddle\nn\layer\rnn.py", line 1012, in _cudnn_impl
    _, _, out, state = _C_ops.rnn(
OSError: (External) CUDNN error(3), CUDNN_STATUS_BAD_PARAM. [Hint: 'CUDNN_STATUS_BAD_PARAM'. An incorrect value or parameter was passed to the function. To correct, ensure that all the parameters being passed have valid values. ] (at ..\paddle/fluid/platform/device/gpu/cuda/cudnn_helper.h:287) [operator < rnn > error]

a00147600 commented 2 years ago

> You probably don't have cuDNN installed, or the installed cuDNN version is wrong. You can install it with conda, using this command:
>
> conda install paddlepaddle-gpu==2.3.0 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/

Thanks for the reply. After running the command, train.py still fails with an invalid-parameter error:

OSError: (External) CUDNN error(3), CUDNN_STATUS_BAD_PARAM. [Hint: 'CUDNN_STATUS_BAD_PARAM'. An incorrect value or parameter was passed to the function. To correct, ensure that all the parameters being passed have valid values. ] (at ..\paddle/fluid/platform/device/gpu/cuda/cudnn_helper.h:287) [operator < rnn > error]

yeyupiaoling commented 2 years ago

What model is your GPU? Do other programs run normally?

a00147600 commented 2 years ago

> What model is your GPU? Do other programs run normally?

The GPU is a 3090 with CUDA 11.7 installed. create.py and infer.py run normally.

(about_ppasr) F:\PPASR-master>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

yeyupiaoling commented 2 years ago

Oh, the 30 series needs CUDA 11 or later. Install this instead:

conda install paddlepaddle-gpu==2.3.0 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge 
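
The "30 series needs CUDA 11 or later" rule can be written down as a small lookup. This is only a sketch: the table lists three GPU compute capabilities with the first CUDA toolkit release that supports them, and the names `MIN_CUDA_FOR_CAPABILITY` and `toolkit_supports` are made up for illustration. Passing this check is necessary but does not by itself guarantee that every cuDNN kernel will run.

```python
# First CUDA toolkit release able to target each GPU compute capability.
# Only a few architectures are listed; extend the table as needed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080 Ti, GTX 1650
    (8, 0): (11, 0),  # Ampere, A100
    (8, 6): (11, 1),  # Ampere, RTX 30 series such as the 3090
}


def parse_version(text):
    """Turn a string such as '11.2' into the tuple (11, 2)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def toolkit_supports(capability, cuda_version):
    """Return True/False, or None when the capability is not in the table."""
    minimum = MIN_CUDA_FOR_CAPABILITY.get(parse_version(capability))
    if minimum is None:
        return None
    return parse_version(cuda_version) >= minimum


if __name__ == "__main__":
    # Compare the two cudatoolkit versions against an 8.6 card.
    for cuda in ("10.2", "11.2"):
        print(f"capability 8.6 with cudatoolkit {cuda}:", toolkit_supports("8.6", cuda))
```
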
a00147600 commented 2 years ago

> Oh, the 30 series needs CUDA 11 or later. Install this instead:
>
> conda install paddlepaddle-gpu==2.3.0 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge

I ran it and got the output below. Do I need to reinstall the whole environment? I'm back to the cuDNN problem.

Error: ../paddle/phi/kernels/funcs/elementwise_functor.h:545 Assertion b != 0 failed. InvalidArgumentError: Integer division by zero encountered in (floor) divide. Please check the input value.
Error: ../paddle/phi/kernels/funcs/elementwise_functor.h:545 Assertion b != 0 failed. InvalidArgumentError: Integer division by zero encountered in (floor) divide. Please check the input value.
Error: ../paddle/phi/kernels/funcs/elementwise_functor.h:545 Assertion b != 0 failed. InvalidArgumentError: Integer division by zero encountered in (floor) divide. Please check the input value.
OSError: (External) CUDNN error(8), CUDNN_STATUS_EXECUTION_FAILED. [Hint: 'CUDNN_STATUS_EXECUTION_FAILED'. The GPU program failed to execute. This is usually caused by a failure to launch some cuDNN kernel on the GPU, which can occur for multiple reasons. To correct, check that the hardware, an appropriate version of the driver, and the cuDNN library are correctly installed. Otherwise, this may indicate an internal error/bug in the library. ] (at ../paddle/phi/kernels/gpudnn/conv_kernel.cu:379) [operator < conv2d > error]

yeyupiaoling commented 2 years ago

Try this and see whether it passes the check:

paddle.utils.run_check()
a00147600 commented 2 years ago

> Try this and see whether it passes the check:
>
> paddle.utils.run_check()

Yes, it passes.

(about_ppasr) F:\PPASR-master>python
Python 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
W0607 11:49:30.111837 76260 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.2
W0607 11:49:30.117821 76260 gpu_context.cc:306] device: 0, cuDNN Version: 8.2.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
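
The two warning lines in that output already carry the versions that matter for this issue (compute capability, driver API, runtime API, cuDNN). A minimal sketch for pulling them out of a pasted log follows; the function name `parse_run_check_log` is hypothetical, and the regular expressions are written against the exact wording of the two lines above, so they may need adjusting for other Paddle versions.

```python
import re


def parse_run_check_log(log_text):
    """Extract GPU and toolkit versions from Paddle's gpu_context warnings."""
    patterns = {
        "compute_capability": r"GPU Compute Capability: (\d+\.\d+)",
        "driver_api": r"Driver API Version: (\d+\.\d+)",
        "runtime_api": r"Runtime API Version: (\d+\.\d+)",
        "cudnn": r"cuDNN Version: (\d+\.\d+)",
    }
    found = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        # Keep the key with None when a field is missing from the log.
        found[key] = match.group(1) if match else None
    return found


# The two warning lines copied from the session in this thread.
SAMPLE = (
    "W0607 11:49:30.111837 76260 gpu_context.cc:278] Please NOTE: device: 0, "
    "GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.2\n"
    "W0607 11:49:30.117821 76260 gpu_context.cc:306] device: 0, cuDNN Version: 8.2.\n"
)

if __name__ == "__main__":
    for name, value in parse_run_check_log(SAMPLE).items():
        print(f"{name}: {value}")
```
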

yeyupiaoling commented 2 years ago

That's strange. Have you run any other Paddle projects, and did they report this error?
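
Without a second Paddle project at hand, one way to answer this is a tiny smoke test that runs the two operators named in the tracebacks (conv2d and rnn) on their own. The sketch below is an assumption-laden stand-in, not PPASR's model: the layer sizes and tensor shapes are arbitrary, and `gpu_smoke_test` is a made-up helper name. If these small cases fail too, the problem is more likely in the environment than in the project.

```python
def gpu_smoke_test():
    """Run one Conv2D and one LSTM forward pass on the GPU and report the outcome."""
    try:
        import paddle
    except ImportError:
        return {"paddle": "not installed"}

    results = {"paddle": paddle.__version__}
    try:
        paddle.set_device("gpu")
    except Exception as exc:  # e.g. a CPU-only build
        results["device"] = f"failed: {exc}"
        return results

    def conv_case():
        conv = paddle.nn.Conv2D(in_channels=1, out_channels=4, kernel_size=3)
        out = conv(paddle.randn([2, 1, 32, 32]))
        out.numpy()  # copy back to the host so the kernel has really run

    def lstm_case():
        lstm = paddle.nn.LSTM(input_size=16, hidden_size=8)
        x = paddle.randn([2, 10, 16])  # [batch, time, features]
        lens = paddle.to_tensor([10, 7], dtype="int64")
        out, _ = lstm(x, sequence_length=lens)
        out.numpy()

    for name, case in (("conv2d", conv_case), ("lstm", lstm_case)):
        try:
            case()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for key, value in gpu_smoke_test().items():
        print(f"{key}: {value}")
```
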

a00147600 commented 2 years ago

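> That's strange. Have you run any other Paddle projects, and did they report this error?

Sorry, your project is my first contact with Paddle. I'm wondering whether my environment has a duplicate-import problem, and whether I should start over from scratch.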

yeyupiaoling commented 2 years ago

Use conda to create a fresh virtual environment with Python 3.7 and try that.

a00147600 commented 2 years ago

> Use conda to create a fresh virtual environment with Python 3.7 and try that.

After rebuilding the environment with Python 3.7, I installed with these commands:

python setup.py install
python -m pip install ppasr -U
conda install paddlepaddle-gpu==2.3.0 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge

After that, running train and the other scripts fails with OSError: sndfile library not found. The fixes I found mostly deal with Linux, and this did not happen on my first install.

OSError: cannot load library 'C:\ProgramData\Anaconda3\envs\ppasr\lib\site-packages\soundfile-0.10.3.post1-py3.7.egg\_soundfile_data\libsndfile64bit.dll': error 0x7e
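
On Windows, error 0x7e (126) means the module could not be found, which covers both a missing DLL and a DLL whose own dependencies are missing. A rough way to tell the two apart, assuming you pass in the path printed in the error message, is sketched below; `diagnose_library` is a hypothetical helper and is not part of soundfile.

```python
import ctypes
import os
import sys


def diagnose_library(path):
    """Report whether a shared library file exists and whether it can be loaded."""
    if not os.path.isfile(path):
        return "missing: the file itself is not on disk"
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        # The file exists, so the loader failed for another reason,
        # for example a dependency that cannot be found.
        return f"present but not loadable: {exc}"
    return "loadable"


if __name__ == "__main__":
    # Usage: python check_dll.py <path copied from the error message>
    if len(sys.argv) > 1:
        print(diagnose_library(sys.argv[1]))
    else:
        print("pass the library path from the error message as an argument")
```
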

yeyupiaoling commented 2 years ago

https://blog.csdn.net/byna11sina11/article/details/109676427

a00147600 commented 2 years ago

> https://blog.csdn.net/byna11sina11/article/details/109676427

Thanks, that article solved the problem. I'm back at train.py and the familiar error. I'm now reading the article below, though I don't know whether it will help. Also, downloads through conda are rather slow.

https://blog.csdn.net/u013128836/article/details/101636736?ops_request_misc=&request_id=&biz_id=102&utm_term=The%20GPU%20program%20failed%20to%20exec&utm_medium=distribute.pc_search_result.none-task-blog-2~all~sobaiduweb~default-1-101636736.142^v11^pc_search_result_control_group,157^v13^new_style2&spm=1018.2226.3001.4187

OSError: (External) CUDNN error(8), CUDNN_STATUS_EXECUTION_FAILED. [Hint: 'CUDNN_STATUS_EXECUTION_FAILED'. The GPU program failed to execute. This is usually caused by a failure to launch some cuDNN kernel on the GPU, which can occur for multiple reasons. To correct, check that the hardware, an appropriate version of the driver, and the cuDNN library are correctly installed. Otherwise, this may indicate an internal error/bug in the library. ] (at ../paddle/phi/kernels/gpudnn/conv_kernel.cu:379) [operator < conv2d > error]

a00147600 commented 2 years ago

> https://blog.csdn.net/byna11sina11/article/details/109676427

Sorry to bother you again. Could you tell me your GPU model and your NVIDIA driver, CUDA, and cuDNN versions? I think I've found the cause of my error: it looks like the 30-series GPU problem discussed in the comments under this blog post:

https://blog.csdn.net/hunterflyy/article/details/108369274?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522165475452316781483711035%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fall.%2522%257D&request_id=165475452316781483711035&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~first_rank_ecpm_v1~rank_v31_ecpm-1-108369274-null-null.142^v11^pc_search_result_control_group,157^v13^new_style2&utm_term=unspecified+launch+failure+719&spm=1018.2226.3001.4187

yeyupiaoling commented 2 years ago

Mine is a 2080 Ti with driver version 510, CUDA 10.2, and cuDNN 7.6.5.

a00147600 commented 2 years ago

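> Mine is a 2080 Ti with driver version 510, CUDA 10.2, and cuDNN 7.6.5.

It worked on another computer: a GTX 1650 with CUDA 10.2 and cuDNN 7.6.5. My preliminary conclusion is that the problem is an incompatibility with 30-series GPUs.

Thanks for the help.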

yeyupiaoling commented 2 years ago

I think the driver for 30-series cards can't be older than 450, or maybe 470.
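
To check the installed driver against a threshold like that, the version can be read from `nvidia-smi`. The sketch below assumes `nvidia-smi` is on the PATH; the helper names are made up, and the 450 used in the example is just the lower of the two guesses above, not a verified minimum.

```python
import shutil
import subprocess


def parse_driver_version(text):
    """Turn the first line of nvidia-smi output, e.g. '516.01', into (516, 1)."""
    first_line = text.strip().splitlines()[0]
    return tuple(int(part) for part in first_line.strip().split("."))


def installed_driver_version():
    """Return the driver version as a tuple, or None if it cannot be queried."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    proc = subprocess.run(
        [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0 or not proc.stdout.strip():
        return None
    return parse_driver_version(proc.stdout)


def driver_at_least(version, minimum_major):
    """Compare only the major number, which is all the thread's guess specifies."""
    return version is not None and version[0] >= minimum_major


if __name__ == "__main__":
    version = installed_driver_version()
    print("driver version:", version)
    # 450 is an unverified threshold taken from the comment above.
    print("at least 450:", driver_at_least(version, 450))
```
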