we0091234 / yolov7_plate

yolov7 license plate detection and recognition. Chinese license plate recognition; supports double-layer plates and 12 types of Chinese plates.
360 stars 75 forks

Hello, regarding the TensorRT C++ code: after training on a dataset with 2 keypoints and exporting to ONNX, C++ inference fails with CUDA failure 700. What could be the cause? #3

Open newforrestgump001 opened 1 year ago

newforrestgump001 commented 1 year ago

Hello, the TensorRT part of the code runs fine when the number of keypoints is 4, but after training on a dataset with 2 keypoints and exporting to ONNX, C++ inference fails with CUDA failure 700. What could be the cause? Many thanks!

we0091234 commented 1 year ago

> Hello, the TensorRT part of the code runs fine when the number of keypoints is 4, but after training on a dataset with 2 keypoints and exporting to ONNX, C++ inference fails with CUDA failure 700. What could be the cause? Many thanks!

Send me an ONNX model to try. Join the QQ group and send it to me there.

newforrestgump001 commented 1 year ago

What is the group number? Many thanks!

we0091234 commented 1 year ago

> What is the group number? Many thanks!

QQ group: 871797331

newforrestgump001 commented 1 year ago

Link: https://pan.baidu.com/s/1pfnonWjCKXmCsMfbUl88Ww Password: g8aj (shared via Baidu Netdisk)

newforrestgump001 commented 1 year ago

Is it OK to share it directly like this? I am still on Linux; I will join the group later. Thank you very much!

we0091234 commented 1 year ago

> Is it OK to share it directly like this? I am still on Linux; I will join the group later. Thank you very much!

Have you tried inference with the ONNX model? Are the results correct?

newforrestgump001 commented 1 year ago

Inference fails. The error occurs at:

```
[02/23/2023-19:18:43] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.1.0
[02/23/2023-19:18:43] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.1.0
[convolutionRunner.cpp::executeConv::511] Error Code 1: Cudnn (CUDNN_STATUS_EXECUTION_FAILED)
```

The inference function is:

```cpp
void doInference_cu(IExecutionContext &context, cudaStream_t &stream, void **buffers,
                    float *output, int batchSize, int OUTPUT_SIZE) {
    // infer on the batch asynchronously, and DMA output back to host
    context.enqueue(batchSize, buffers, stream, nullptr);
    CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float),
                          cudaMemcpyDeviceToHost, stream));
}
```

we0091234 commented 1 year ago

> Inference fails. The error is `[convolutionRunner.cpp::executeConv::511] Error Code 1: Cudnn (CUDNN_STATUS_EXECUTION_FAILED)` (full log and `doInference_cu` code quoted above)

What I meant was onnxruntime inference.

newforrestgump001 commented 1 year ago

Not yet. Same training code and same inference code, both with 2 classes; one model has 4 keypoints and the other has 2. The 4-keypoint one works and the 2-keypoint one fails. I will test with onnxruntime now.
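A minimal way to run that onnxruntime sanity check (a hedged sketch: the model path and the class/keypoint counts are placeholders, and the `5 + num_classes + 3 * num_keypoints` head layout is inferred from the `1 x 64512 x 19` versus `1 x 64512 x 13` output shapes reported later in this thread):

```python
import numpy as np

def expected_last_dim(num_classes, num_keypoints):
    # Assumed head layout: box(4) + objectness(1) + class scores + (x, y, conf) per keypoint.
    # With 2 classes: 4 keypoints -> 19, 2 keypoints -> 13, matching the logged shapes.
    return 5 + num_classes + 3 * num_keypoints

def check_onnx(model_path, num_classes, num_keypoints):
    import onnxruntime as ort  # lazy import; only needed for the actual model check
    sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]
    # Feed a zero image of the model's declared input shape (1 x 3 x 1024 x 1024 here)
    x = np.zeros([d if isinstance(d, int) else 1 for d in inp.shape], dtype=np.float32)
    out = sess.run(None, {inp.name: x})[0]
    assert out.shape[-1] == expected_last_dim(num_classes, num_keypoints), out.shape
    return out.shape

# check_onnx("best.onnx", num_classes=2, num_keypoints=2)  # placeholder path
```

If the ONNX output shape checks out here, the problem is more likely in the C++ side (buffer sizes, postprocessing) than in the exported model itself.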

newforrestgump001 commented 1 year ago

The 4-keypoint model's parameters are:

```
[info][simple_yolo.cu:2281]:Input shape is 1 x 3 x 1024 x 1024
[info][simple_yolo.cu:2283]:Set max workspace size = 1024.00 MB
[info][simple_yolo.cu:2286]:Network has 1 inputs:
[info][simple_yolo.cu:2292]:    0.[images] shape is 1 x 3 x 1024 x 1024
[info][simple_yolo.cu:2298]:Network has 1 outputs:
[info][simple_yolo.cu:2303]:    0.[output] shape is 1 x 64512 x 19
[info][simple_yolo.cu:2307]:Network has 1326 layers
```

newforrestgump001 commented 1 year ago

The 2-keypoint model's parameters are:

```
[info][simple_yolo.cu:2281]:Input shape is 1 x 3 x 1024 x 1024
[info][simple_yolo.cu:2283]:Set max workspace size = 1024.00 MB
[info][simple_yolo.cu:2286]:Network has 1 inputs:
[info][simple_yolo.cu:2292]:    0.[images] shape is 1 x 3 x 1024 x 1024
[info][simple_yolo.cu:2298]:Network has 1 outputs:
[info][simple_yolo.cu:2303]:    0.[output] shape is 1 x 64512 x 13
[info][simple_yolo.cu:2307]:Network has 1326 layers
```

newforrestgump001 commented 1 year ago

The only difference is the third dimension of the output: 19 versus 13, yet doInference_cu behaves differently for the two models.
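That dimension difference suggests one plausible cause of the CUDA failure 700 (cudaErrorIllegalAddress): if `OUTPUT_SIZE` or the device/host output buffer allocation in the C++ code is still hardcoded for the 4-keypoint model, the async copy in `doInference_cu` reads past the end of `buffers[1]`. A rough sketch of the arithmetic, assuming float32 and the `1 x 64512 x N` shapes from the logs above:

```python
NUM_ANCHORS = 64512   # from the "1 x 64512 x N" output shape in the logs
FLOAT_BYTES = 4       # sizeof(float)

def output_bytes(last_dim, batch=1):
    # Mirrors batchSize * OUTPUT_SIZE * sizeof(float) in doInference_cu,
    # where OUTPUT_SIZE = NUM_ANCHORS * last_dim.
    return batch * NUM_ANCHORS * last_dim * FLOAT_BYTES

alloc_2kpt = output_bytes(13)  # what the 2-keypoint model actually produces
copy_4kpt = output_bytes(19)   # copy size if OUTPUT_SIZE still uses the 4-keypoint value
print(copy_4kpt - alloc_2kpt)  # bytes read past the end of the device buffer
```

This is only a hypothesis, but it is the kind of mismatch that surfaces as an asynchronous error 700 at a later CUDA call rather than at the line that caused it.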

we0091234 commented 1 year ago

> The only difference is the third dimension of the output: 19 versus 13, yet doInference_cu behaves differently for the two models.

Let's discuss when you join the group later. Send me an image too, so I can test it and run it myself.

newforrestgump001 commented 1 year ago

OK, thank you very much!