we0091234 / yolov7_plate

yolov7 license plate detection and recognition. Chinese license plate recognition; supports double-layer plates and 12 types of Chinese plates.
360 stars 75 forks

Hello, regarding the TensorRT C++ code: after training on a dataset with 2 keypoints and exporting to ONNX, C++ inference fails with CUDA failure 700. What could be the cause? #3

Open newforrestgump001 opened 1 year ago

newforrestgump001 commented 1 year ago

Hello, the TensorRT part of the code runs fine when the number of keypoints is 4, but after training on a dataset with 2 keypoints and exporting to ONNX, C++ inference fails with CUDA failure 700. What could be the cause? Many thanks!

we0091234 commented 1 year ago

> Hello, the TensorRT part of the code runs fine when the number of keypoints is 4, but after training on a dataset with 2 keypoints and exporting to ONNX, C++ inference fails with CUDA failure 700. What could be the cause? Many thanks!

Send me an ONNX model to try. Join the QQ group and send it to me there.

newforrestgump001 commented 1 year ago

What is the group number? Many thanks!

we0091234 commented 1 year ago

> What is the group number? Many thanks!

QQ group: 871797331

newforrestgump001 commented 1 year ago

Link: https://pan.baidu.com/s/1pfnonWjCKXmCsMfbUl88Ww Password: g8aj (shared via Baidu Netdisk)

newforrestgump001 commented 1 year ago

Is it OK to share it directly like this? I am still on Linux; I will join the group later. Thank you very much!

we0091234 commented 1 year ago

> Is it OK to share it directly like this? I am still on Linux; I will join the group later. Thank you very much!

Have you tried inference with the ONNX model? Are the results correct?

newforrestgump001 commented 1 year ago

Inference fails. The error occurs at:

```
[02/23/2023-19:18:43] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.1.0
[02/23/2023-19:18:43] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.1.0
[convolutionRunner.cpp::executeConv::511] Error Code 1: Cudnn (CUDNN_STATUS_EXECUTION_FAILED)
```

The inference function is:

```cpp
void doInference_cu(IExecutionContext &context, cudaStream_t &stream, void **buffers,
                    float *output, int batchSize, int OUTPUT_SIZE) {
    // infer on the batch asynchronously, and DMA output back to host
    context.enqueue(batchSize, buffers, stream, nullptr);
    CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float),
                          cudaMemcpyDeviceToHost, stream));
}
```

we0091234 commented 1 year ago

> Inference fails. The error is `[convolutionRunner.cpp::executeConv::511] Error Code 1: Cudnn (CUDNN_STATUS_EXECUTION_FAILED)` (full log and `doInference_cu` code quoted above)

What I meant was onnxruntime inference.

newforrestgump001 commented 1 year ago

Not yet. Same training code and same inference code, both with 2 classes; one model has 4 keypoints and the other has 2. The 4-keypoint one works and the 2-keypoint one fails. I will test with onnxruntime now.
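A minimal way to run that onnxruntime sanity check (a hedged sketch: the model path and the class/keypoint counts are placeholders, and the `5 + num_classes + 3 * num_keypoints` head layout is inferred from the `1 x 64512 x 19` versus `1 x 64512 x 13` output shapes reported later in this thread):

```python
import numpy as np

def expected_last_dim(num_classes, num_keypoints):
    # Assumed head layout: box(4) + objectness(1) + class scores + (x, y, conf) per keypoint.
    # With 2 classes: 4 keypoints -> 19, 2 keypoints -> 13, matching the logged shapes.
    return 5 + num_classes + 3 * num_keypoints

def check_onnx(model_path, num_classes, num_keypoints):
    import onnxruntime as ort  # lazy import; only needed for the actual model check
    sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]
    # Feed a zero image of the model's declared input shape (1 x 3 x 1024 x 1024 here)
    x = np.zeros([d if isinstance(d, int) else 1 for d in inp.shape], dtype=np.float32)
    out = sess.run(None, {inp.name: x})[0]
    assert out.shape[-1] == expected_last_dim(num_classes, num_keypoints), out.shape
    return out.shape

# check_onnx("best.onnx", num_classes=2, num_keypoints=2)  # placeholder path
```

If the ONNX output shape checks out here, the problem is more likely in the C++ side (buffer sizes, postprocessing) than in the exported model itself.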

newforrestgump001 commented 1 year ago

The 4-keypoint model's parameters are:

```
[info][simple_yolo.cu:2281]:Input shape is 1 x 3 x 1024 x 1024
[info][simple_yolo.cu:2283]:Set max workspace size = 1024.00 MB
[info][simple_yolo.cu:2286]:Network has 1 inputs:
[info][simple_yolo.cu:2292]:    0.[images] shape is 1 x 3 x 1024 x 1024
[info][simple_yolo.cu:2298]:Network has 1 outputs:
[info][simple_yolo.cu:2303]:    0.[output] shape is 1 x 64512 x 19
[info][simple_yolo.cu:2307]:Network has 1326 layers
```

newforrestgump001 commented 1 year ago

The 2-keypoint model's parameters are:

```
[info][simple_yolo.cu:2281]:Input shape is 1 x 3 x 1024 x 1024
[info][simple_yolo.cu:2283]:Set max workspace size = 1024.00 MB
[info][simple_yolo.cu:2286]:Network has 1 inputs:
[info][simple_yolo.cu:2292]:    0.[images] shape is 1 x 3 x 1024 x 1024
[info][simple_yolo.cu:2298]:Network has 1 outputs:
[info][simple_yolo.cu:2303]:    0.[output] shape is 1 x 64512 x 13
[info][simple_yolo.cu:2307]:Network has 1326 layers
```

newforrestgump001 commented 1 year ago

The only difference is the third dimension of the output: 19 versus 13, yet doInference_cu behaves differently for the two models.
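That dimension difference suggests one plausible cause of the CUDA failure 700 (cudaErrorIllegalAddress): if `OUTPUT_SIZE` or the device/host output buffer allocation in the C++ code is still hardcoded for the 4-keypoint model, the async copy in `doInference_cu` reads past the end of `buffers[1]`. A rough sketch of the arithmetic, assuming float32 and the `1 x 64512 x N` shapes from the logs above:

```python
NUM_ANCHORS = 64512   # from the "1 x 64512 x N" output shape in the logs
FLOAT_BYTES = 4       # sizeof(float)

def output_bytes(last_dim, batch=1):
    # Mirrors batchSize * OUTPUT_SIZE * sizeof(float) in doInference_cu,
    # where OUTPUT_SIZE = NUM_ANCHORS * last_dim.
    return batch * NUM_ANCHORS * last_dim * FLOAT_BYTES

alloc_2kpt = output_bytes(13)  # what the 2-keypoint model actually produces
copy_4kpt = output_bytes(19)   # copy size if OUTPUT_SIZE still uses the 4-keypoint value
print(copy_4kpt - alloc_2kpt)  # bytes read past the end of the device buffer
```

This is only a hypothesis, but it is the kind of mismatch that surfaces as an asynchronous error 700 at a later CUDA call rather than at the line that caused it.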

we0091234 commented 1 year ago

> The only difference is the third dimension of the output: 19 versus 13, yet doInference_cu behaves differently for the two models.

Let's discuss when you join the group later. Send me an image too, so I can test it and run it myself.

newforrestgump001 commented 1 year ago

OK, thank you very much!