zerollzeng / tensorrt-zoo

openpose, yolov3 with tiny-tensorrt

OpenPose produces wrong results #17

Closed wulixunhua closed 4 years ago

wulixunhua commented 4 years ago

image

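Hello, I ran your openpose example as described and the result looks like this: the whole image is covered in points. What could be causing this?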

zerollzeng commented 4 years ago

Did you clone the latest tensorrt-zoo, or did you modify the code? I get correct output on my machine. If it still doesn't work, could you please post more information, e.g. using the issue template? I can't help you with only a screenshot.

wulixunhua commented 4 years ago

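@zerollzeng I built it on a Jetson platform (TensorRT 6.0, OpenCV 4.1.1, CUDA 10.0). The build succeeded and the engine was generated without problems, but testing an image reported the following error: [image]. I then commented out the CUDA_CHECK(...) line at line 183 of PReLUPlugin.cu, rebuilt, and ran the test image again, which produced the result image at the top. What went wrong here? Also, the model download link you provided doesn't open.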

zerollzeng commented 4 years ago

The download link is broken, thanks for mentioning it. The reason you cannot compile this on Jetson is that Jetson does not support fp16. Try commenting out the fp16-related parts in PReLUPlugin.h and PReLUPlugin.cu.

wulixunhua commented 4 years ago

@zerollzeng I don't quite follow. Do you mean replacing fp16 with __half in PReLUPlugin.cu and PReLUPlugin.h? I run darknet's yolov3 with TensorRT on the Jetson AGX, and fp16 is supported there.

zerollzeng commented 4 years ago

https://github.com/zerollzeng/tiny-tensorrt/blob/69e64d10270b518c1356b50d9dff1553e3d8c1ce/cmake/CUDA_utils.cmake#L17

Some operators in the PReLU plugin need a device whose SM architecture is greater than 6.0. Take a look here: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/ Does the Jetson satisfy this requirement?

I mean that if your device DOES NOT SATISFY THESE REQUIREMENTS, you need to disable fp16 support in the PReLU plugin.
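
To make the requirement above concrete, here is a minimal sketch of the comparison being described. The helper name `meetsComputeCapability` is hypothetical and not part of tiny-tensorrt, and the 6.0 threshold is simply the figure quoted in this thread, not a value checked against the plugin source. It only compares version numbers; querying the actual device would need the CUDA runtime.

```cpp
// Hypothetical helper (not part of tiny-tensorrt): true when a device's
// compute capability major.minor is at least reqMajor.reqMinor.
constexpr bool meetsComputeCapability(int major, int minor,
                                      int reqMajor, int reqMinor) {
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

// Assumed threshold of 6.0, taken from the comment above.
constexpr int kReqMajor = 6;
constexpr int kReqMinor = 0;

// Compile-time examples: sm_53 is below the threshold, sm_60 and sm_72 meet it.
static_assert(!meetsComputeCapability(5, 3, kReqMajor, kReqMinor), "sm_53 < 6.0");
static_assert(meetsComputeCapability(6, 0, kReqMajor, kReqMinor), "sm_60 >= 6.0");
static_assert(meetsComputeCapability(7, 2, kReqMajor, kReqMinor), "sm_72 >= 6.0");
```

Under this reading, only a device that fails the check would need the fp16 code path disabled.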

wulixunhua commented 4 years ago

@zerollzeng Hello, could you explain how to disable fp16 support in the PReLUPlugin? I couldn't work it out.

zerollzeng commented 4 years ago

Can you post the output of cmake and make, plus the run logs? Make sure to wrap them in appropriate markdown code blocks.

wulixunhua commented 4 years ago

@zerollzeng I've posted the information below. The board is a Jetson AGX, which is CUDA sm_72.

CMakeLists.txt:

cmake_minimum_required(VERSION 3.0)

project(tensorrt_zoo)
set(CMAKE_CXX_FLAGS "-std=c++11")

set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${PROJECT_SOURCE_DIR}/lib)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${PROJECT_SOURCE_DIR}/bin)

find_package(CUDA REQUIRED)
include_directories(${CUDA_INCLUDE_DIRS})

find_package(OpenCV REQUIRED)
if (NOT OpenCV_FOUND)
    message(FATAL_ERROR "opencv not found")
endif (NOT OpenCV_FOUND)

include_directories(tiny-tensorrt)
link_directories(/usr/lib/aarch64-linux-gnu)
link_directories(lib/)

find_library(CUDART cudart HINTS /usr/local/cuda/targets/aarch64-linux/lib/)

add_subdirectory(tiny-tensorrt)

#add_subdirectory(yolov3)

add_subdirectory(openpose)

Make:

nvidia@nvidia-desktop:~/Downloads/tensorrt-zoo/build$ make -j16
[  5%] Building NVCC (Device) object tiny-tensorrt/CMakeFiles/tinytrt.dir/plugin/YoloLayerPlugin/tinytrt_generated_YoloLayerPlugin.cu.o
[ 11%] Building NVCC (Device) object tiny-tensorrt/CMakeFiles/tinytrt.dir/plugin/PReLUPlugin/tinytrt_generated_PReLUPlugin.cu.o
[ 16%] Building NVCC (Device) object tiny-tensorrt/CMakeFiles/tinytrt.dir/plugin/UpSamplePlugin/tinytrt_generated_UpSamplePlugin.cu.o
Scanning dependencies of target tinytrt
[ 22%] Building CXX object tiny-tensorrt/CMakeFiles/tinytrt.dir/Int8EntropyCalibrator.cpp.o
[ 27%] Building CXX object tiny-tensorrt/CMakeFiles/tinytrt.dir/Trt.cpp.o
[ 33%] Building CXX object tiny-tensorrt/CMakeFiles/tinytrt.dir/plugin/PluginFactory.cpp.o
[ 38%] Building CXX object tiny-tensorrt/CMakeFiles/tinytrt.dir/plugin/plugin_utils.cpp.o
In file included from /home/nvidia/Downloads/tensorrt-zoo/tiny-tensorrt/plugin/plugin_utils.cpp:8:0:
/home/nvidia/Downloads/tensorrt-zoo/tiny-tensorrt/plugin/plugin_utils.h:28:20: warning: ‘G_PLUGIN_VERSION’ defined but not used [-Wunused-variable]
 static const char* G_PLUGIN_VERSION = "1";
                    ^~~~~~~~~~~~~~~~
/home/nvidia/Downloads/tensorrt-zoo/tiny-tensorrt/plugin/plugin_utils.h:27:20: warning: ‘G_PLUGIN_NAMESPACE’ defined but not used [-Wunused-variable]
 static const char* G_PLUGIN_NAMESPACE = "_TRT";
                    ^~~~~~~~~~~~~~~~~~
In file included from /home/nvidia/Downloads/tensorrt-zoo/tiny-tensorrt/plugin/YoloLayerPlugin/YoloLayerPlugin.hpp:19:0,
                 from /home/nvidia/Downloads/tensorrt-zoo/tiny-tensorrt/plugin/PluginFactory.cpp:12:
/home/nvidia/Downloads/tensorrt-zoo/tiny-tensorrt/./plugin/plugin_utils.h:28:20: warning: ‘G_PLUGIN_VERSION’ defined but not used [-Wunused-variable]
 static const char* G_PLUGIN_VERSION = "1";
                    ^~~~~~~~~~~~~~~~
/home/nvidia/Downloads/tensorrt-zoo/tiny-tensorrt/./plugin/plugin_utils.h:27:20: warning: ‘G_PLUGIN_NAMESPACE’ defined but not used [-Wunused-variable]
 static const char* G_PLUGIN_NAMESPACE = "_TRT";
                    ^~~~~~~~~~~~~~~~~~
[ 44%] Linking CXX shared library ../../lib/libtinytrt.so
[ 44%] Built target tinytrt
[ 50%] Building NVCC (Device) object openpose/CMakeFiles/testopenpose.dir/testopenpose_generated_ResizeAndMerge.cu.o
[ 55%] Building NVCC (Device) object openpose/CMakeFiles/testopenpose.dir/testopenpose_generated_OpenPose.cu.o
[ 61%] Building NVCC (Device) object openpose/CMakeFiles/testopenpose.dir/testopenpose_generated_BodyPartConnector.cu.o
Scanning dependencies of target pytrt
[ 66%] Building NVCC (Device) object openpose/CMakeFiles/testopenpose.dir/testopenpose_generated_PoseNMS.cu.o
[ 72%] Building CXX object tiny-tensorrt/CMakeFiles/pytrt.dir/PyTrt.cpp.o
Scanning dependencies of target testopenpose
[ 77%] Building CXX object openpose/CMakeFiles/testopenpose.dir/Array.cpp.o
[ 83%] Building CXX object openpose/CMakeFiles/testopenpose.dir/Point.cpp.o
[ 88%] Building CXX object openpose/CMakeFiles/testopenpose.dir/testopenpose.cpp.o
[ 94%] Linking CXX shared library ../../lib/pytrt.cpython-36m-aarch64-linux-gnu.so
/home/nvidia/Downloads/tensorrt-zoo/openpose/testopenpose.cpp: In function ‘int main(int, char**)’:
/home/nvidia/Downloads/tensorrt-zoo/openpose/testopenpose.cpp:91:36: warning: conversion to ‘__gnu_cxx::__alloc_traits<std::allocator<float> >::value_type {aka float}’ alters ‘double’ constant value [-Wfloat-conversion]
             calibratorData[i][j] = 0.05;
                                    ^~~~
/home/nvidia/Downloads/tensorrt-zoo/openpose/testopenpose.cpp:115:59: warning: conversion to ‘int’ from ‘__gnu_cxx::__alloc_traits<std::allocator<float> >::value_type {aka float}’ may alter its value [-Wfloat-conversion]
         cv::circle(img,cv::Point(result[i*3],result[i*3+1]),2,cv::Scalar(0,255,0),-1);
                                                           ^
/home/nvidia/Downloads/tensorrt-zoo/openpose/testopenpose.cpp:115:59: warning: conversion to ‘int’ from ‘__gnu_cxx::__alloc_traits<std::allocator<float> >::value_type {aka float}’ may alter its value [-Wfloat-conversion]
[100%] Linking CXX executable ../../bin/testopenpose
[100%] Built target testopenpose
[100%] Built target pytrt

Running:

nvidia@nvidia-desktop:~/Downloads/tensorrt-zoo/bin$ ./testopenpose --prototxt ../openpose/pose_deploy.prototxt --caffemodel ../openpose/pose_iter_584000.caffemodel --save_engine ../openpose/save_engine --input ../test.jpg --run_mode 0
usage: path/to/testopenpose --prototxt path/to/prototxt --caffemodel path/to/caffemodel/ --save_engine path/to/save_engin --input path/to/input/img --run_mode 0/1/2
[2020-04-30 11:50:53.074] [info] create plugin factory
[2020-04-30 11:50:53.075] [info] yolo3 params: class: 1, netSize: 416 
[2020-04-30 11:50:53.075] [info] upsample params: scale: 2
[2020-04-30 11:50:53.075] [info] prototxt: ../openpose/pose_deploy.prototxt
[2020-04-30 11:50:53.075] [info] caffeModel: ../openpose/pose_iter_584000.caffemodel
[2020-04-30 11:50:53.075] [info] engineFile: ../openpose/save_engine
[2020-04-30 11:50:53.075] [info] outputBlobName: 
net_output 
[2020-04-30 11:50:53.075] [info] build caffe engine with ../openpose/pose_deploy.prototxt and ../openpose/pose_iter_584000.caffemodel
[2020-04-30 11:50:54.312] [info] Number of network layers: 261
[2020-04-30 11:50:54.312] [info] Number of input: 
Input layer: 
image : 3x480x640 
[2020-04-30 11:50:54.312] [info] Number of output: 1
Output layer: 
net_output : 78x60x80 
[2020-04-30 11:50:54.312] [info] parse network done
[2020-04-30 11:50:54.312] [info] fp16 support: true
[2020-04-30 11:50:54.312] [info] int8 support: true
[2020-04-30 11:50:54.313] [info] Max batchsize: 1
[2020-04-30 11:50:54.313] [info] Max workspace size: 10485760
[2020-04-30 11:50:54.313] [info] Number of DLA core: 2
[2020-04-30 11:50:54.313] [info] Max DLA batchsize: 32
[2020-04-30 11:50:54.313] [info] Current use DLA core: 0
[2020-04-30 11:50:54.313] [info] build engine...

--------------- Layers running on DLA: 

--------------- Layers running on GPU: 
conv1_1 + relu1_1, conv1_2 + relu1_2, pool1_stage1, conv2_1 + relu2_1, conv2_2 + relu2_2, pool2_stage1, conv3_1 + relu3_1, conv3_2 + relu3_2, conv3_3 + relu3_3, conv3_4 + relu3_4, pool3_stage1, conv4_1 + relu4_1, conv4_2, prelu4_2, conv4_3_CPM, prelu4_3_CPM, conv4_4_CPM, prelu4_4_CPM, Mconv1_stage0_L2_0, Mprelu1_stage0_L2_0, Mconv1_stage0_L2_1, Mprelu1_stage0_L2_1, Mconv1_stage0_L2_2, Mprelu1_stage0_L2_2, Mconv1_stage0_L2_0 copy, Mconv1_stage0_L2_1 copy, Mconv1_stage0_L2_2 copy, Mconv2_stage0_L2_0, Mprelu2_stage0_L2_0, Mconv2_stage0_L2_1, Mprelu2_stage0_L2_1, Mconv2_stage0_L2_2, Mprelu2_stage0_L2_2, Mconv2_stage0_L2_0 copy, Mconv2_stage0_L2_1 copy, Mconv2_stage0_L2_2 copy, Mconv3_stage0_L2_0, Mprelu3_stage0_L2_0, Mconv3_stage0_L2_1, Mprelu3_stage0_L2_1, Mconv3_stage0_L2_2, Mprelu3_stage0_L2_2, Mconv3_stage0_L2_0 copy, Mconv3_stage0_L2_1 copy, Mconv3_stage0_L2_2 copy, Mconv4_stage0_L2_0, Mprelu4_stage0_L2_0, Mconv4_stage0_L2_1, Mprelu4_stage0_L2_1, Mconv4_stage0_L2_2, Mprelu4_stage0_L2_2, Mconv4_stage0_L2_0 copy, Mconv4_stage0_L2_1 copy, Mconv4_stage0_L2_2 copy, Mconv5_stage0_L2_0, Mprelu5_stage0_L2_0, Mconv5_stage0_L2_1, Mprelu5_stage0_L2_1, Mconv5_stage0_L2_2, Mprelu5_stage0_L2_2, Mconv5_stage0_L2_0 copy, Mconv5_stage0_L2_1 copy, Mconv5_stage0_L2_2 copy, Mconv6_stage0_L2, Mprelu6_stage0_L2, Mconv7_stage0_L2, conv4_4_CPM copy, Mconv1_stage1_L2_0, Mprelu1_stage1_L2_0, Mconv1_stage1_L2_1, Mprelu1_stage1_L2_1, Mconv1_stage1_L2_2, Mprelu1_stage1_L2_2, Mconv1_stage1_L2_0 copy, Mconv1_stage1_L2_1 copy, Mconv1_stage1_L2_2 copy, Mconv2_stage1_L2_0, Mprelu2_stage1_L2_0, Mconv2_stage1_L2_1, Mprelu2_stage1_L2_1, Mconv2_stage1_L2_2, Mprelu2_stage1_L2_2, Mconv2_stage1_L2_0 copy, Mconv2_stage1_L2_1 copy, Mconv2_stage1_L2_2 copy, Mconv3_stage1_L2_0, Mprelu3_stage1_L2_0, Mconv3_stage1_L2_1, Mprelu3_stage1_L2_1, Mconv3_stage1_L2_2, Mprelu3_stage1_L2_2, Mconv3_stage1_L2_0 copy, Mconv3_stage1_L2_1 copy, Mconv3_stage1_L2_2 copy, Mconv4_stage1_L2_0, Mprelu4_stage1_L2_0, 
Mconv4_stage1_L2_1, Mprelu4_stage1_L2_1, Mconv4_stage1_L2_2, Mprelu4_stage1_L2_2, Mconv4_stage1_L2_0 copy, Mconv4_stage1_L2_1 copy, Mconv4_stage1_L2_2 copy, Mconv5_stage1_L2_0, Mprelu5_stage1_L2_0, Mconv5_stage1_L2_1, Mprelu5_stage1_L2_1, Mconv5_stage1_L2_2, Mprelu5_stage1_L2_2, Mconv5_stage1_L2_0 copy, Mconv5_stage1_L2_1 copy, Mconv5_stage1_L2_2 copy, Mconv6_stage1_L2, Mprelu6_stage1_L2, Mconv7_stage1_L2, conv4_4_CPM copy, Mconv1_stage2_L2_0, Mprelu1_stage2_L2_0, Mconv1_stage2_L2_1, Mprelu1_stage2_L2_1, Mconv1_stage2_L2_2, Mprelu1_stage2_L2_2, Mconv1_stage2_L2_0 copy, Mconv1_stage2_L2_1 copy, Mconv1_stage2_L2_2 copy, Mconv2_stage2_L2_0, Mprelu2_stage2_L2_0, Mconv2_stage2_L2_1, Mprelu2_stage2_L2_1, Mconv2_stage2_L2_2, Mprelu2_stage2_L2_2, Mconv2_stage2_L2_0 copy, Mconv2_stage2_L2_1 copy, Mconv2_stage2_L2_2 copy, Mconv3_stage2_L2_0, Mprelu3_stage2_L2_0, Mconv3_stage2_L2_1, Mprelu3_stage2_L2_1, Mconv3_stage2_L2_2, Mprelu3_stage2_L2_2, Mconv3_stage2_L2_0 copy, Mconv3_stage2_L2_1 copy, Mconv3_stage2_L2_2 copy, Mconv4_stage2_L2_0, Mprelu4_stage2_L2_0, Mconv4_stage2_L2_1, Mprelu4_stage2_L2_1, Mconv4_stage2_L2_2, Mprelu4_stage2_L2_2, Mconv4_stage2_L2_0 copy, Mconv4_stage2_L2_1 copy, Mconv4_stage2_L2_2 copy, Mconv5_stage2_L2_0, Mprelu5_stage2_L2_0, Mconv5_stage2_L2_1, Mprelu5_stage2_L2_1, Mconv5_stage2_L2_2, Mprelu5_stage2_L2_2, Mconv5_stage2_L2_0 copy, Mconv5_stage2_L2_1 copy, Mconv5_stage2_L2_2 copy, Mconv6_stage2_L2, Mprelu6_stage2_L2, Mconv7_stage2_L2, conv4_4_CPM copy, Mconv1_stage3_L2_0, Mprelu1_stage3_L2_0, Mconv1_stage3_L2_1, Mprelu1_stage3_L2_1, Mconv1_stage3_L2_2, Mprelu1_stage3_L2_2, Mconv1_stage3_L2_0 copy, Mconv1_stage3_L2_1 copy, Mconv1_stage3_L2_2 copy, Mconv2_stage3_L2_0, Mprelu2_stage3_L2_0, Mconv2_stage3_L2_1, Mprelu2_stage3_L2_1, Mconv2_stage3_L2_2, Mprelu2_stage3_L2_2, Mconv2_stage3_L2_0 copy, Mconv2_stage3_L2_1 copy, Mconv2_stage3_L2_2 copy, Mconv3_stage3_L2_0, Mprelu3_stage3_L2_0, Mconv3_stage3_L2_1, Mprelu3_stage3_L2_1, Mconv3_stage3_L2_2, 
Mprelu3_stage3_L2_2, Mconv3_stage3_L2_0 copy, Mconv3_stage3_L2_1 copy, Mconv3_stage3_L2_2 copy, Mconv4_stage3_L2_0, Mprelu4_stage3_L2_0, Mconv4_stage3_L2_1, Mprelu4_stage3_L2_1, Mconv4_stage3_L2_2, Mprelu4_stage3_L2_2, Mconv4_stage3_L2_0 copy, Mconv4_stage3_L2_1 copy, Mconv4_stage3_L2_2 copy, Mconv5_stage3_L2_0, Mprelu5_stage3_L2_0, Mconv5_stage3_L2_1, Mprelu5_stage3_L2_1, Mconv5_stage3_L2_2, Mprelu5_stage3_L2_2, Mconv5_stage3_L2_0 copy, Mconv5_stage3_L2_1 copy, Mconv5_stage3_L2_2 copy, Mconv6_stage3_L2, Mprelu6_stage3_L2, Mconv7_stage3_L2, conv4_4_CPM copy, Mconv1_stage0_L1_0, Mprelu1_stage0_L1_0, Mconv1_stage0_L1_1, Mprelu1_stage0_L1_1, Mconv1_stage0_L1_2, Mprelu1_stage0_L1_2, Mconv1_stage0_L1_0 copy, Mconv1_stage0_L1_1 copy, Mconv1_stage0_L1_2 copy, Mconv2_stage0_L1_0, Mprelu2_stage0_L1_0, Mconv2_stage0_L1_1, Mprelu2_stage0_L1_1, Mconv2_stage0_L1_2, Mprelu2_stage0_L1_2, Mconv2_stage0_L1_0 copy, Mconv2_stage0_L1_1 copy, Mconv2_stage0_L1_2 copy, Mconv3_stage0_L1_0, Mprelu3_stage0_L1_0, Mconv3_stage0_L1_1, Mprelu3_stage0_L1_1, Mconv3_stage0_L1_2, Mprelu3_stage0_L1_2, Mconv3_stage0_L1_0 copy, Mconv3_stage0_L1_1 copy, Mconv3_stage0_L1_2 copy, Mconv4_stage0_L1_0, Mprelu4_stage0_L1_0, Mconv4_stage0_L1_1, Mprelu4_stage0_L1_1, Mconv4_stage0_L1_2, Mprelu4_stage0_L1_2, Mconv4_stage0_L1_0 copy, Mconv4_stage0_L1_1 copy, Mconv4_stage0_L1_2 copy, Mconv5_stage0_L1_0, Mprelu5_stage0_L1_0, Mconv5_stage0_L1_1, Mprelu5_stage0_L1_1, Mconv5_stage0_L1_2, Mprelu5_stage0_L1_2, Mconv5_stage0_L1_0 copy, Mconv5_stage0_L1_1 copy, Mconv5_stage0_L1_2 copy, Mconv6_stage0_L1, Mprelu6_stage0_L1, Mconv7_stage0_L1, conv4_4_CPM copy, Mconv7_stage3_L2 copy, Mconv1_stage1_L1_0, Mprelu1_stage1_L1_0, Mconv1_stage1_L1_1, Mprelu1_stage1_L1_1, Mconv1_stage1_L1_2, Mprelu1_stage1_L1_2, Mconv1_stage1_L1_0 copy, Mconv1_stage1_L1_1 copy, Mconv1_stage1_L1_2 copy, Mconv2_stage1_L1_0, Mprelu2_stage1_L1_0, Mconv2_stage1_L1_1, Mprelu2_stage1_L1_1, Mconv2_stage1_L1_2, Mprelu2_stage1_L1_2, Mconv2_stage1_L1_0 copy, 
Mconv2_stage1_L1_1 copy, Mconv2_stage1_L1_2 copy, Mconv3_stage1_L1_0, Mprelu3_stage1_L1_0, Mconv3_stage1_L1_1, Mprelu3_stage1_L1_1, Mconv3_stage1_L1_2, Mprelu3_stage1_L1_2, Mconv3_stage1_L1_0 copy, Mconv3_stage1_L1_1 copy, Mconv3_stage1_L1_2 copy, Mconv4_stage1_L1_0, Mprelu4_stage1_L1_0, Mconv4_stage1_L1_1, Mprelu4_stage1_L1_1, Mconv4_stage1_L1_2, Mprelu4_stage1_L1_2, Mconv4_stage1_L1_0 copy, Mconv4_stage1_L1_1 copy, Mconv4_stage1_L1_2 copy, Mconv5_stage1_L1_0, Mprelu5_stage1_L1_0, Mconv5_stage1_L1_1, Mprelu5_stage1_L1_1, Mconv5_stage1_L1_2, Mprelu5_stage1_L1_2, Mconv5_stage1_L1_0 copy, Mconv5_stage1_L1_1 copy, Mconv5_stage1_L1_2 copy, Mconv6_stage1_L1, Mprelu6_stage1_L1, Mconv7_stage1_L1, Mconv7_stage3_L2 copy, 
Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
Detected 1 inputs and 3 output network tensors.
[2020-04-30 11:53:03.353] [info] serialize engine to ../openpose/save_engine
[2020-04-30 11:53:03.353] [info] save engine to ../openpose/save_engine...
[2020-04-30 11:53:12.705] [info] create execute context and malloc device memory...
[2020-04-30 11:53:12.705] [info] init engine...
[2020-04-30 11:53:12.740] [info] malloc device memory
nbBingdings: 2
[2020-04-30 11:53:12.741] [info] input: 
[2020-04-30 11:53:12.741] [info] binding bindIndex: 0, name: image, size in byte: 3686400
[2020-04-30 11:53:12.741] [info] binding dims with 3 dimemsion
3 x 480 x 640   
[2020-04-30 11:53:12.748] [info] output: 
[2020-04-30 11:53:12.748] [info] binding bindIndex: 1, name: net_output, size in byte: 1497600
[2020-04-30 11:53:12.748] [info] binding dims with 3 dimemsion
78 x 60 x 80   
=====>malloc extra memory for openpose...
heatmap Dims3
heatmap size: 1 78 60 80
allocate heatmap host and divice memory done
resize map size: 1 78 240 320
kernel size: 1 78 240 320
allocate kernel host and device memory done
peaks size: 1 25 128 3
allocate peaks host and device memory done
=====> malloc extra memory done
CUDA error 48 at /home/nvidia/Downloads/tensorrt-zoo/tiny-tensorrt/plugin/PReLUPlugin/PReLUPlugin.cu:177

Line 177 is the CUDA_CHECK call:

CUDA_CHECK(Forward_gpu(count, channels, dim, reinterpret_cast<const float *>(mDeviceKernel), reinterpret_cast<const float *>(inputs[0]), reinterpret_cast<float *>(outputs[0]), zerof, div_factor, stream));

zerollzeng commented 4 years ago

https://news.ycombinator.com/item?id=18389589 Try Googling CUDA error 48; it seems to be a hardware problem.

wulixunhua commented 4 years ago

@zerollzeng Is there a way to make the PReLUPlugin run its computation on the CPU?

zerollzeng commented 4 years ago

https://github.com/zerollzeng/tiny-tensorrt/blob/master/plugin/YoloLayerPlugin/YoloLayerPlugin.cu You can refer to the implementation here.
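
For the CPU question above, the following is a rough host-side reference for channel-wise PReLU, useful mainly for checking GPU output. It is a sketch under assumptions: the function name `preluCpu` is hypothetical, and the `channels`, `dim`, and `divFactor` parameters are assumed to follow the Caffe-style convention that the `Forward_gpu(count, channels, dim, ..., div_factor, ...)` call in the log suggests. It is not the plugin's actual code, and it works on host memory only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for channel-wise PReLU over NCHW data:
//   y = x             if x > 0
//   y = slope[c] * x  otherwise
// dim is height * width of one channel. Setting divFactor equal to channels
// makes every channel share slope[0] (assumed Caffe-style "channel_shared").
std::vector<float> preluCpu(const std::vector<float>& input,
                            const std::vector<float>& slope,
                            std::size_t channels, std::size_t dim,
                            std::size_t divFactor = 1) {
    assert(channels > 0 && dim > 0 && divFactor > 0);
    std::vector<float> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        // Channel index of element i, folded by divFactor.
        const std::size_t c = (i / dim) % channels / divFactor;
        assert(c < slope.size());
        const float x = input[i];
        output[i] = x > 0.0f ? x : slope[c] * x;
    }
    return output;
}
```

Using this inside a TensorRT plugin would additionally require copying the input from device to host and the result back, since the plugin receives device pointers; that part is not shown here.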

wulixunhua commented 4 years ago

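@zerollzeng Sorry to bother you again. I looked at the COCO dataset model with 19 keypoints, and that model does not need the PReLU operation. I loaded the model and changed peaks in openpose.h to 19. This time there was no CUDA error, but no keypoints are output or drawn on the result image. Have you tested the 19-keypoint model with your code?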

zerollzeng commented 4 years ago

No. I am not sure whether the 25-point and 19-point models have exactly the same post-processing phase.

zerollzeng commented 4 years ago

Closing due to inactivity.