zerollzeng / tensorrt-zoo

openpose, yolov3 with tiny-tensorrt

run time cuda error : CUDA error 209 at /home/nano/work/tensorrt-zoo/tiny-tensorrt/plugin/PReLUPlugin/PReLUPlugin.cu:183 #25

Closed yinguoxiangyi closed 4 years ago

yinguoxiangyi commented 4 years ago

Error info

[2020-06-29 19:35:46.397] [info] create plugin factory
[2020-06-29 19:35:46.397] [info] yolo3 params: class: 1, netSize: 416
[2020-06-29 19:35:46.397] [info] upsample params: scale: 2
[2020-06-29 19:35:46.397] [info] prototxt: /home/nano/work/openpose_tensorrt/models/pose/body_25/pose_deploy.prototxt
[2020-06-29 19:35:46.397] [info] caffeModel: /home/nano/work/openpose_tensorrt/models/pose/body_25/pose_iter_584000.caffemodel
[2020-06-29 19:35:46.397] [info] engineFile: ./pose_iter_584000_480x640.trt
[2020-06-29 19:35:46.397] [info] outputBlobName: net_output
[2020-06-29 19:35:46.398] [info] build caffe engine with /home/nano/work/openpose_tensorrt/models/pose/body_25/pose_deploy.prototxt and /home/nano/work/openpose_tensorrt/models/pose/body_25/pose_iter_584000.caffemodel
[2020-06-29 19:35:49.240] [info] Number of network layers: 261
[2020-06-29 19:35:49.240] [info] Number of input: Input layer: image : 3x480x640
[2020-06-29 19:35:49.240] [info] Number of output: 1 Output layer: net_output : 78x60x80
[2020-06-29 19:35:49.240] [info] parse network done
[2020-06-29 19:35:49.240] [info] fp16 support: true
[2020-06-29 19:35:49.240] [info] int8 support: false
[2020-06-29 19:35:49.240] [info] Max batchsize: 1
[2020-06-29 19:35:49.240] [info] Max workspace size: 10485760
[2020-06-29 19:35:49.241] [info] Number of DLA core: 0
[2020-06-29 19:35:49.241] [info] Max DLA batchsize: 268435456
[2020-06-29 19:35:49.241] [info] Current use DLA core: 0
[2020-06-29 19:35:49.241] [info] build engine...
Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
Detected 1 inputs and 3 output network tensors.
[2020-06-29 19:37:00.660] [info] serialize engine to ./pose_iter_584000_480x640.trt
[2020-06-29 19:37:00.660] [info] save engine to ./pose_iter_584000_480x640.trt...
[2020-06-29 19:37:27.716] [info] create execute context and malloc device memory...
[2020-06-29 19:37:27.716] [info] init engine...
[2020-06-29 19:37:29.367] [info] malloc device memory nbBingdings: 2
[2020-06-29 19:37:29.407] [info] input:
[2020-06-29 19:37:29.408] [info] binding bindIndex: 0, name: image, size in byte: 3686400
[2020-06-29 19:37:29.409] [info] binding dims with 3 dimemsion 3 x 480 x 640
[2020-06-29 19:37:31.116] [info] output:
[2020-06-29 19:37:31.116] [info] binding bindIndex: 1, name: net_output, size in byte: 1497600
[2020-06-29 19:37:31.116] [info] binding dims with 3 dimemsion 78 x 60 x 80
=====>malloc extra memory for openpose...
heatmap Dims3 heatmap size: 1 78 60 80
allocate heatmap host and divice memory done
resize map size: 1 78 240 320
kernel size: 1 78 240 320
allocate kernel host and device memory done
peaks size: 1 25 128 3
allocate peaks host and device memory done
=====> malloc extra memory done
CUDA error 209 at /home/nano/work/tensorrt-zoo/tiny-tensorrt/plugin/PReLUPlugin/PReLUPlugin.cu:183

To Reproduce

Steps to reproduce the behavior:

  1. download the caffemodel and prototxt
  2. modify the w and h in prototxt to 480 and 640
  3. compile and run as follows:
     ./bin/testopenpose --prototxt /home/nano/work/openpose_tensorrt/models/pose/body_25/pose_deploy.prototxt --caffemodel /home/nano/work/openpose_tensorrt/models/pose/body_25/pose_iter_584000.caffemodel --save_engine ./pose_iter_584000_480x640.trt --input ./test.jpg --run_mode 0

Expected behavior: a normal result picture.

Screenshots: [image]

System environment (please complete the following information):

zerollzeng commented 4 years ago

Thanks for using tiny-tensorrt. I searched for CUDA error 209 and found it in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html; it means:

cudaErrorNoKernelImageForDevice = 209
This indicates that there is no kernel image available that is suitable for the device. This can occur when a user specifies code generation options for a particular CUDA source file that do not include the corresponding device configuration.

This means tiny-tensorrt's default compile configuration is not suitable for the Jetson Nano (it actually targets desktop graphics cards). After searching, I found that the CUDA arch of the Jetson Nano is sm_53.

So here is how you might solve your problem:

  1. Delete

https://github.com/zerollzeng/tiny-tensorrt/blob/f725c254acb8cdb147accf9132cdbba5148c55b4/CMakeLists.txt#L15-L24

and replace it with:

include_directories(${CUDA_INCLUDE_DIRS})
set(CUDA_targeted_archs "53")
  2. Then PReLU's fp16 may not work, so you might need to comment out

https://github.com/zerollzeng/tiny-tensorrt/blob/f725c254acb8cdb147accf9132cdbba5148c55b4/plugin/PReLUPlugin/PReLUPlugin.cu#L185-L192

Then see if that solves your problem. By the way, can you paste the whole log of cmake and make? I have never tested this on a Jetson Nano, so the log may be helpful to other people, thx~
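To make step 1 concrete, here is a sketch of what the replacement amounts to in this FindCUDA-based build. Treat it as a sketch, not exact project code; the comment shows the gencode flags this configuration should generate, which matches the "Generated gencode flags" line in the cmake log later in this thread.

```cmake
# Sketch: build device code only for the Jetson Nano's GPU
# (Maxwell, compute capability 5.3).
include_directories(${CUDA_INCLUDE_DIRS})
set(CUDA_targeted_archs "53")
# The configure step should then generate, for nvcc:
#   -gencode arch=compute_53,code=sm_53
#   -gencode arch=compute_53,code=compute_53
```

Without an sm_53 kernel image in the binary, every kernel launch on the Nano fails with cudaErrorNoKernelImageForDevice (209), which is exactly the error reported above.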

yinguoxiangyi commented 4 years ago

Thx for the reply, I will try it soon.

yinguoxiangyi commented 4 years ago

It works! Here is the result picture: [image]

My Question

[2020-06-30 11:39:49.780] [info] net forward takes 1426.29 ms
inference Time : 71.78 ms

What is the difference between the times reported by "net forward takes" and "inference Time"? Which one is the single-frame inference time? And to change the network input resolution, what else needs to be modified besides the w and h in the prototxt?

Output Info

[2020-06-30 11:39:39.480] [info] create plugin factory
[2020-06-30 11:39:39.481] [info] yolo3 params: class: 1, netSize: 416
[2020-06-30 11:39:39.481] [info] upsample params: scale: 2
[2020-06-30 11:39:39.481] [info] prototxt: /home/nano/work/openpose_tensorrt/models/pose/body_25/pose_deploy.prototxt
[2020-06-30 11:39:39.481] [info] caffeModel: /home/nano/work/openpose_tensorrt/models/pose/body_25/pose_iter_584000.caffemodel
[2020-06-30 11:39:39.481] [info] engineFile: ./pose_iter_584000_480x640.trt
[2020-06-30 11:39:39.481] [info] outputBlobName: net_output
[2020-06-30 11:39:39.481] [info] deserialize engine from ./pose_iter_584000_480x640.trt
[2020-06-30 11:39:47.785] [info] max batch size of deserialized engine: 1
[2020-06-30 11:39:47.814] [info] create execute context and malloc device memory...
[2020-06-30 11:39:47.814] [info] init engine...
[2020-06-30 11:39:48.184] [info] malloc device memory nbBingdings: 2
[2020-06-30 11:39:48.184] [info] input:
[2020-06-30 11:39:48.184] [info] binding bindIndex: 0, name: image, size in byte: 3686400
[2020-06-30 11:39:48.184] [info] binding dims with 3 dimemsion 3 x 480 x 640
[2020-06-30 11:39:48.300] [info] output:
[2020-06-30 11:39:48.300] [info] binding bindIndex: 1, name: net_output, size in byte: 1497600
[2020-06-30 11:39:48.300] [info] binding dims with 3 dimemsion 78 x 60 x 80
=====>malloc extra memory for openpose...
heatmap Dims3 heatmap size: 1 78 60 80
allocate heatmap host and divice memory done
resize map size: 1 78 240 320
kernel size: 1 78 240 320
allocate kernel host and device memory done
peaks size: 1 25 128 3
allocate peaks host and device memory done
=====> malloc extra memory done
[2020-06-30 11:39:49.780] [info] net forward takes 1426.29 ms
inference Time : 71.78 ms

CMake Error Log

Determining if the pthread_create exist failed with the following output:
Change Dir: /home/nano/work/tensorrt-zoo/build/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/make" "cmTC_09b6b/fast"
/usr/bin/make -f CMakeFiles/cmTC_09b6b.dir/build.make CMakeFiles/cmTC_09b6b.dir/build
make[1]: Entering directory '/home/nano/work/tensorrt-zoo/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_09b6b.dir/CheckSymbolExists.c.o
/usr/bin/cc -fPIC -o CMakeFiles/cmTC_09b6b.dir/CheckSymbolExists.c.o -c /home/nano/work/tensorrt-zoo/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c
Linking C executable cmTC_09b6b
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_09b6b.dir/link.txt --verbose=1
/usr/bin/cc -fPIC -rdynamic CMakeFiles/cmTC_09b6b.dir/CheckSymbolExists.c.o -o cmTC_09b6b
CMakeFiles/cmTC_09b6b.dir/CheckSymbolExists.c.o: In function `main':
CheckSymbolExists.c:(.text+0x14): undefined reference to `pthread_create'
CheckSymbolExists.c:(.text+0x18): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_09b6b.dir/build.make:97: recipe for target 'cmTC_09b6b' failed
make[1]: *** [cmTC_09b6b] Error 1
make[1]: Leaving directory '/home/nano/work/tensorrt-zoo/build/CMakeFiles/CMakeTmp'
Makefile:126: recipe for target 'cmTC_09b6b/fast' failed
make: *** [cmTC_09b6b/fast] Error 2

File /home/nano/work/tensorrt-zoo/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:

/* */
#include <pthread.h>

int main(int argc, char** argv)
{
  (void)argv;
#ifndef pthread_create
  return ((int*)(&pthread_create))[argc];
#else
  (void)argc;
  return 0;
#endif
}

Determining if the function pthread_create exists in the pthreads failed with the following output:
Change Dir: /home/nano/work/tensorrt-zoo/build/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/make" "cmTC_4ea6d/fast"
/usr/bin/make -f CMakeFiles/cmTC_4ea6d.dir/build.make CMakeFiles/cmTC_4ea6d.dir/build
make[1]: Entering directory '/home/nano/work/tensorrt-zoo/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_4ea6d.dir/CheckFunctionExists.c.o
/usr/bin/cc -fPIC -DCHECK_FUNCTION_EXISTS=pthread_create -o CMakeFiles/cmTC_4ea6d.dir/CheckFunctionExists.c.o -c /usr/share/cmake-3.10/Modules/CheckFunctionExists.c
Linking C executable cmTC_4ea6d
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_4ea6d.dir/link.txt --verbose=1
/usr/bin/cc -fPIC -DCHECK_FUNCTION_EXISTS=pthread_create -rdynamic CMakeFiles/cmTC_4ea6d.dir/CheckFunctionExists.c.o -o cmTC_4ea6d -lpthreads
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_4ea6d.dir/build.make:97: recipe for target 'cmTC_4ea6d' failed
make[1]: *** [cmTC_4ea6d] Error 1
make[1]: Leaving directory '/home/nano/work/tensorrt-zoo/build/CMakeFiles/CMakeTmp'
Makefile:126: recipe for target 'cmTC_4ea6d/fast' failed
make: *** [cmTC_4ea6d/fast] Error 2

log during cmake

-- The C compiler identification is GNU 7.4.0
-- The CXX compiler identification is GNU 7.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found version "10.2")
-- Found OpenCV: /usr (found version "4.1.1")
-- Generated gencode flags: -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53
-- Found TensorRT headers at /usr/include/aarch64-linux-gnu
-- Build python
-- Found PythonInterp: /usr/bin/python3.6 (found version "3.6.9")
-- Found PythonLibs: /usr/lib/aarch64-linux-gnu/libpython3.6m.so
-- Performing Test HAS_CPP14_FLAG
-- Performing Test HAS_CPP14_FLAG - Success
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home/nano/work/tensorrt-zoo/build

log during make

[ 14%] Building NVCC (Device) object tiny-tensorrt/CMakeFiles/tinytrt.dir/plugin/UpSamplePlugin/tinytrt_generated_UpSamplePlugin.cu.o
[ 14%] Building NVCC (Device) object tiny-tensorrt/CMakeFiles/tinytrt.dir/plugin/PReLUPlugin/tinytrt_generated_PReLUPlugin.cu.o
[ 14%] Building NVCC (Device) object tiny-tensorrt/CMakeFiles/tinytrt.dir/plugin/YoloLayerPlugin/tinytrt_generated_YoloLayerPlugin.cu.o
Scanning dependencies of target tinytrt
[ 23%] Building CXX object tiny-tensorrt/CMakeFiles/tinytrt.dir/Trt.cpp.o
[ 23%] Building CXX object tiny-tensorrt/CMakeFiles/tinytrt.dir/Int8EntropyCalibrator.cpp.o
[ 28%] Building CXX object tiny-tensorrt/CMakeFiles/tinytrt.dir/plugin/PluginFactory.cpp.o
[ 33%] Building CXX object tiny-tensorrt/CMakeFiles/tinytrt.dir/plugin/plugin_utils.cpp.o
In file included from /home/nano/work/tensorrt-zoo/tiny-tensorrt/plugin/plugin_utils.cpp:8:0:
/home/nano/work/tensorrt-zoo/tiny-tensorrt/plugin/plugin_utils.h:28:20: warning: ‘G_PLUGIN_VERSION’ defined but not used [-Wunused-variable]
 static const char* G_PLUGIN_VERSION = "1";
                    ^~~~
/home/nano/work/tensorrt-zoo/tiny-tensorrt/plugin/plugin_utils.h:27:20: warning: ‘G_PLUGIN_NAMESPACE’ defined but not used [-Wunused-variable]
 static const char* G_PLUGIN_NAMESPACE = "_TRT";
                    ^~~~~~
In file included from /home/nano/work/tensorrt-zoo/tiny-tensorrt/plugin/YoloLayerPlugin/YoloLayerPlugin.hpp:19:0,
                 from /home/nano/work/tensorrt-zoo/tiny-tensorrt/plugin/PluginFactory.cpp:12:
/home/nano/work/tensorrt-zoo/tiny-tensorrt/./plugin/plugin_utils.h:28:20: warning: ‘G_PLUGIN_VERSION’ defined but not used [-Wunused-variable]
 static const char* G_PLUGIN_VERSION = "1";
                    ^~~~
/home/nano/work/tensorrt-zoo/tiny-tensorrt/./plugin/plugin_utils.h:27:20: warning: ‘G_PLUGIN_NAMESPACE’ defined but not used [-Wunused-variable]
 static const char* G_PLUGIN_NAMESPACE = "_TRT";
                    ^~~~~~
[ 38%] Linking CXX shared library ../../lib/libtinytrt.so
[ 38%] Built target tinytrt
[ 42%] Building NVCC (Device) object openpose/CMakeFiles/testopenpose.dir/testopenpose_generated_OpenPose.cu.o
[ 47%] Building NVCC (Device) object openpose/CMakeFiles/testopenpose.dir/testopenpose_generated_ResizeAndMerge.cu.o
Scanning dependencies of target testyolov3
Scanning dependencies of target pytrt
[ 52%] Building NVCC (Device) object openpose/CMakeFiles/testopenpose.dir/testopenpose_generated_PoseNMS.cu.o
[ 57%] Building NVCC (Device) object openpose/CMakeFiles/testopenpose.dir/testopenpose_generated_BodyPartConnector.cu.o
[ 61%] Building CXX object tiny-tensorrt/CMakeFiles/pytrt.dir/PyTrt.cpp.o
[ 66%] Building CXX object yolov3/CMakeFiles/testyolov3.dir/Yolo_Batch_And_608_Test.cpp.o
[ 71%] Building CXX object yolov3/CMakeFiles/testyolov3.dir/YoloV3.cpp.o
/home/nano/work/tensorrt-zoo/yolov3/Yolo_Batch_And_608_Test.cpp:16:0: warning: ignoring #pragma comment [-Wunknown-pragmas]
 #pragma comment(lib,"libtinytrt.lib")
/home/nano/work/tensorrt-zoo/yolov3/Yolo_Batch_And_608_Test.cpp:17:0: warning: ignoring #pragma comment [-Wunknown-pragmas]
 #pragma comment(lib,"opencv_world410.lib")
/home/nano/work/tensorrt-zoo/yolov3/Yolo_Batch_And_608_Test.cpp:19:0: warning: ignoring #pragma comment [-Wunknown-pragmas]
 #pragma comment(lib,"cuda.lib")
/home/nano/work/tensorrt-zoo/yolov3/Yolo_Batch_And_608_Test.cpp:20:0: warning: ignoring #pragma comment [-Wunknown-pragmas]
 #pragma comment(lib,"cudart.lib")
/home/nano/work/tensorrt-zoo/yolov3/YoloV3.cpp: In member function ‘void YoloV3::DoInference(YoloInDataSt*, int, std::vector<std::vector >&)’:
/home/nano/work/tensorrt-zoo/yolov3/YoloV3.cpp:131:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
 for (size_t i = 0; i < batchsize; i++)
                      ^~~~~
/home/nano/work/tensorrt-zoo/yolov3/Yolo_Batch_And_608_Test.cpp: In function ‘int main()’:
/home/nano/work/tensorrt-zoo/yolov3/Yolo_Batch_And_608_Test.cpp:35:32: warning: conversion to ‘__gnu_cxx::__alloc_traits<std::allocator >::value_type {aka float}’ alters ‘double’ constant value [-Wfloat-conversion]
 yolo_calibratorData[i][j] = 0.05;
                             ^~~~
[ 76%] Linking CXX executable ../../bin/testyolov3
[ 76%] Built target testyolov3
Scanning dependencies of target testopenpose
[ 80%] Building CXX object openpose/CMakeFiles/testopenpose.dir/Array.cpp.o
[ 85%] Building CXX object openpose/CMakeFiles/testopenpose.dir/Point.cpp.o
[ 90%] Building CXX object openpose/CMakeFiles/testopenpose.dir/testopenpose.cpp.o
/home/nano/work/tensorrt-zoo/openpose/testopenpose.cpp: In function ‘int main(int, char**)’:
/home/nano/work/tensorrt-zoo/openpose/testopenpose.cpp:91:36: warning: conversion to ‘__gnu_cxx::__alloc_traits<std::allocator >::value_type {aka float}’ alters ‘double’ constant value [-Wfloat-conversion]
 calibratorData[i][j] = 0.05;
                        ^~~~
/home/nano/work/tensorrt-zoo/openpose/testopenpose.cpp:115:59: warning: conversion to ‘int’ from ‘__gnu_cxx::__alloc_traits<std::allocator >::value_type {aka float}’ may alter its value [-Wfloat-conversion]
 cv::circle(img,cv::Point(result[i*3],result[i*3+1]),2,cv::Scalar(0,255,0),-1);
                                                           ^
/home/nano/work/tensorrt-zoo/openpose/testopenpose.cpp:115:59: warning: conversion to ‘int’ from ‘__gnu_cxx::__alloc_traits<std::allocator >::value_type {aka float}’ may alter its value [-Wfloat-conversion]
[ 95%] Linking CXX shared library ../../lib/pytrt.cpython-36m-aarch64-linux-gnu.so
[100%] Linking CXX executable ../../bin/testopenpose
[100%] Built target testopenpose
[100%] Built target pytrt

zerollzeng commented 4 years ago

What is the difference between the times reported by "net forward takes" and "inference Time"? Which one is the single-frame inference time? And to change the network input resolution, what else needs to be modified besides the w and h in the prototxt?

  1. Just search the source code to see where those lines are printed.
  2. The best way is to log the time yourself.
  3. You are using tensorrt-zoo, right? I don't remember either, since it's been a long time. Just modify the prototxt first and see if it runs correctly; if not, dig into the source code :smile: :smile: :smile:
yinguoxiangyi commented 4 years ago

OK, I'll read the source code. Thanks again for the reply!