engineer1109 opened 5 years ago
Hi, what is your TensorRT version? I tested with TensorRT 4.0.1.6 and 5.0.2.6.
Try a version above 4.x.
@lewes6369 I am using TensorRT 5. Do you configure the TensorRT path manually (cmake ..), or is it detected automatically?
Hi, @engineer1109 In the project it is automatic. It will print the path and the version when you run "cmake ..". You can also configure the paths manually in /tensorRTWrapper/code/CMakeLists.txt via {TENSORRT_INCLUDE_DIR} and {TENSORRT_LIBRARY}.
@lewes6369 I have solved it. TENSORRT_LIBRARY_INFER should be TensorRT-5.0.2.6/lib/libnvinfer.so.5.0.2, not a directory.
Hmm, I don't see any speedup with int8 or fp16. My GPU is a Titan Xp.
And what code do you use to convert from Darknet to Caffe?
@lewes6369 I have solved it. TENSORRT_LIBRARY_INFER should be TensorRT-5.0.2.6/lib/libnvinfer.so.5.0.2, not a directory.
@engineer1109 Yes, TENSORRT_LIBRARY is not a directory; it is the file itself. Maybe your directory has no file named libnvinfer.so directly, so you can create a soft link pointing to libnvinfer.so.5.0.2.
Hmm, I don't see any speedup with int8 or fp16. My GPU is a Titan Xp.
Can you show me the time cost on the Titan Xp? You can run the inference many times and take the average cost. As far as I know, the Titan Xp does not support fast fp16, so fp16 will bring no improvement. Running fp16 will print a notice in the log.
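For averaging, a minimal sketch of a timing helper is below; the `net.doInference(inputData, outputData)` call in the usage comment is a placeholder for the objects already set up in main.cpp, so treat the names as assumptions.

```cpp
#include <chrono>
#include <functional>
#include <iostream>

// Average the cost of `runOnce` (one forward pass) over many iterations.
// A few warm-up runs are discarded so lazy initialization does not skew the mean.
double averageMs(const std::function<void()>& runOnce, int warmup = 10, int runs = 100)
{
    for (int i = 0; i < warmup; ++i)
        runOnce();

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < runs; ++i)
        runOnce();
    auto end = std::chrono::high_resolution_clock::now();

    return std::chrono::duration<double, std::milli>(end - start).count() / runs;
}

// Usage (placeholder names for the objects created in main.cpp):
//   std::cout << averageMs([&]{ net.doInference(inputData, outputData); }) << " ms\n";
```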
And what code do you use to convert from Darknet to Caffe?
see here
@lewes6369 I used it to convert my own model, but the predictions are wrong, with scores that look random. My model has 81 classes and 258 connection layers. What do I need to change?
FP32: Time taken for inference is 16.0298 ms. layer1-conv 0.184ms layer1-act 0.219ms layer2-conv 0.332ms layer2-act 0.111ms layer3-conv 0.102ms layer3-act 0.057ms layer4-conv 0.308ms layer4-act 0.111ms layer5-shortcut 0.167ms layer6-conv 0.337ms layer6-act 0.056ms layer7-conv 0.065ms layer7-act 0.030ms layer8-conv 0.208ms layer8-act 0.057ms layer9-shortcut 0.085ms layer10-conv 0.061ms layer10-act 0.037ms layer11-conv 0.206ms layer11-act 0.057ms layer12-shortcut 0.084ms layer13-conv 0.339ms layer13-act 0.030ms layer14-conv 0.055ms layer14-act 0.014ms layer15-conv 0.187ms layer15-act 0.029ms layer16-shortcut 0.045ms layer17-conv 0.053ms layer17-act 0.015ms layer18-conv 0.195ms layer18-act 0.029ms layer19-shortcut 0.044ms layer20-conv 0.054ms layer20-act 0.014ms layer21-conv 0.188ms layer21-act 0.029ms layer22-shortcut 0.049ms layer23-conv 0.053ms layer23-act 0.014ms layer24-conv 0.399ms layer24-act 0.030ms layer25-shortcut 0.044ms layer26-conv 0.055ms layer26-act 0.015ms layer27-conv 0.191ms layer27-act 0.029ms layer28-shortcut 0.044ms layer29-conv 0.053ms layer29-act 0.014ms layer30-conv 0.181ms layer30-act 0.029ms layer31-shortcut 0.044ms layer32-conv 0.055ms layer32-act 0.014ms layer33-conv 0.184ms layer33-act 0.028ms layer34-shortcut 0.045ms layer35-conv 0.053ms layer35-act 0.014ms layer36-conv 0.183ms layer36-act 0.037ms layer37-shortcut 0.044ms layer38-conv 0.360ms layer38-act 0.014ms layer39-conv 0.058ms layer39-act 0.005ms layer40-conv 0.213ms layer40-act 0.015ms layer41-shortcut 0.024ms layer42-conv 0.056ms layer42-act 0.006ms layer43-conv 0.205ms layer43-act 0.014ms layer44-shortcut 0.025ms layer45-conv 0.061ms layer45-act 0.005ms layer46-conv 0.200ms layer46-act 0.017ms layer47-shortcut 0.024ms layer48-conv 0.055ms layer48-act 0.006ms layer49-conv 0.206ms layer49-act 0.015ms layer50-shortcut 0.024ms layer51-conv 0.055ms layer51-act 0.005ms layer52-conv 0.203ms layer52-act 0.015ms layer53-shortcut 0.024ms layer54-conv 0.058ms layer54-act 0.005ms layer55-conv 0.200ms layer55-act 0.015ms layer56-shortcut 0.025ms layer57-conv 0.054ms layer57-act 0.005ms layer58-conv 0.201ms layer58-act 0.015ms layer59-shortcut 0.024ms layer60-conv 0.054ms layer60-act 0.006ms layer61-conv 0.203ms layer61-act 0.015ms layer62-shortcut 0.025ms layer63-conv 0.459ms layer63-act 0.006ms layer64-conv 0.074ms layer64-act 0.005ms layer65-conv 0.417ms layer65-act 0.007ms layer66-shortcut 0.011ms layer67-conv 0.076ms layer67-act 0.004ms layer68-conv 0.362ms layer68-act 0.006ms layer69-shortcut 0.010ms layer70-conv 0.076ms layer70-act 0.004ms layer71-conv 0.372ms layer71-act 0.007ms layer72-shortcut 0.010ms layer73-conv 0.073ms layer73-act 0.004ms layer74-conv 0.427ms layer74-act 0.007ms layer75-shortcut 0.010ms layer76-conv 0.073ms layer76-act 0.005ms layer77-conv 0.428ms layer77-act 0.008ms layer78-conv 0.072ms layer78-act 0.004ms layer79-conv 0.353ms layer79-act 0.007ms layer80-conv 0.071ms layer80-act 0.004ms layer81-conv 0.356ms layer81-act 0.007ms layer82-conv 0.049ms layer80-conv copy 0.006ms layer85-conv 0.027ms layer85-act 0.004ms layer86-upsample 0.014ms layer86-upsample copy 0.006ms layer88-conv 0.080ms layer88-act 0.006ms layer89-conv 0.203ms layer89-act 0.270ms layer90-conv 0.060ms layer90-act 0.005ms layer91-conv 0.203ms layer91-act 0.014ms layer92-conv 0.061ms layer92-act 0.005ms layer93-conv 0.202ms layer93-act 0.015ms layer94-conv 0.057ms layer92-conv copy 0.017ms layer97-conv 0.036ms layer97-act 0.004ms layer98-upsample 0.024ms layer98-upsample copy 0.017ms layer100-conv 0.072ms layer100-act 
0.013ms layer101-conv 0.190ms layer101-act 0.029ms layer102-conv 0.054ms layer102-act 0.014ms layer103-conv 0.184ms layer103-act 0.029ms layer104-conv 0.053ms layer104-act 0.014ms layer105-conv 0.200ms layer105-act 0.028ms layer106-conv 0.092ms yolo-det 0.439ms Time over all layers: 15.533 detCount: 12 Time taken for nms is 0.002724 ms.
INT8: Time taken for inference is 16.4116 ms. layer1-conv 0.179ms layer1-act 0.220ms layer2-conv 0.332ms layer2-act 0.111ms layer3-conv 0.104ms layer3-act 0.057ms layer4-conv 0.297ms layer4-act 0.113ms layer5-shortcut 0.166ms layer6-conv 0.333ms layer6-act 0.056ms layer7-conv 0.056ms layer7-act 0.029ms layer8-conv 0.407ms layer8-act 0.057ms layer9-shortcut 0.085ms layer10-conv 0.063ms layer10-act 0.028ms layer11-conv 0.215ms layer11-act 0.056ms layer12-shortcut 0.085ms layer13-conv 0.339ms layer13-act 0.029ms layer14-conv 0.054ms layer14-act 0.015ms layer15-conv 0.189ms layer15-act 0.029ms layer16-shortcut 0.044ms layer17-conv 0.054ms layer17-act 0.013ms layer18-conv 0.184ms layer18-act 0.029ms layer19-shortcut 0.044ms layer20-conv 0.054ms layer20-act 0.015ms layer21-conv 0.183ms layer21-act 0.029ms layer22-shortcut 0.044ms layer23-conv 0.050ms layer23-act 0.014ms layer24-conv 0.386ms layer24-act 0.030ms layer25-shortcut 0.044ms layer26-conv 0.054ms layer26-act 0.014ms layer27-conv 0.185ms layer27-act 0.030ms layer28-shortcut 0.044ms layer29-conv 0.054ms layer29-act 0.013ms layer30-conv 0.183ms layer30-act 0.029ms layer31-shortcut 0.045ms layer32-conv 0.053ms layer32-act 0.014ms layer33-conv 0.184ms layer33-act 0.029ms layer34-shortcut 0.045ms layer35-conv 0.053ms layer35-act 0.014ms layer36-conv 0.185ms layer36-act 0.029ms layer37-shortcut 0.045ms layer38-conv 0.361ms layer38-act 0.014ms layer39-conv 0.066ms layer39-act 0.005ms layer40-conv 0.400ms layer40-act 0.015ms layer41-shortcut 0.025ms layer42-conv 0.058ms layer42-act 0.005ms layer43-conv 0.205ms layer43-act 0.014ms layer44-shortcut 0.024ms layer45-conv 0.055ms layer45-act 0.005ms layer46-conv 0.217ms layer46-act 0.015ms layer47-shortcut 0.029ms layer48-conv 0.056ms layer48-act 0.005ms layer49-conv 0.205ms layer49-act 0.014ms layer50-shortcut 0.025ms layer51-conv 0.060ms layer51-act 0.005ms layer52-conv 0.209ms layer52-act 0.015ms layer53-shortcut 0.024ms layer54-conv 0.058ms layer54-act 0.006ms layer55-conv 0.206ms layer55-act 0.015ms layer56-shortcut 0.024ms layer57-conv 0.055ms layer57-act 0.005ms layer58-conv 0.203ms layer58-act 0.015ms layer59-shortcut 0.024ms layer60-conv 0.065ms layer60-act 0.005ms layer61-conv 0.417ms layer61-act 0.015ms layer62-shortcut 0.025ms layer63-conv 0.460ms layer63-act 0.006ms layer64-conv 0.075ms layer64-act 0.004ms layer65-conv 0.347ms layer65-act 0.007ms layer66-shortcut 0.011ms layer67-conv 0.073ms layer67-act 0.004ms layer68-conv 0.371ms layer68-act 0.007ms layer69-shortcut 0.010ms layer70-conv 0.074ms layer70-act 0.004ms layer71-conv 0.345ms layer71-act 0.008ms layer72-shortcut 0.010ms layer73-conv 0.193ms layer73-act 0.005ms layer74-conv 0.349ms layer74-act 0.007ms layer75-shortcut 0.011ms layer76-conv 0.076ms layer76-act 0.004ms layer77-conv 0.341ms layer77-act 0.007ms layer78-conv 0.072ms layer78-act 0.004ms layer79-conv 0.385ms layer79-act 0.007ms layer80-conv 0.072ms layer80-act 0.004ms layer81-conv 0.366ms layer81-act 0.017ms layer82-conv 0.050ms layer80-conv copy 0.014ms layer85-conv 0.027ms layer85-act 0.003ms layer86-upsample 0.014ms layer86-upsample copy 0.006ms layer88-conv 0.081ms layer88-act 0.006ms layer89-conv 0.213ms layer89-act 0.016ms layer90-conv 0.213ms layer90-act 0.006ms layer91-conv 0.208ms layer91-act 0.019ms layer92-conv 0.055ms layer92-act 0.006ms layer93-conv 0.198ms layer93-act 0.015ms layer94-conv 0.053ms layer92-conv copy 0.018ms layer97-conv 0.025ms layer97-act 0.004ms layer98-upsample 0.024ms layer98-upsample copy 0.017ms layer100-conv 0.072ms layer100-act 
0.014ms layer101-conv 0.201ms layer101-act 0.029ms layer102-conv 0.048ms layer102-act 0.014ms layer103-conv 0.189ms layer103-act 0.029ms layer104-conv 0.048ms layer104-act 0.013ms layer105-conv 0.181ms layer105-act 0.030ms layer106-conv 0.091ms yolo-det 0.363ms Time over all layers: 15.849 detCount: 12 Time taken for nms is 0.002727 ms.
For your own YOLO model, modify the config in "tensorRTWrapper/code/include/YoloConfigs.h". In your int8 mode the layers cost more, which is not normal. Can you show me the output info for fp32 and int8 before running inference?
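For orientation, the edit meant here is roughly of the following shape; the constant and struct names, grid sizes, and anchor values below are an illustrative sketch for an 81-class, 608x608 model and should be checked against the actual YoloConfigs.h rather than copied blindly.

```cpp
// Illustrative sketch only -- verify the names against the real
// tensorRTWrapper/code/include/YoloConfigs.h before editing.
namespace Yolo
{
    static constexpr int CHECK_COUNT = 3;        // anchors per detection scale
    static constexpr float IGNORE_THRESH = 0.5f;
    static constexpr int CLASS_NUM = 81;         // was 80 for COCO; set to your class count

    struct YoloKernel
    {
        int width;
        int height;
        float anchors[CHECK_COUNT * 2];
    };

    // One entry per detection scale; the anchors must match the ones your
    // custom model was trained with (values shown are the stock yolov3 anchors).
    static const YoloKernel yolo1 = { 19, 19, {116, 90, 156, 198, 373, 326} };
    static const YoloKernel yolo2 = { 38, 38, { 30, 61,  62,  45,  59, 119} };
    static const YoloKernel yolo3 = { 76, 76, { 10, 13,  16,  30,  33,  23} };
}
```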
####### input args####### C=3; H=608; W=608; caffemodel=./yolov3a.caffemodel; calib=; class=81; input=./dog.jpg; mode=fp32; nms=0.450000; outputs=yolo-det; prototxt=./yolov3a.prototxt; video=1; videoinput=1.avi; ####### end args####### init plugin proto: ./yolov3a.prototxt caffemodel: ./yolov3a.caffemodel Begin parsing model... End parsing model... Begin building engine... End building engine... save Engine... save Engine ok (you can manual load next time) Time taken for inference is 15.4174 ms. Time taken for nms is 0.003583 ms.
I wrote other code for cuDNN, covering both inference and training. I found that the cuDNN fp16 mode becomes slower, and cuDNN int8 is a little faster than fp32.

Mode | fps
cuDNN fp32 | 36-38
TensorRT fp32 | 38-40
cuDNN fp16 | 30
TensorRT fp16 | 32-35
cuDNN int8 | 38-40
TensorRT int8 | 34-36

Is the Titan Xp different from the 1080 Ti? They are both Pascal, and the Titan should be the leader.
Hi, @engineer1109 For the cuDNN numbers, do you run inference through Caffe? As written here, the Titan Xp does not support fast fp16, so it must print the log "Notice: the platform do not has fast for fp16" from TrtNet.cpp.
As far as I know, the Titan Xp's int8 TOPS is much higher than its TFLOPS, so it should be faster in int8, like the 1080 Ti. It is strange that int8 is slower in TensorRT on a Titan Xp. The 1080 Ti and the Titan Xp have similar TFLOPS; I am not sure which one leads.
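That notice comes from a hardware-capability check done before building the engine. A minimal sketch of such a check with the standard TensorRT builder API is below; the `builder` pointer is assumed to already exist, as it does in TrtNet.cpp.

```cpp
#include <NvInfer.h>
#include <iostream>

// `builder` is assumed to be the nvinfer1::IBuilder* already created in
// TrtNet.cpp; these queries report what the GPU actually accelerates.
void reportPrecisionSupport(nvinfer1::IBuilder* builder)
{
    if (!builder->platformHasFastFp16())
        std::cout << "Notice: the platform does not have fast fp16" << std::endl;

    if (!builder->platformHasFastInt8())
        std::cout << "Notice: the platform does not have fast int8" << std::endl;
}
```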
@lewes6369 I am using TensorRT 5. Do you configure the TensorRT path manually (cmake ..), or is it detected automatically?

No, I didn't configure the path, but it works correctly.
/home/ispr/software/TensorRT-Yolov3/eval.cpp:154:91: note: no known conversion for argument 1 from ‘const std::pair<Tn::Bbox, bool>’ to ‘CheckPair& {aka std::pair<Tn::Bbox, bool>&}’ CMakeFiles/runYolov3.dir/build.make:86: recipe for target 'CMakeFiles/runYolov3.dir/eval.cpp.o' failed
Can you tell me how to modify the following line of code? Thanks!
sort(checkPRBoxs.begin(),checkPRBoxs.end(),[](CheckPair& left,CheckPair& right){ return left.first.score > right.first.score; } );
@scutzhe Yes, it detects the path automatically in tensorRTWrapper/code/CMakeLists.txt.
@lensea Modify [](CheckPair& left, CheckPair& right) to [](const CheckPair& left, const CheckPair& right), or to [](auto& left, auto& right) if C++14 is supported.
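Put together, the corrected call in eval.cpp would look like the sketch below (CheckPair and checkPRBoxs are the types and variables already used there; it assumes the surrounding includes and using-declarations of eval.cpp).

```cpp
// const-qualified parameters so the comparator also accepts const elements
sort(checkPRBoxs.begin(), checkPRBoxs.end(),
     [](const CheckPair& left, const CheckPair& right) {
         return left.first.score > right.first.score;
     });
```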
Scanning dependencies of target TrtNet
[ 12%] Building CXX object tensorRTWrapper/code/CMakeFiles/TrtNet.dir/src/TrtNet.cpp.o
[ 25%] Linking CXX static library libTrtNet.a
[ 75%] Built target TrtNet
Scanning dependencies of target runYolov3
[ 87%] Linking CXX executable runYolov3
CMakeFiles/runYolov3.dir/main.cpp.o: In function `Tn::PluginFactory::createPlugin(char const*, nvinfer1::Weights const*, int)':
main.cpp: undefined reference to `nvinfer1::plugin::createPReLUPlugin(float)'
CMakeFiles/runYolov3.dir/main.cpp.o: In function `Tn::PluginFactory::createPlugin(char const*, void const*, unsigned long)':
main.cpp: undefined reference to `nvinfer1::plugin::createPReLUPlugin(void const*, unsigned long)'
tensorRTWrapper/code/libTrtNet.a(TrtNet.cpp.o): In function `Tn::trtNet::loadModelAndCreateEngine(...)':
TrtNet.cpp:(.text+0x137): undefined reference to `createInferBuilder_INTERNAL'
TrtNet.cpp:(.text+0x2cb): undefined reference to `nvcaffeparser1::shutdownProtobufLibrary()'
tensorRTWrapper/code/libTrtNet.a(TrtNet.cpp.o): In function `Tn::trtNet::trtNet(...)':
TrtNet.cpp:(.text+0xf06): undefined reference to `nvcaffeparser1::createCaffeParser()'
TrtNet.cpp:(.text+0x1190): undefined reference to `createInferRuntime_INTERNAL'
TrtNet.cpp:(.text+0x16e9): undefined reference to `createInferRuntime_INTERNAL'
collect2: error: ld returned 1 exit status
CMakeFiles/runYolov3.dir/build.make:127: recipe for target 'runYolov3' failed
make[2]: *** [runYolov3] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/runYolov3.dir/all' failed
make[1]: *** [CMakeFiles/runYolov3.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2