lewes6369 / TensorRT-Yolov3

TensorRT for Yolov3
MIT License

[Training] Can you give me the GitHub repo for training the model on Caffe #33

Open uname0x96 opened 5 years ago

uname0x96 commented 5 years ago

Hi @lewes6369, I'm very happy because your repo helps me a lot in my work. But can you tell me which link you used for training? I'm a noob in Caffe, so I tried converting from Keras or TensorFlow to Caffe, but it was hard and I hit many bugs. Can you help me, please?

lewes6369 commented 5 years ago

Hi @cong235, I am happy it helps your work. You can train the darknet model using the official yolov3 repo: https://github.com/pjreddie/darknet. Next, convert them to a caffemodel by git and git. And you can also try https://github.com/eric612/MobileNet-YOLO, which trains the yolo model directly in the Caffe framework.

uname0x96 commented 5 years ago

Hi @cong235, I am happy it helps your work. You can train the darknet model using the official yolov3 repo: https://github.com/pjreddie/darknet. Next, convert them to a caffemodel by git and git. And you can also try https://github.com/eric612/MobileNet-YOLO, which trains the yolo model directly in the Caffe framework.

I got it. Thanks for your work :D

uname0x96 commented 5 years ago

Hi @lewes6369, I resolved my problem, but I have one question because I don't know about this. I use a pretrained yolov3 from darknet and train just one class, my cane. So I configured it to train with 1 class and set filters = (classes + 5) * 3 = (1 + 5) * 3 = 18. Then I converted it to a caffe model. I use the following command to detect my cane:

./install/runYolov3 --caffemodel=./model/yolov3_cane.caffemodel --prototxt=./model/yolov3_cane_trt.prototxt --W=416 --H=416 --class=1 --mode=fp16 --input=./doge.jpg

It core dumps. But if I change to --class=80 it is OK, and class 0 changes from person (as in your yolov3 model) to cane with my model. Also, I trained only one class, but it detects many other objects, like the COCO model. What does that mean? And can I fix the output model to detect only my one cane class? I think that would improve my performance. Thanks for reading!

lewes6369 commented 5 years ago

Hi @cong235. Did you modify CLASS_NUM in the file tensorRTWrapper/code/include/YoloConfigs.h to one class? Not only the command line but also this header needs the class-num change. I will merge these two settings into the command line later.
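
(A minimal sketch of the edit meant here, assuming the header declares CLASS_NUM as a plain constant; the real YoloConfigs.h may declare it differently:)

// tensorRTWrapper/code/include/YoloConfigs.h -- sketch only
// Must agree with the --class value passed to runYolov3; a mismatch makes the
// Yolo plugin decode the output buffer with the wrong stride and can crash.
static const int CLASS_NUM = 1;   // was 80 for the stock COCO model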

uname0x96 commented 5 years ago

Hi @cong235. Did you modify CLASS_NUM in the file tensorRTWrapper/code/include/YoloConfigs.h to one class? Not only the command line but also this header needs the class-num change. I will merge these two settings into the command line later.

Thanks for your reply. I got it, I had forgotten that :D P/S: Hmm, can I write a custom layer without coding in CUDA?

lewes6369 commented 5 years ago

Yes. If you do not code it in CUDA, you have to run the custom layer on the CPU, and that costs time for the communication between CPU memory and the GPU device. Running only the last, low-compute layer on the CPU is just OK.

uname0x96 commented 5 years ago

Yes. If you do not code it in CUDA, you have to run the custom layer on the CPU, and that costs time for the communication between CPU memory and the GPU device. Running only the last, low-compute layer on the CPU is just OK.

So it means that after converting Caffe to a TRT engine, that layer will run on the CPU instead of the GPU, right?

lewes6369 commented 5 years ago

After converting to a TRT engine, all supported layers run on the GPU. The custom layer can run on either the CPU or the GPU, depending on your implementation. If you want it on the GPU, just write it in CUDA. See the code in YoloLayer.cu; it contains both CPU and GPU implementations.
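
To make the transfer cost concrete, here is a standalone timing sketch (not code from this repo, compile with nvcc) that measures only the device-to-host round trip a CPU-side custom layer pays on every call:

#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const size_t n = 1 << 20;                 // ~1M floats, like a feature map
    float* dbuf = nullptr;
    cudaMalloc((void**)&dbuf, n * sizeof(float));
    std::vector<float> hbuf(n);

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventRecord(t0);
    // What a CPU fallback has to do around the actual layer math:
    cudaMemcpy(hbuf.data(), dbuf, n * sizeof(float), cudaMemcpyDeviceToHost);
    // ... the custom layer's CPU implementation would run here ...
    cudaMemcpy(dbuf, hbuf.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    std::printf("device<->host round trip alone: %.3f ms\n", ms);
    cudaFree(dbuf);
    return 0;
}

On a Jetson-class board even this copy alone can be a noticeable slice of the per-frame budget, which is why only a cheap final layer is worth keeping on the CPU.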

uname0x96 commented 5 years ago

After converting to a TRT engine, all supported layers run on the GPU. The custom layer can run on either the CPU or the GPU, depending on your implementation. If you want it on the GPU, just write it in CUDA. See the code in YoloLayer.cu; it contains both CPU and GPU implementations.

Thanks for your reply. Hmm, I tried to use the same prototxt config with yolov3-tiny, but it's not working. So are there some differences between yolov3 and yolov3-tiny? I need to change the code in the tensorRT wrapper, right?

uname0x96 commented 5 years ago

Hi @lewes6369, I'm working with a Jetson TX2 for real-time detection. With batch size = 1 and yolov3 it takes 140 ms per image, ~7 FPS. Now I use batch size = 4 and it takes 530 ms per batch, ~7 FPS too. Can you give some suggestions to increase it? My goal is 15 FPS with batch size = 4. I'm trying to decrease the yolo kernels, CHECK_COUNT, and the number of anchor boxes, but still can't gain much. Do you have any suggestions? Thanks for reading :dancer:

aditbhrgv commented 5 years ago

Thanks for your reply. Hmm, I tried to use the same prototxt config with yolov3-tiny, but it's not working. So are there some differences between yolov3 and yolov3-tiny? I need to change the code in the tensorRT wrapper, right?

Hi cong235,

I am also trying to run the tiny-yolo-3l.cfg. Have you already managed to change the TensorRT wrapper for tiny yolo? Can you please let me know the changes you made?

Thanks

uname0x96 commented 5 years ago

Thanks for your reply. Hmm, I tried to use the same prototxt config with yolov3-tiny, but it's not working. So are there some differences between yolov3 and yolov3-tiny? I need to change the code in the tensorRT wrapper, right?

Hi cong235,

I am also trying to run the tiny-yolo-3l.cfg. Have you already managed to change the TensorRT wrapper for tiny yolo? Can you please let me know the changes you made?

Thanks

Yes, I got it running with yolov3-tiny. You should change the tensorrt wrapper code and YoloConfigs.h. In YoloConfigs.h:

//YOLO 416
YoloKernel yolo1 = { 13, 13, {81,82, 135,169, 344,319} };
YoloKernel yolo2 = { 26, 26, {10,14, 23,27, 37,58} };

And comment out the line //mYoloKernel.push_back(yolo3); in YoloLayer.cu. Now you can run it.

aditbhrgv commented 5 years ago

@cong235 - Actually I am running the tiny-yolo-3l.cfg model. According to this model, I changed the YoloConfigs.h file with the appropriate anchors and CLASS_NUM = 2 (because I have 2 classes to detect). But the inference code doesn't give me any detections. Have you faced a similar problem? Do we need to change something else? Thanks for your help in advance!

uname0x96 commented 5 years ago

@cong235 - Actually I am running the tiny-yolo-3l.cfg model. According to this model, I changed the YoloConfigs.h file with the appropriate anchors and CLASS_NUM = 2 (because I have 2 classes to detect). But the inference code doesn't give me any detections. Have you faced a similar problem? Do we need to change something else? Thanks for your help in advance!

You need to change your tiny .cfg too: comment out the "upsample_param" blocks, and modify the prototxt so the last layer is:

layer {
    # the bottoms are the yolo input layers
    bottom: "layer16-conv"
    bottom: "layer23-conv"
    top: "yolo-det"
    name: "yolo-det"
    type: "Yolo"
}

aditbhrgv commented 5 years ago

@cong235 I already did that! Refer to my .cfg file attached! I still can't get any detections. Maybe some hard-coded confidence thresholds?

name: "Darkent2Caffe"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 608
input_dim: 608

layer { bottom: "data" top: "layer1-conv" name: "layer1-conv" type: "Convolution" convolution_param { num_output: 16 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer1-conv" top: "layer2-maxpool" name: "layer2-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer2-maxpool" top: "layer3-conv" name: "layer3-conv" type: "Convolution" convolution_param { num_output: 32 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer3-conv" top: "layer4-maxpool" name: "layer4-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer4-maxpool" top: "layer5-conv" name: "layer5-conv" type: "Convolution" convolution_param { num_output: 64 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer5-conv" top: "layer6-maxpool" name: "layer6-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer6-maxpool" top: "layer7-conv" name: "layer7-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer7-conv" top: "layer8-maxpool" name: "layer8-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer8-maxpool" top: "layer9-conv" name: "layer9-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer9-conv" top: "layer10-maxpool" name: "layer10-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer10-maxpool" top: "layer11-conv" name: 
"layer11-conv" type: "Convolution" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer11-conv" top: "layer12-maxpool" name: "layer12-maxpool" type: "Pooling" pooling_param { stride: 1 pool: MAX kernel_size: 3 pad: 1 } } layer { bottom: "layer12-maxpool" top: "layer13-conv" name: "layer13-conv" type: "Convolution" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer13-conv" top: "layer14-conv" name: "layer14-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer14-conv" top: "layer15-conv" name: "layer15-conv" type: "Convolution" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer15-conv" top: "layer16-conv" name: "layer16-conv" type: "Convolution" convolution_param { num_output: 21 kernel_size: 1 pad: 0 stride: 1 bias_term: true } } layer { bottom: "layer14-conv" top: "layer18-route" name: "layer18-route" type: "Concat" } layer { bottom: "layer18-route" top: "layer19-conv" name: "layer19-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer19-conv" top: "layer20-upsample" name: "layer20-upsample" type: "Upsample"

#upsample_param {
#    scale: 2
#}

} layer { bottom: "layer20-upsample" bottom: "layer9-conv" top: "layer21-route" name: "layer21-route" type: "Concat" } layer { bottom: "layer21-route" top: "layer22-conv" name: "layer22-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer22-conv" top: "layer23-conv" name: "layer23-conv" type: "Convolution" convolution_param { num_output: 21 kernel_size: 1 pad: 0 stride: 1 bias_term: true } } layer { bottom: "layer22-conv" top: "layer25-route" name: "layer25-route" type: "Concat" } layer { bottom: "layer25-route" top: "layer26-conv" name: "layer26-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { bottom: "layer26-conv" top: "layer26-conv" name: "layer26-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer26-conv" top: "layer26-conv" name: "layer26-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer26-conv" top: "layer26-conv" name: "layer26-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer26-conv" top: "layer27-upsample" name: "layer27-upsample" type: "Upsample"

#upsample_param {
#    scale: 2
#}

} layer { bottom: "layer27-upsample" bottom: "layer7-conv" top: "layer28-route" name: "layer28-route" type: "Concat" } layer { bottom: "layer28-route" top: "layer29-conv" name: "layer29-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer29-conv" top: "layer29-conv" name: "layer29-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer29-conv" top: "layer29-conv" name: "layer29-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer29-conv" top: "layer29-conv" name: "layer29-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer29-conv" top: "layer30-conv" name: "layer30-conv" type: "Convolution" convolution_param { num_output: 21 kernel_size: 1 pad: 0 stride: 1 bias_term: true } }

layer {
    bottom: "layer16-conv"
    bottom: "layer23-conv"
    bottom: "layer30-conv"
    top: "yolo-det"
    name: "yolo-det"
    type: "Yolo"
}

uname0x96 commented 5 years ago

@aditbhrgv You just need to do a few things if you want to use another yolo model (see the sketch after this list):

  1. Edit the .cfg file
  2. Change the number of classes
  3. Edit the YoloKernel entries so the filter sizes match the anchor boxes
  4. Check your model again.
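
For illustration, a sketch of how those numbers tie together for a hypothetical 2-class model (as in aditbhrgv's case), assuming a 416 input and the stock tiny anchors for brevity; the actual declarations in YoloConfigs.h may differ:

// YoloConfigs.h fragment -- sketch only, exact declaration style may differ.
// The .cfg must set filters = (classes + 5) * 3 = (2 + 5) * 3 = 21 in the
// convolution before each [yolo] block, matching num_output: 21 in the prototxt.
static const int CLASS_NUM = 2;
YoloKernel yolo1 = { 13, 13, {81,82, 135,169, 344,319} };
YoloKernel yolo2 = { 26, 26, {10,14, 23,27, 37,58} };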
aditbhrgv commented 5 years ago

@cong235 thank you! It works now. However, my int8 mode is not working. I am using TensorRT 5.0.2.6 (is there some dependency on the TensorRT version?). I chose 10 images for the calibration dataset, which is a subset of my validation dataset.

####### input args#######
C=3; H=608; W=608; batchsize=1; caffemodel=yolov3-3l.caffemodel; calib=calib_sample.txt; class=2; enginefile=; evallist=; input=000000.jpg; mode=int8; nms=0.450000; outputs=yolo-det; prototxt=yolov3-3l.prototxt;
####### end args#######
find calibration file,loading ...
init plugin
proto: yolov3-3l.prototxt
caffemodel: yolov3-3l.caffemodel
create calibrator,Named:yolov3-3l
Begin parsing model...
End parsing model...
setInt8Mode
Begin building engine...
End building engine...
save Engine...yolov3_int8.engine
process: 000000.jpg
Time taken for inference is 9.37288 ms.
Det count is 0
Time taken for nms is 0.001386 ms.
layer1-conv input reformatter 0 0.038ms layer1-conv 0.130ms layer1-act 0.134ms layer2-maxpool 0.093ms layer3-conv input reformatter 0 0.037ms layer3-conv 0.071ms layer3-act 0.069ms layer4-maxpool 0.049ms layer5-conv input reformatter 0 0.020ms layer5-conv 0.046ms layer5-act 0.036ms layer6-maxpool 0.030ms layer7-conv input reformatter 0 0.011ms layer7-conv 0.038ms layer7-act 0.017ms layer8-maxpool 0.016ms layer9-conv input reformatter 0 0.006ms layer9-conv 0.046ms layer9-act 0.006ms layer10-maxpool 0.007ms layer11-conv input reformatter 0 0.004ms layer11-conv 0.070ms layer11-act 0.004ms layer12-maxpool 0.010ms layer13-conv input reformatter 0 0.005ms layer13-conv 0.137ms layer13-act input reformatter 0 0.011ms layer13-act 0.005ms layer14-conv input reformatter 0 0.008ms layer14-conv 0.038ms layer14-act 0.004ms layer14-act output reformatter 0 0.005ms layer15-conv 0.053ms layer15-act 0.004ms layer16-conv 0.016ms layer14-conv copy 0.006ms layer19-conv 0.012ms layer19-act 0.003ms layer20-upsample 0.010ms layer20-upsample copy 0.007ms layer9-conv copy 0.010ms layer22-conv 0.099ms layer22-act 0.006ms layer23-conv 0.015ms layer22-conv copy 0.009ms layer26-conv 0.023ms layer26-act 0.005ms layer27-upsample 0.029ms layer27-upsample copy 0.019ms layer7-conv copy 0.018ms layer29-conv 0.104ms layer29-act 0.017ms layer30-conv 0.031ms yolo-det 6.559ms
Time over all layers: 8.263

zeyuDai2018 commented 5 years ago

@aditbhrgv Hi, I'm also trying to use TensorRT for real-time object detection tasks on a TX2 recently, and I believe the TX2 does not support int8 mode. It works on Xavier and some GeForce cards.

uname0x96 commented 5 years ago

@aditbhrgv "not supported" does not mean it won't run, bro. "Not supported" means you can't run it at the full INT8 rate. But in real life some models can run very fast with INT8: yolov3-tiny can run at 66 FPS on a TX2 with INT8.

mxzhao commented 5 years ago

Hi @cong235, I am happy it helps your work. You can train the darknet model using the official yolov3 repo: https://github.com/pjreddie/darknet. Next, convert them to a caffemodel by git and git. And you can also try https://github.com/eric612/MobileNet-YOLO, which trains the yolo model directly in the Caffe framework.

I got it. Thanks for your work :D

Hi, it seems that https://github.com/ChenYingpeng/caffe-yolov3/tree/master/model_convert doesn't exist; are there any other methods?

mxzhao commented 5 years ago

@cong235 thank you! It works now. However, my int8 mode is not working. I am using TensorRT 5.0.2.6 (is there some dependency on the TensorRT version?). I chose 10 images for the calibration dataset, which is a subset of my validation dataset.


Hi, why did your initial tiny model not work? I just converted the official yolov3-tiny.cfg/weights and made this change in YoloConfigs.h:

//YOLO 416
YoloKernel yolo1 = { 13, 13, {81,82, 135,169, 344,319} };
YoloKernel yolo2 = { 26, 26, {10,14, 23,27, 37,58} };

but it doesn't give any detections. Can you share some of your experience?

prajwaljpj commented 5 years ago

@aditbhrgv You just need to do a few things if you want to use another yolo model:

1. Edit the .cfg file

2. Change the number of classes

3. Edit the YoloKernel entries so the filter sizes match the anchor boxes

4. Check your model again.

@cong235 Could you please explain what you mean by step 3, "filter sizes match the anchor boxes"? Where do I get the filter size from my yolov3.cfg file? Also, I have 9 pairs of anchors; how do I change my YoloKernel? My config: num_classes = 9, filters = 42 (original yolov3 = 255), anchors = 11.3950,25.3481, 21.0826,48.1415, 29.8289,76.1553, 35.8586,132.5984, 66.4218,89.7861, 92.6243,139.2145, 164.5912,141.0014, 140.5117,216.2245, 238.7294,323.5685. Could you please show me an example?

lewes6369 commented 5 years ago

@prajwaljpj Hi, what is your kernel feature size? If the input is 416, try this:

YoloKernel yolo1 = { 13, 13, {164.5912,141.0014, 140.5117,216.2245, 238.7294,323.5685} };
YoloKernel yolo2 = { 26, 26, {35.8586,132.5984, 66.4218,89.7861, 92.6243,139.2145} };
YoloKernel yolo3 = { 52, 52, {11.3950,25.3481, 21.0826,48.1415, 29.8289,76.1553} };

and make sure each group of anchors belongs to the corresponding kernel (the largest anchors go with the coarsest 13x13 grid).

prajwaljpj commented 5 years ago

@lewes6369 Thanks for the quick reply. Yes, the kernel feature size is 416, and it worked!

liteonandy commented 4 years ago

@cong235 Hello, my yolov3-tiny always reports an error:

####### input args#######
C=3; H=416; W=416; caffemodel=./yolov3-tiny.caffemodel; calib=; cam=0; class=80; classname=coco.name; display=1; evallist=; input=0; inputstream=cam; mode=fp32; nms=0.450000; outputs=yolo-det; prototxt=./yolov3-tiny-trt.prototxt; savefile=result; saveimg=0; videofile=sample.mp4;
####### end args#######
init plugin
proto: ./yolov3-tiny-trt.prototxt
caffemodel: ./yolov3-tiny.caffemodel
Begin parsing model...
ERROR: layer21-route: all concat input tensors must have the same dimensions except on the concatenation axis
runYolov3: ./parserHelper.h:97: nvinfer1::DimsCHW parserhelper::getCHW(const nvinfer1::Dims&): Assertion `d.nbDims >= 3' failed.
Aborted (core dumped)

Can you give me your yolov3-tiny?

liteonandy commented 4 years ago

@cong235 My yolov3-tiny-trt.prototxt:

name: "Darkent2Caffe"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 416
input_dim: 416

layer { bottom: "data" top: "layer1-conv" name: "layer1-conv" type: "Convolution" convolution_param { num_output: 16 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer1-conv" top: "layer1-conv" name: "layer1-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer1-conv" top: "layer2-maxpool" name: "layer2-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer2-maxpool" top: "layer3-conv" name: "layer3-conv" type: "Convolution" convolution_param { num_output: 32 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer3-conv" top: "layer3-conv" name: "layer3-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer3-conv" top: "layer4-maxpool" name: "layer4-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer4-maxpool" top: "layer5-conv" name: "layer5-conv" type: "Convolution" convolution_param { num_output: 64 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer5-conv" top: "layer5-conv" name: "layer5-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer5-conv" top: "layer6-maxpool" name: "layer6-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer6-maxpool" top: "layer7-conv" name: "layer7-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer7-conv" top: "layer7-conv" name: "layer7-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer7-conv" top: "layer8-maxpool" name: "layer8-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer8-maxpool" top: "layer9-conv" name: "layer9-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer9-conv" top: "layer9-conv" name: "layer9-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer9-conv" top: "layer10-maxpool" name: "layer10-maxpool" type: "Pooling" pooling_param { stride: 2 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer10-maxpool" top: "layer11-conv" name: 
"layer11-conv" type: "Convolution" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer11-conv" top: "layer11-conv" name: "layer11-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer11-conv" top: "layer12-maxpool" name: "layer12-maxpool" type: "Pooling" pooling_param { stride: 1 pool: MAX kernel_size: 2 pad: 0 } } layer { bottom: "layer12-maxpool" top: "layer13-conv" name: "layer13-conv" type: "Convolution" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer13-conv" top: "layer13-conv" name: "layer13-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer13-conv" top: "layer14-conv" name: "layer14-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer14-conv" top: "layer14-conv" name: "layer14-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer14-conv" top: "layer15-conv" name: "layer15-conv" type: "Convolution" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer15-conv" top: "layer15-conv" name: "layer15-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer15-conv" top: "layer16-conv" name: "layer16-conv" type: "Convolution" convolution_param { num_output: 255 kernel_size: 1 pad: 0 stride: 1 bias_term: true } } layer { bottom: "layer14-conv" top: "layer18-route" name: "layer18-route" type: "Concat" } layer { bottom: "layer18-route" top: "layer19-conv" name: "layer19-conv" type: "Convolution" convolution_param { num_output: 128 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer19-conv" top: "layer19-conv" name: "layer19-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer19-conv" top: "layer20-upsample" name: "layer20-upsample" type: "Upsample"

#upsample_param {
#    scale: 2
#}

} layer { bottom: "layer20-upsample" bottom: "layer9-conv" top: "layer21-route" name: "layer21-route" type: "Concat" } layer { bottom: "layer21-route" top: "layer22-conv" name: "layer22-conv" type: "Convolution" convolution_param { num_output: 256 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-bn" type: "BatchNorm" batch_norm_param { use_global_stats: true } } layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-scale" type: "Scale" scale_param { bias_term: true } } layer { bottom: "layer22-conv" top: "layer22-conv" name: "layer22-act" type: "ReLU" relu_param { negative_slope: 0.1 } } layer { bottom: "layer22-conv" top: "layer23-conv" name: "layer23-conv" type: "Convolution" convolution_param { num_output: 255 kernel_size: 1 pad: 0 stride: 1 bias_term: true } } layer { bottom: "layer16-conv" bottom: "layer23-conv" top: "yolo-det" name: "yolo-det" type: "Yolo" }

sandeepjangir07 commented 4 years ago

@cong235 Hello, my yolov3-tiny always reports an error:

####### input args#######
C=3; H=416; W=416; caffemodel=./yolov3-tiny.caffemodel; calib=; cam=0; class=80; classname=coco.name; display=1; evallist=; input=0; inputstream=cam; mode=fp32; nms=0.450000; outputs=yolo-det; prototxt=./yolov3-tiny-trt.prototxt; savefile=result; saveimg=0; videofile=sample.mp4;
####### end args#######
init plugin
proto: ./yolov3-tiny-trt.prototxt
caffemodel: ./yolov3-tiny.caffemodel
Begin parsing model...
ERROR: layer21-route: all concat input tensors must have the same dimensions except on the concatenation axis
runYolov3: ./parserHelper.h:97: nvinfer1::DimsCHW parserhelper::getCHW(const nvinfer1::Dims&): Assertion `d.nbDims >= 3' failed.
Aborted (core dumped)

Can you give me your yolov3-tiny?

I am having the same issue. I get the error:

Begin parsing model...
ERROR: layer21-route: all concat input tensors must have the same dimensions except on the concatenation axis (0), but dimensions mismatched at input 1 at index 1. Input 0 shape: [128,24,24], Input 1 shape: [256,26,26]
runYolov3: ./parserHelper.h:97: nvinfer1::DimsCHW parserhelper::getCHW(const nvinfer1::Dims&): Assertion `d.nbDims >= 3' failed.
Aborted (core dumped)

Can anyone help me with this problem?

uname0x96 commented 4 years ago

@sandeepjangir07 You should first try a simple model downloaded from the yolov3 homepage. And please confirm that your yolov3-tiny-trt.prototxt has the new 'Yolo' layer added.
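
(For what it's worth, the shapes in the error above are consistent with a pooling-size bug visible in the pasted prototxt: with a 416 input, layer11-conv is 13x13, and a layer12-maxpool with kernel_size: 2, pad: 0, stride: 1 shrinks it to 12x12, so layer20-upsample emits 24x24 and the concat with the 26x26 layer9-conv fails. The prototxt that works earlier in this thread keeps the size with kernel_size: 3, pad: 1. A hypothetical fix:

layer {
    bottom: "layer11-conv"
    top: "layer12-maxpool"
    name: "layer12-maxpool"
    type: "Pooling"
    pooling_param { stride: 1 pool: MAX kernel_size: 3 pad: 1 }
}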

zeyuDai2018 commented 4 years ago

@cong235 Hi, you mentioned that we can run int8 mode on the TX2 and that it's faster than fp16. However, I set the mode to int8 and calibrated with 30 pictures, and I got almost the same speed on Xavier for yolo-tiny (about 85 fps). I want to know why, and is there a way to force the weights to int8 without calibration? I can see that the int8 engine is just a little smaller than the fp16 engine, which suggests there is nearly no difference between my engines.

uname0x96 commented 4 years ago

@zeyuDai2018 Yes, but that is just your engine, not every engine, bro. For example, consider the two operations below. If you have 2 nodes with weights 2.0 and 8.0:

FP16: 2.0 * 8.0 = 16.0
INT8: 2 * 8 = 16

The result is the same. But if your weights are 2.1 and 8.1:

FP16: 2.1 * 8.1 = 17.01
INT8: 2 * 8 = 16

The result is different.

I mean, INT8 is lighter than FP16, but that doesn't mean INT8 is faster than FP16. Faster or not depends on your model and your weights.
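
(A standalone sketch of that rounding effect; this is generic symmetric quantization with a made-up scale, not TensorRT's actual calibration scheme:)

#include <cstdio>
#include <cstdint>
#include <cmath>

int main() {
    const float scale = 8.1f / 127.0f;   // hypothetical per-tensor scale
    const float w[4] = {2.0f, 8.0f, 2.1f, 8.1f};
    for (float x : w) {
        int8_t q  = static_cast<int8_t>(std::lround(x / scale));  // quantize
        float  dq = q * scale;                                    // dequantize
        std::printf("%.2f -> q=%4d -> %.4f (abs err %.4f)\n",
                    x, q, dq, std::fabs(x - dq));
    }
    return 0;
}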

zeyuDai2018 commented 4 years ago

@cong235 Hi, agreed with what you said, but I noticed this paper recently: https://arxiv.org/abs/1904.02024. It claims 12 ms latency on Xavier for yolov3 int8 with ReLU as the activation, which is almost 70 percent faster than using leaky ReLU. You can also check this link: https://github.com/AlexeyAB/darknet/issues/4538. I believe yolo-tiny is several times faster than yolov3, but I can only reach 90-100 fps with my int8 engine. I wonder if you have any ideas or interest in digging further into the acceleration of yolo-tiny. Thanks for replying!

uname0x96 commented 4 years ago

@zeyuDai2018 I'm sorry, I can't explain it to you, because TensorRT does not simply convert only the weights to INT8 or FP16. It has two things called layer and tensor fusion and kernel auto-tuning. This machinery changes your network from a complex model to a simpler one, and I can't see exactly what it is doing there.

zeyuDai2018 commented 4 years ago

@zeyuDai2018 I'm sorry, I can't explain it to you, because TensorRT does not simply convert only the weights to INT8 or FP16. It has two things called layer and tensor fusion and kernel auto-tuning. This machinery changes your network from a complex model to a simpler one, and I can't see exactly what it is doing there.

Hi! Thanks for replying. I think this might be caused by the custom upsample layer and the leaky-ReLU layers, which block some layer fusion. Maybe replacing the upsample layer with a deconvolution in caffe would let the caffe parser turn the model into a more efficient form automatically. I will try to implement this after New Year.

uname0x96 commented 4 years ago

@zeyuDai2018 I'm sorry, I can't explain it to you, because TensorRT does not simply convert only the weights to INT8 or FP16. It has two things called layer and tensor fusion and kernel auto-tuning. This machinery changes your network from a complex model to a simpler one, and I can't see exactly what it is doing there.

Hi! Thanks for replying. I think this might be caused by the custom upsample layer and the leaky-ReLU layers, which block some layer fusion. Maybe replacing the upsample layer with a deconvolution in caffe would let the caffe parser turn the model into a more efficient form automatically. I will try to implement this after New Year.

Yeah, it's quite hard. But in real life I'm not using INT8 or FP16; I just use a pb model and optimize that pb model. It's enough for speed and accuracy.