xingwangsfu / caffe-yolo

YOLO (Real-Time Object Detection) in caffe
520 stars, 336 forks

Can it convert yolo's version 2 (.weights) files to caffemodel files? #15

Open AlexeyAB opened 7 years ago

AlexeyAB commented 7 years ago

Now there is Yolo v2 at this link: http://pjreddie.com/darknet/yolo/ and the old Yolo v1 is here: http://pjreddie.com/darknet/yolov1/. Can caffe-yolo convert Yolo version 2 (.weights) files to caffemodel files using create_yolo_caffemodel.py? Or will it be possible later?

xingwangsfu commented 7 years ago

This should be doable, but I think there are some layers that are not supported by Caffe, e.g., [route] and [reorg]. You may need to dig into the yolo source code and try to implement these layers in Caffe. Currently, I don't have time to do this. You are welcome to contribute.
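For reference, [route] is just a concatenation of earlier layer outputs, while [reorg] is essentially a space-to-depth rearrangement. A rough numpy sketch of the [reorg] shape transformation (darknet's actual channel ordering differs slightly, so this is only illustrative):

import numpy as np

def reorg_sketch(x, stride=2):
    # Move each stride x stride spatial block into the channel dimension.
    # Plain space-to-depth; darknet's [reorg] orders the channels
    # differently, so a faithful Caffe port must match the C code exactly.
    n, c, h, w = x.shape
    x = x.reshape(n, c, h // stride, stride, w // stride, stride)
    x = x.transpose(0, 3, 5, 1, 2, 4)
    return x.reshape(n, c * stride * stride, h // stride, w // stride)

# e.g. a (1, 64, 26, 26) feature map becomes (1, 256, 13, 13)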

Jumabek commented 7 years ago

The author of YOLO v2 says route + reorg only add about a 1% improvement. I think the main game changer is the Region layer. If you could add that layer, it would be awesome. I will try myself too, but I'm not sure about the result.

Jumabek commented 7 years ago

https://groups.google.com/forum/#!topic/darknet/oTrnrO3xv6o

tk2github commented 7 years ago

Hi, We are also stuck on the same issue. Were you able to try this out? If so, can you please help us with that layer?

Jumabek commented 7 years ago

Hey,

In order to use YOLOv2 on Caffe, we need the region layer (implemented as a Python layer). Because its computation is small, it doesn't need CUDA; the CPU is fine for the region layer. That's what the author has done as well.
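For anyone attempting this, here is a minimal skeleton of a Caffe Python layer (the class name and parameter handling are illustrative, not taken from an existing implementation):

import caffe

class RegionLayer(caffe.Layer):
    def setup(self, bottom, top):
        # parse anchors, num, classes, etc. from self.param_str here
        pass

    def reshape(self, bottom, top):
        top[0].reshape(*bottom[0].data.shape)

    def forward(self, bottom, top):
        # apply the region activations here (sigmoid on x/y/objectness,
        # softmax over the class scores) and write the result to top[0]
        top[0].data[...] = bottom[0].data

    def backward(self, top, propagate_down, bottom):
        pass  # not needed for inference-only use

It would be wired into the prototxt with a layer of type "Python" whose python_param names the module and class.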

However, that layer is not simple. Even though the theory is simple, its implementation looks very complicated to me: https://github.com/Jumabek/darknet/blob/master/src/region_layer.c

Unfortunately, I gave up halfway. I'm working on another project right now and really enjoying using YOLO for my projects.

SHaiHosh commented 7 years ago

You have to compile the released version from https://github.com/gklz1982/caffe-yolov2. There is also an error in the released prototxt file; see the corrected prototxt:

name: "YOLONET" layer { name: "data" type: "Input" top: "data" top: "label" input_param { shape: { dim: 1 dim: 3 dim: 416 dim: 416 } shape: { dim: 1 dim: 1 dim: 30 dim: 5 } } } layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" convolution_param { num_output: 32 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn1" type: "BatchNorm" bottom: "conv1" top: "bn1" } layer { name: "scale1" type: "Scale" bottom: "bn1" top: "scale1" scale_param { bias_term: true } } layer { name: "relu1" type: "ReLU" bottom: "scale1" top: "scale1" relu_param{ negative_slope: 0.1 }
} layer { name: "pool1" type: "Pooling" bottom: "scale1" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer{ name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" convolution_param { num_output: 64 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn2" type: "BatchNorm" bottom: "conv2" top: "bn2" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale2" type: "Scale" bottom: "bn2" top: "scale2" scale_param { bias_term: true } } layer { name: "relu2" type: "ReLU" bottom: "scale2" top: "scale2" relu_param{ negative_slope: 0.1 }
} layer { name: "pool2" type: "Pooling" bottom: "scale2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }

layer{ name: "conv3" type: "Convolution" bottom: "pool2" top: "conv3" convolution_param { num_output: 128 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn3" type: "BatchNorm" bottom: "conv3" top: "bn3" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale3" type: "Scale" bottom: "bn3" top: "scale3" scale_param { bias_term: true } } layer { name: "relu3" type: "ReLU" bottom: "scale3" top: "scale3" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv4" type: "Convolution" bottom: "scale3" top: "conv4" convolution_param { num_output: 64 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "bn4" type: "BatchNorm" bottom: "conv4" top: "bn4" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale4" type: "Scale" bottom: "bn4" top: "scale4" scale_param { bias_term: true } } layer { name: "relu4" type: "ReLU" bottom: "scale4" top: "scale4" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv5" type: "Convolution" bottom: "scale4" top: "conv5" convolution_param { num_output: 128 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn5" type: "BatchNorm" bottom: "conv5" top: "bn5" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale5" type: "Scale" bottom: "bn5" top: "scale5" scale_param { bias_term: true } } layer { name: "relu5" type: "ReLU" bottom: "scale5" top: "scale5" relu_param{ negative_slope: 0.1 }
} layer { name: "pool5" type: "Pooling" bottom: "scale5" top: "pool5" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }

layer{ name: "conv6" type: "Convolution" bottom: "pool5" top: "conv6" convolution_param { num_output: 256 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn6" type: "BatchNorm" bottom: "conv6" top: "bn6" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale6" type: "Scale" bottom: "bn6" top: "scale6" scale_param { bias_term: true } } layer { name: "relu6" type: "ReLU" bottom: "scale6" top: "scale6" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv7" type: "Convolution" bottom: "scale6" top: "conv7" convolution_param { num_output: 128 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "bn7" type: "BatchNorm" bottom: "conv7" top: "bn7" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale7" type: "Scale" bottom: "bn7" top: "scale7" scale_param { bias_term: true } } layer { name: "relu7" type: "ReLU" bottom: "scale7" top: "scale7" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv8" type: "Convolution" bottom: "scale7" top: "conv8" convolution_param { num_output: 256 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn8" type: "BatchNorm" bottom: "conv8" top: "bn8" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale8" type: "Scale" bottom: "bn8" top: "scale8" scale_param { bias_term: true } } layer { name: "relu8" type: "ReLU" bottom: "scale8" top: "scale8" relu_param{ negative_slope: 0.1 }
} layer { name: "pool8" type: "Pooling" bottom: "scale8" top: "pool8" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }

layer{ name: "conv9" type: "Convolution" bottom: "pool8" top: "conv9" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn9" type: "BatchNorm" bottom: "conv9" top: "bn9" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale9" type: "Scale" bottom: "bn9" top: "scale9" scale_param { bias_term: true } } layer { name: "relu9" type: "ReLU" bottom: "scale9" top: "scale9" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv10" type: "Convolution" bottom: "scale9" top: "conv10" convolution_param { num_output: 256 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "bn10" type: "BatchNorm" bottom: "conv10" top: "bn10" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale10" type: "Scale" bottom: "bn10" top: "scale10" scale_param { bias_term: true } } layer { name: "relu10" type: "ReLU" bottom: "scale10" top: "scale10" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv11" type: "Convolution" bottom: "scale10" top: "conv11" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn11" type: "BatchNorm" bottom: "conv11" top: "bn11" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale11" type: "Scale" bottom: "bn11" top: "scale11" scale_param { bias_term: true } } layer { name: "relu11" type: "ReLU" bottom: "scale11" top: "scale11" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv12" type: "Convolution" bottom: "scale11" top: "conv12" convolution_param { num_output: 256 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "bn12" type: "BatchNorm" bottom: "conv12" top: "bn12" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale12" type: "Scale" bottom: "bn12" top: "scale12" scale_param { bias_term: true } } layer { name: "relu12" type: "ReLU" bottom: "scale12" top: "scale12" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv13" type: "Convolution" bottom: "scale12" top: "conv13" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn13" type: "BatchNorm" bottom: "conv13" top: "bn13" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale13" type: "Scale" bottom: "bn13" top: "scale13" scale_param { bias_term: true } } layer { name: "relu13" type: "ReLU" bottom: "scale13" top: "scale13" relu_param{ negative_slope: 0.1 }
} layer { name: "pool13" type: "Pooling" bottom: "scale13" top: "pool13" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }

layer{ name: "conv14" type: "Convolution" bottom: "pool13" top: "conv14" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn14" type: "BatchNorm" bottom: "conv14" top: "bn14" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale14" type: "Scale" bottom: "bn14" top: "scale14" scale_param { bias_term: true } } layer { name: "relu14" type: "ReLU" bottom: "scale14" top: "scale14" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv15" type: "Convolution" bottom: "scale14" top: "conv15" convolution_param { num_output: 512 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "bn15" type: "BatchNorm" bottom: "conv15" top: "bn15" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale15" type: "Scale" bottom: "bn15" top: "scale15" scale_param { bias_term: true } } layer { name: "relu15" type: "ReLU" bottom: "scale15" top: "scale15" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv16" type: "Convolution" bottom: "scale15" top: "conv16" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn16" type: "BatchNorm" bottom: "conv16" top: "bn16" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale16" type: "Scale" bottom: "bn16" top: "scale16" scale_param { bias_term: true } } layer { name: "relu16" type: "ReLU" bottom: "scale16" top: "scale16" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv17" type: "Convolution" bottom: "scale16" top: "conv17" convolution_param { num_output: 512 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "bn17" type: "BatchNorm" bottom: "conv17" top: "bn17" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale17" type: "Scale" bottom: "bn17" top: "scale17" scale_param { bias_term: true } } layer { name: "relu17" type: "ReLU" bottom: "scale17" top: "scale17" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv18" type: "Convolution" bottom: "scale17" top: "conv18" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn18" type: "BatchNorm" bottom: "conv18" top: "bn18" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale18" type: "Scale" bottom: "bn18" top: "scale18" scale_param { bias_term: true } } layer { name: "relu18" type: "ReLU" bottom: "scale18" top: "scale18" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv19" type: "Convolution" bottom: "scale18" top: "conv19" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn19" type: "BatchNorm" bottom: "conv19" top: "bn19" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale19" type: "Scale" bottom: "bn19" top: "scale19" scale_param { bias_term: true } } layer { name: "relu19" type: "ReLU" bottom: "scale19" top: "scale19" relu_param{ negative_slope: 0.1 }
}

layer{ name: "conv20" type: "Convolution" bottom: "scale19" top: "conv20" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false }

} layer { name: "bn20" type: "BatchNorm" bottom: "conv20" top: "bn20" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale20" type: "Scale" bottom: "bn20" top: "scale20" scale_param { bias_term: true } } layer { name: "relu20" type: "ReLU" bottom: "scale20" top: "scale20" relu_param { negative_slope: 0.1 }
}

layer { name: "concat1" type: "Concat" bottom: "scale13" top: "concat1" }

layer { name: "conv21" type: "Convolution" bottom: "concat1" top: "conv21" convolution_param { num_output: 64 kernel_size: 1 stride: 1 pad: 0 bias_term: false } } layer { name: "bn21" type: "BatchNorm" bottom: "conv21" top: "bn21" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "reorg1" type: "Reorg" bottom: "bn21" top: "reorg1" reorg_param { stride: 2 } }

layer { name: "concat2" type: "Concat" bottom: "reorg1" bottom: "scale20" top: "concat2" }

layer{ name: "conv22" type: "Convolution" bottom: "concat2" top: "conv22" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "bn22" type: "BatchNorm" bottom: "conv22" top: "bn22" param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 } } layer { name: "scale22" type: "Scale" bottom: "bn22" top: "scale22" scale_param { bias_term: true } } layer { name: "relu22" type: "ReLU" bottom: "scale22" top: "scale22" relu_param{ negative_slope: 0.1 }
}

layer { name: "conv23" type: "Convolution" bottom: "scale22" top: "conv23" convolution_param { num_output: 125 kernel_size: 1 stride: 1 pad: 0 } } layer { name: "relu23" type: "ReLU" bottom: "conv23" top: "conv23" } layer { name: "region1" type: "RegionLoss" bottom: "conv23" bottom: "label" top: "region1" region_loss_param { side: 13 num_class: 20 coords: 4 num: 5 } }

layer { name: "detection_out" type: "DetectionOutput" bottom: "conv23" top: "detection_out" include { phase: TEST } detection_output_param { num_classes: 9 coords: 4 confidence_threshold: 0.01 biases: 0.738768 biases: 0.874946 biases: 2.42204 biases: 2.65704 biases: 4.30971 biases: 7.04493 biases: 10.246 biases: 4.59428 biases: 12.6868 biases: 11.8741 } } layer { name: "detection_eval" type: "DetectionEvaluate" bottom: "detection_out" bottom: "label" top: "detection_eval" include { phase: TEST } detection_evaluate_param { num_classes: 9 overlap_threshold: 0.5
} }

http://ethereon.github.io/netscope/#/gist/9640ecb59a75f230446e7c70d2f8bcf3

xzhangxa commented 7 years ago

@SHaiHosh Hi, thanks for your prototxt! Using the script that project provides, it can convert yolo-voc.weights to a caffemodel; I'll try to make it work for yolo.prototxt accordingly. BTW, have you tried to train YOLOv2 in Caffe with that project? Does it work fine or not?

SHaiHosh commented 7 years ago

Yes, it works. The only issue is the normalization: the released code maps the image values to the range -128 to 127. After I fixed it to use the range 0-1, there were no problems.
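In other words, the preprocessing should look roughly like this (a sketch with illustrative variable names, assuming OpenCV is used for loading; darknet feeds the network RGB values in [0, 1] with no mean subtraction):

import numpy as np
import cv2

img = cv2.imread('person.jpg')                 # HWC, BGR, uint8 in [0, 255]
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)     # darknet expects RGB
img = cv2.resize(img, (416, 416))
blob = img.astype(np.float32) / 255.0          # scale to [0, 1], not [-128, 127]
blob = blob.transpose(2, 0, 1)[np.newaxis]     # NCHW layout for Caffe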

xzhangxa commented 7 years ago

@SHaiHosh Thanks! I'll try it.

xyxxyx commented 6 years ago

@SHaiHosh Hi, SHaiHosh. After fixing the normalization, have you tested the Caffe version of YOLOv2 on any benchmark datasets? How does it work?

SHaiHosh commented 6 years ago

Yes, I have tested the YOLOv2 Caffe version; there was no difference from the reported results. I then adapted the network for my own purposes, so I don't use the original model anymore.

duangenquan commented 6 years ago

The region layer is re-implemented in this repo in C/C++, along with a Python wrapper. Have fun!

skyw commented 6 years ago

@SHaiHosh I can convert yolo-voc.weights with the prototxt you shared, thanks. But how do you test it? How do you calculate mAP? I tried test_yolo_v2.py, but it doesn't seem to give correct boxes. Do you have your own code to run the test and calculate mAP? If so, could you share it?

ysh329 commented 6 years ago

Convert darknet yolov2 model to caffe · Issue #24 · ysh329/deep-learning-model-convertor https://github.com/ysh329/deep-learning-model-convertor/issues/24

appusom commented 6 years ago

@SHaiHosh Thank you for the prototxt; I was able to create the caffemodel using it. I then modified test_yolo_v2.py to point to your prototxt and the caffemodel and tried running it, but I am now getting a segmentation fault. Did the script run for you?

I0206 03:38:46.925810 11535 net.cpp:261] This network produces output detection_eval
I0206 03:38:46.925814 11535 net.cpp:261] This network produces output region1
I0206 03:38:46.925858 11535 net.cpp:274] Network initialization done.
1
/usr/local/lib/python2.7/dist-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
Segmentation fault (core dumped)

SHaiHosh commented 6 years ago

Which GPU do you have? Anyway, have you tried rebooting your system?

appusom commented 6 years ago

@SHaiHosh I am using a CPU-only system. Interestingly, I tried the train_lenet.sh script and it also ended in a core dump. Do you have any idea what might be happening?

F0207 00:48:04.760711 16018 solver.cpp:374] Check failed: result[j]->width() == 5 (1 vs. 5)
Check failure stack trace:
@ 0x7f4c86c0d5cd google::LogMessage::Fail()
@ 0x7f4c86c0f433 google::LogMessage::SendToLog()
@ 0x7f4c86c0d15b google::LogMessage::Flush()
@ 0x7f4c86c0fe1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f4c87031d94 caffe::Solver<>::Test()
@ 0x7f4c87032ade caffe::Solver<>::TestAll()
@ 0x7f4c87032bfc caffe::Solver<>::Step()
@ 0x7f4c8703377e caffe::Solver<>::Solve()
@ 0x40cf26 train()
@ 0x4081fd main
@ 0x7f4c856c1830 __libc_start_main
@ 0x408a79 _start
@ (nil) (unknown)
Aborted (core dumped)

dedoogong commented 6 years ago

That error occurs because your prototxt doesn't have biases, or because you are taking the result from a layer before the detection output, such as conv22.

@SHaiHosh I could run test_yolo_v2.py and test_out_put/test_eval.py with the converted model. I see that in test_yolo_v2.py the author implemented the get_region_boxes and nms parts himself in Python, which is why it takes the result from the last conv layer instead of from the detection_out layer.

In test_out_put/test_eval.py, it just interprets the detection results.
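Incidentally, the nms step is just the standard greedy IoU suppression. A compact numpy sketch, not the repo's actual code:

import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    # boxes: (N, 4) as x1, y1, x2, y2; greedily keep highest-scoring boxes
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]  # drop overlapping lower-score boxes
    return keep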

But in all cases, I failed to get correct bboxes (too many boxes, just spread all over the image). The output values of net.forward() look quite weird: output['detection_out'][0] -> min == 0.0, max == 2.8e+14. Super big!

But in the case of darknet's YOLOv2 (COCO, 608x608), the min/max values of the outputs of net->predict() are MIN: -2.505956, MAX: 2.043313. This shows the converted model is behaving completely wrongly.
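To reproduce this check in pycaffe, something like the following works (file names are placeholders for the converted model; the random blob just stands in for a real preprocessed image):

import numpy as np
import caffe

net = caffe.Net('yolov2.prototxt', 'yolov2.caffemodel', caffe.TEST)
blob = np.random.rand(1, 3, 416, 416).astype(np.float32)  # stand-in input
net.blobs['data'].data[...] = blob
out = net.forward()
for name, arr in out.items():
    print(name, arr.min(), arr.max())  # compare against darknet's ranges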

I will keep tracing the reason for this wrong result from the converted YOLOv2...

Did you use your own code, or did you just modify that code? Your hint would be really helpful and save me a lot of time. Thank you very much!

dedoogong commented 6 years ago

I found a bug in this repo regarding the scale/bias for conv21. I'm trying to fix it, but even after fixing it manually (hardcoding some values), I couldn't get good detection results, though they were better than before (which were really terrible). I'm seeing the light more and more. I really wonder how others managed to run this repo successfully without modification; there are many bugs that must affect everyone!

dedoogong commented 6 years ago

Well, test_yolo_v2.py finally works, though it is a little slower than the original darknet-yolov2: caffe-yolov2 takes around 0.07 to 0.08 sec per image (I tested it with the person.jpg example image), while darknet-yolov2 takes around 0.06 to 0.07 sec per image.

But with Caffe, I can extend YOLOv2 with other DCNNs and optimize it much more easily, so it's wonderful.

[images: dog_results, eagle_results, girraffe_results, horses_results, person_results]

After NMS:

[images: dog_results, eagle_results, girraffe_results, horses_results, person_results]

ChriswooTalent commented 6 years ago

Hi @dedoogong, I have converted the yolo.weights file to yolo.caffemodel successfully, and I changed the prototxt to the deploy version to test a single image. I met the same problem: the network predicts too many boxes after NMS. How did you solve this problem when you first encountered it? Thank you! I need your help!

ChriswooTalent commented 6 years ago

@dedoogong How did you solve the problem of the net detecting too many boxes?

ChriswooTalent commented 6 years ago

Finally, I have solved the problem. In the protofile provided by @SHaiHosh: first, I found a bug in that there is no Scale layer following the layer "bn21", so I added a Scale layer and a ReLU layer; second, the last layer, conv23, should not be followed by a ReLU layer. Once you have fixed these two bugs, use the new protofile to produce yolo.caffemodel from yolo-voc.weights; then you can use yolo.caffemodel to run some tests. This is my result: [image: yolotestresult]
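Concretely, the fix amounts to inserting something like the following after "bn21" (the layer names scale21/relu21 are mine; the style follows @SHaiHosh's prototxt) and feeding "scale21" rather than "bn21" into the Reorg layer:

layer { name: "scale21" type: "Scale" bottom: "bn21" top: "scale21" scale_param { bias_term: true } }
layer { name: "relu21" type: "ReLU" bottom: "scale21" top: "scale21" relu_param { negative_slope: 0.1 } }
layer { name: "reorg1" type: "Reorg" bottom: "scale21" top: "reorg1" reorg_param { stride: 2 } }

and then deleting the "relu23" layer, so that the output of conv23 goes to the region/detection layers directly.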

META-DREAMER commented 6 years ago

@ChriswooTalent Could you share your fixed version of the prototxt file?

ChriswooTalent commented 6 years ago

@hammadj I am currently sorting out my code and files, and I will share them on my GitHub this week!

ghost commented 6 years ago

@ChriswooTalent / @dedoogong It would be nice if you could share the code for those of us who'd just like to try it out. Thanks in advance!

bharathbv commented 6 years ago

@ChriswooTalent / @dedoogong, I am trying caffe-yolov2 and have not been successful.

1. After copying @SHaiHosh's modified prototxt, I still had issues converting the YOLO v2 weights to a caffemodel:

examples/indoor/convert# python convert_weights_to_caffemodel.py
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 1015:15: Message type "caffe.LayerParameter" has no field named "reorg_param".

Fix: using darknet2Caffe.py (https://github.com/marvis/pytorch-caffe-darknet-convert), I got a converted prototxt and caffemodel:

name: "yolov2"

layer { name: "data" type: "Input" top: "data" input_param { shape { dim: 1 dim: 3 dim: 416 dim: 416 }
} }

layer { name: "layer1_conv" type: "Convolution" bottom: "data" top: "layer1_conv" convolution_param { num_output: 32 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer1_bn" type: "BatchNorm" bottom: "layer1_conv" top: "layer1_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer1_scale" type: "Scale" bottom: "layer1_conv" top: "layer1_conv" scale_param { bias_term: true } } layer { name: "layer1_act" type: "ReLU" bottom: "layer1_conv" top: "layer1_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer2_maxpool" type: "Pooling" bottom: "layer1_conv" top: "layer2_maxpool" pooling_param { kernel_size: 2 stride: 2 pool: MAX } } layer { name: "layer3_conv" type: "Convolution" bottom: "layer2_maxpool" top: "layer3_conv" convolution_param { num_output: 64 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer3_bn" type: "BatchNorm" bottom: "layer3_conv" top: "layer3_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer3_scale" type: "Scale" bottom: "layer3_conv" top: "layer3_conv" scale_param { bias_term: true } } layer { name: "layer3_act" type: "ReLU" bottom: "layer3_conv" top: "layer3_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer4_maxpool" type: "Pooling" bottom: "layer3_conv" top: "layer4_maxpool" pooling_param { kernel_size: 2 stride: 2 pool: MAX } } layer { name: "layer5_conv" type: "Convolution" bottom: "layer4_maxpool" top: "layer5_conv" convolution_param { num_output: 128 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer5_bn" type: "BatchNorm" bottom: "layer5_conv" top: "layer5_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer5_scale" type: "Scale" bottom: "layer5_conv" top: "layer5_conv" scale_param { bias_term: true } } layer { name: "layer5_act" type: "ReLU" bottom: "layer5_conv" top: "layer5_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer6_conv" type: "Convolution" bottom: "layer5_conv" top: "layer6_conv" convolution_param { num_output: 64 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "layer6_bn" type: "BatchNorm" bottom: "layer6_conv" top: "layer6_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer6_scale" type: "Scale" bottom: "layer6_conv" top: "layer6_conv" scale_param { bias_term: true } } layer { name: "layer6_act" type: "ReLU" bottom: "layer6_conv" top: "layer6_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer7_conv" type: "Convolution" bottom: "layer6_conv" top: "layer7_conv" convolution_param { num_output: 128 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer7_bn" type: "BatchNorm" bottom: "layer7_conv" top: "layer7_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer7_scale" type: "Scale" bottom: "layer7_conv" top: "layer7_conv" scale_param { bias_term: true } } layer { name: "layer7_act" type: "ReLU" bottom: "layer7_conv" top: "layer7_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer8_maxpool" type: "Pooling" bottom: "layer7_conv" top: "layer8_maxpool" pooling_param { kernel_size: 2 stride: 2 pool: MAX } } layer { name: "layer9_conv" type: "Convolution" bottom: "layer8_maxpool" top: "layer9_conv" convolution_param { num_output: 256 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer9_bn" type: "BatchNorm" bottom: "layer9_conv" top: "layer9_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer9_scale" type: "Scale" bottom: "layer9_conv" top: "layer9_conv" 
scale_param { bias_term: true } } layer { name: "layer9_act" type: "ReLU" bottom: "layer9_conv" top: "layer9_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer10_conv" type: "Convolution" bottom: "layer9_conv" top: "layer10_conv" convolution_param { num_output: 128 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "layer10_bn" type: "BatchNorm" bottom: "layer10_conv" top: "layer10_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer10_scale" type: "Scale" bottom: "layer10_conv" top: "layer10_conv" scale_param { bias_term: true } } layer { name: "layer10_act" type: "ReLU" bottom: "layer10_conv" top: "layer10_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer11_conv" type: "Convolution" bottom: "layer10_conv" top: "layer11_conv" convolution_param { num_output: 256 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer11_bn" type: "BatchNorm" bottom: "layer11_conv" top: "layer11_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer11_scale" type: "Scale" bottom: "layer11_conv" top: "layer11_conv" scale_param { bias_term: true } } layer { name: "layer11_act" type: "ReLU" bottom: "layer11_conv" top: "layer11_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer12_maxpool" type: "Pooling" bottom: "layer11_conv" top: "layer12_maxpool" pooling_param { kernel_size: 2 stride: 2 pool: MAX } } layer { name: "layer13_conv" type: "Convolution" bottom: "layer12_maxpool" top: "layer13_conv" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer13_bn" type: "BatchNorm" bottom: "layer13_conv" top: "layer13_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer13_scale" type: "Scale" bottom: "layer13_conv" top: "layer13_conv" scale_param { bias_term: true } } layer { name: "layer13_act" type: "ReLU" bottom: "layer13_conv" top: "layer13_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer14_conv" type: "Convolution" bottom: "layer13_conv" top: "layer14_conv" convolution_param { num_output: 256 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "layer14_bn" type: "BatchNorm" bottom: "layer14_conv" top: "layer14_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer14_scale" type: "Scale" bottom: "layer14_conv" top: "layer14_conv" scale_param { bias_term: true } } layer { name: "layer14_act" type: "ReLU" bottom: "layer14_conv" top: "layer14_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer15_conv" type: "Convolution" bottom: "layer14_conv" top: "layer15_conv" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer15_bn" type: "BatchNorm" bottom: "layer15_conv" top: "layer15_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer15_scale" type: "Scale" bottom: "layer15_conv" top: "layer15_conv" scale_param { bias_term: true } } layer { name: "layer15_act" type: "ReLU" bottom: "layer15_conv" top: "layer15_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer16_conv" type: "Convolution" bottom: "layer15_conv" top: "layer16_conv" convolution_param { num_output: 256 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "layer16_bn" type: "BatchNorm" bottom: "layer16_conv" top: "layer16_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer16_scale" type: "Scale" bottom: "layer16_conv" top: "layer16_conv" scale_param { bias_term: true } } layer { name: "layer16_act" 
type: "ReLU" bottom: "layer16_conv" top: "layer16_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer17_conv" type: "Convolution" bottom: "layer16_conv" top: "layer17_conv" convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer17_bn" type: "BatchNorm" bottom: "layer17_conv" top: "layer17_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer17_scale" type: "Scale" bottom: "layer17_conv" top: "layer17_conv" scale_param { bias_term: true } } layer { name: "layer17_act" type: "ReLU" bottom: "layer17_conv" top: "layer17_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer18_maxpool" type: "Pooling" bottom: "layer17_conv" top: "layer18_maxpool" pooling_param { kernel_size: 2 stride: 2 pool: MAX } } layer { name: "layer19_conv" type: "Convolution" bottom: "layer18_maxpool" top: "layer19_conv" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer19_bn" type: "BatchNorm" bottom: "layer19_conv" top: "layer19_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer19_scale" type: "Scale" bottom: "layer19_conv" top: "layer19_conv" scale_param { bias_term: true } } layer { name: "layer19_act" type: "ReLU" bottom: "layer19_conv" top: "layer19_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer20_conv" type: "Convolution" bottom: "layer19_conv" top: "layer20_conv" convolution_param { num_output: 512 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "layer20_bn" type: "BatchNorm" bottom: "layer20_conv" top: "layer20_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer20_scale" type: "Scale" bottom: "layer20_conv" top: "layer20_conv" scale_param { bias_term: true } } layer { name: "layer20_act" type: "ReLU" bottom: "layer20_conv" top: "layer20_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer21_conv" type: "Convolution" bottom: "layer20_conv" top: "layer21_conv" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer21_bn" type: "BatchNorm" bottom: "layer21_conv" top: "layer21_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer21_scale" type: "Scale" bottom: "layer21_conv" top: "layer21_conv" scale_param { bias_term: true } } layer { name: "layer21_act" type: "ReLU" bottom: "layer21_conv" top: "layer21_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer22_conv" type: "Convolution" bottom: "layer21_conv" top: "layer22_conv" convolution_param { num_output: 512 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "layer22_bn" type: "BatchNorm" bottom: "layer22_conv" top: "layer22_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer22_scale" type: "Scale" bottom: "layer22_conv" top: "layer22_conv" scale_param { bias_term: true } } layer { name: "layer22_act" type: "ReLU" bottom: "layer22_conv" top: "layer22_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer23_conv" type: "Convolution" bottom: "layer22_conv" top: "layer23_conv" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer23_bn" type: "BatchNorm" bottom: "layer23_conv" top: "layer23_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer23_scale" type: "Scale" bottom: "layer23_conv" top: "layer23_conv" scale_param { bias_term: true } } layer { name: "layer23_act" type: "ReLU" bottom: "layer23_conv" top: "layer23_conv" 
relu_param { negative_slope: 0.1 } } layer { name: "layer24_conv" type: "Convolution" bottom: "layer23_conv" top: "layer24_conv" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer24_bn" type: "BatchNorm" bottom: "layer24_conv" top: "layer24_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer24_scale" type: "Scale" bottom: "layer24_conv" top: "layer24_conv" scale_param { bias_term: true } } layer { name: "layer24_act" type: "ReLU" bottom: "layer24_conv" top: "layer24_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer25_conv" type: "Convolution" bottom: "layer24_conv" top: "layer25_conv" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer25_bn" type: "BatchNorm" bottom: "layer25_conv" top: "layer25_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer25_scale" type: "Scale" bottom: "layer25_conv" top: "layer25_conv" scale_param { bias_term: true } } layer { name: "layer25_act" type: "ReLU" bottom: "layer25_conv" top: "layer25_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer27_conv" type: "Convolution" bottom: "layer17_conv" top: "layer27_conv" convolution_param { num_output: 64 kernel_size: 1 pad: 0 stride: 1 bias_term: false } } layer { name: "layer27_bn" type: "BatchNorm" bottom: "layer27_conv" top: "layer27_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer27_scale" type: "Scale" bottom: "layer27_conv" top: "layer27_conv" scale_param { bias_term: true } } layer { name: "layer27_act" type: "ReLU" bottom: "layer27_conv" top: "layer27_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer28_reorg" type: "Reshape" bottom: "layer27_conv" top: "layer28_reorg" reshape_param { shape { dim: 1 dim: 256 dim: 13 dim: 13 } } } layer { name: "layer29_concat" type: "Concat" bottom: "layer28_reorg" bottom: "layer25_conv" top: "layer29_concat" } layer { name: "layer30_conv" type: "Convolution" bottom: "layer29_concat" top: "layer30_conv" convolution_param { num_output: 1024 kernel_size: 3 pad: 1 stride: 1 bias_term: false } } layer { name: "layer30_bn" type: "BatchNorm" bottom: "layer30_conv" top: "layer30_conv" batch_norm_param { use_global_stats: true } } layer { name: "layer30_scale" type: "Scale" bottom: "layer30_conv" top: "layer30_conv" scale_param { bias_term: true } } layer { name: "layer30_act" type: "ReLU" bottom: "layer30_conv" top: "layer30_conv" relu_param { negative_slope: 0.1 } } layer { name: "layer31_conv" type: "Convolution" bottom: "layer30_conv" top: "layer31_conv" convolution_param { num_output: 425 kernel_size: 1 pad: 0 stride: 1 bias_term: true } } layer { name: "layer32_region" type: "Region" bottom: "layer31_conv" top: "layer32_region" region_param { anchors: "0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828" classes: 80 bias_match: 1 coords: 4 num: 5 softmax: 1 jitter: .3 rescore: 1 object_scale: 5 noobject_scale: 1 class_scale: 1 coord_scale: 1 absolute: 1 thresh: .6 random: 1 nms_thresh: 0.3 background: 0 tree_thresh: 0.5 relative: 1 box_thresh: 0.24 } }

2. Since the Region layer is implemented in Python code, I removed the last "Region" layer from the above prototxt, modified test_eval.py to make "layer31_conv" the output layer, and tried python test_eval.py ~/yolo_images/dog.jpg. The results are just bboxes all over the image. I am guessing it's a similar issue to the one described by @dedoogong.

Appreciate any guidance on this.

imbadh commented 6 years ago

@SHaiHosh I tried your prototxt, but there is still an error, which shows:

conv23 (125, 1024, 1, 1) (125,) count= 50675955 transFlag = False (50983561,)
conv1(conv) bn1(batchnorm) scale1(scale) conv2(conv) bn2(batchnorm) scale2(scale) conv3(conv) bn3(batchnorm) scale3(scale) conv4(conv) bn4(batchnorm) scale4(scale) conv5(conv) bn5(batchnorm) scale5(scale) conv6(conv) bn6(batchnorm) scale6(scale) conv7(conv) bn7(batchnorm) scale7(scale) conv8(conv) bn8(batchnorm) scale8(scale) conv9(conv) bn9(batchnorm) scale9(scale) conv10(conv) bn10(batchnorm) scale10(scale) conv11(conv) bn11(batchnorm) scale11(scale) conv12(conv) bn12(batchnorm) scale12(scale) conv13(conv) bn13(batchnorm) scale13(scale) conv14(conv) bn14(batchnorm) scale14(scale) conv15(conv) bn15(batchnorm) scale15(scale) conv16(conv) bn16(batchnorm) scale16(scale) conv17(conv) bn17(batchnorm) scale17(scale) conv18(conv) bn18(batchnorm) scale18(scale) conv19(conv) bn19(batchnorm) scale19(scale) conv20(conv) bn20(batchnorm) scale20(scale) conv21(conv) bn21(batchnorm) conv22(conv) bn22(batchnorm) scale22(scale) conv23(conv)
ERROR: size mismatch: 50676061

Do you know what's wrong? Or may I have your YOLOv2 weights file?

VivekMaran27 commented 6 years ago

@imbadh: Can you please let me know how you fixed the "size mismatch: 50676061" issue?

Update: I was trying to convert yolov2-voc.weights, which was giving the error. When I used http://pjreddie.com/media/files/yolo-voc.weights instead, the conversion worked okay.
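For what it's worth, this kind of size mismatch can come from the darknet weights header: newer darknet releases store the "seen" counter as 64-bit (when major*10 + minor >= 2), so a converter written for the old header miscounts the floats. A small sketch to inspect a weights file, assuming the usual darknet layout:

import numpy as np

with open('yolo-voc.weights', 'rb') as f:
    # header: three int32 version fields, then the "seen" image counter
    major, minor, revision = np.fromfile(f, dtype=np.int32, count=3)
    if major * 10 + minor >= 2:
        seen = np.fromfile(f, dtype=np.int64, count=1)[0]  # newer format
    else:
        seen = np.fromfile(f, dtype=np.int32, count=1)[0]  # older format
    weights = np.fromfile(f, dtype=np.float32)  # the remaining floats

print('version %d.%d.%d, seen %d, %d floats' % (major, minor, revision, seen, len(weights)))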

VivekMaran27 commented 6 years ago

@appusom Could you please let me know how you resolved the following issue?

I0206 03:38:46.925810 11535 net.cpp:261] This network produces output detection_eval
I0206 03:38:46.925814 11535 net.cpp:261] This network produces output region1
I0206 03:38:46.925858 11535 net.cpp:274] Network initialization done.
1
/usr/local/lib/python2.7/dist-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
Segmentation fault (core dumped)

When I ran a backtrace, it seemed to crash in the get_region_boxes function, I believe.