Closed ysh329 closed 6 years ago
Of course, I think it's necessary to check (print) the network architecture (model config) by running darknet.
root@cde4ed5721f1:~/darknet# ./darknet detect yolo-voc.2.0.cfg yolo-voc_final.weights data/dog.jpg
layer filters size input output
0 conv 32 3 x 3 / 1 288 x 288 x 3 -> 288 x 288 x 32
1 max 2 x 2 / 2 288 x 288 x 32 -> 144 x 144 x 32
2 conv 64 3 x 3 / 1 144 x 144 x 32 -> 144 x 144 x 64
3 max 2 x 2 / 2 144 x 144 x 64 -> 72 x 72 x 64
4 conv 128 3 x 3 / 1 72 x 72 x 64 -> 72 x 72 x 128
5 conv 64 1 x 1 / 1 72 x 72 x 128 -> 72 x 72 x 64
6 conv 128 3 x 3 / 1 72 x 72 x 64 -> 72 x 72 x 128
7 max 2 x 2 / 2 72 x 72 x 128 -> 36 x 36 x 128
8 conv 256 3 x 3 / 1 36 x 36 x 128 -> 36 x 36 x 256
9 conv 128 1 x 1 / 1 36 x 36 x 256 -> 36 x 36 x 128
10 conv 256 3 x 3 / 1 36 x 36 x 128 -> 36 x 36 x 256
11 max 2 x 2 / 2 36 x 36 x 256 -> 18 x 18 x 256
12 conv 512 3 x 3 / 1 18 x 18 x 256 -> 18 x 18 x 512
13 conv 256 1 x 1 / 1 18 x 18 x 512 -> 18 x 18 x 256
14 conv 512 3 x 3 / 1 18 x 18 x 256 -> 18 x 18 x 512
15 conv 256 1 x 1 / 1 18 x 18 x 512 -> 18 x 18 x 256
16 conv 512 3 x 3 / 1 18 x 18 x 256 -> 18 x 18 x 512
17 max 2 x 2 / 2 18 x 18 x 512 -> 9 x 9 x 512
18 conv 1024 3 x 3 / 1 9 x 9 x 512 -> 9 x 9 x1024
19 conv 512 1 x 1 / 1 9 x 9 x1024 -> 9 x 9 x 512
20 conv 1024 3 x 3 / 1 9 x 9 x 512 -> 9 x 9 x1024
21 conv 512 1 x 1 / 1 9 x 9 x1024 -> 9 x 9 x 512
22 conv 1024 3 x 3 / 1 9 x 9 x 512 -> 9 x 9 x1024
23 conv 1024 3 x 3 / 1 9 x 9 x1024 -> 9 x 9 x1024
24 conv 1024 3 x 3 / 1 9 x 9 x1024 -> 9 x 9 x1024
25 route 16
26 reorg / 2 18 x 18 x 512 -> 9 x 9 x2048
27 route 26 24
28 conv 1024 3 x 3 / 1 9 x 9 x3072 -> 9 x 9 x1024
29 conv 75 1 x 1 / 1 9 x 9 x1024 -> 9 x 9 x 75
30 detection
mask_scale: Using default '1.000000'
Loading weights from yolo-voc_final.weights...Done!
data/dog.jpg: Predicted in 3.688960 seconds.
The problem with pytorch-caffe-darknet-convert is caused by the reorg layer, which doesn't work, as below:
root@c5c10ada8bdb:/home/yuanshuai/code/pytorch-caffe-darknet-convert# python darknet2caffe.py ./models/yolo-voc.2.0.cfg ./models/yolo-voc_final.weights ./models/yolo-voc.2.0.prototxt ./models/yolo-voc_final.caffemodel
unknow layer type reorg
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0115 12:06:52.210939 711 upgrade_proto.cpp:67] Attempting to upgrade input file specified using deprecated input fields: ./models/yolo-voc.2.0.prototxt
I0115 12:06:52.210985 711 upgrade_proto.cpp:70] Successfully upgraded file specified using deprecated input fields.
W0115 12:06:52.210990 711 upgrade_proto.cpp:72] Note that future Caffe releases will only support input layers and not input fields.
I0115 12:06:52.211549 711 net.cpp:51] Initializing net from parameters:
name: "Darkent2Caffe"
...
I0115 09:30:08.703138 594 net.cpp:406] layer28-concat <- layer17-conv_layer17-act_0_split_1
I0115 09:30:08.703145 594 net.cpp:406] layer28-concat <- layer25-conv
I0115 09:30:08.703155 594 net.cpp:380] layer28-concat -> layer28-concat
F0115 09:30:08.703176 594 concat_layer.cpp:42] Check failed: top_shape[j] == bottom[i]->shape(j) (18 vs. 9) All inputs must have the same shape, except at concat_axis.
*** Check failure stack trace: ***
Aborted (core dumped)
PS: before this issue, I fixed another problem caused by the concat layer name: at darknet2caffe.py:211 I changed the original code to concat_layer['type'] = 'Concat' (modified 'concat' to 'Concat').
Checking the execution log from darknet, we can find a reorg layer (#26) reshaping the output of layer #16 from an 18x18x512 feature map to 9x9x2048. This is a key step for concatenation: an important prerequisite is that the spatial dimensions (width and height) of the feature maps must be the same. We can also find an important line in the conversion log: unknow layer type reorg (first line of the execution log).
Pictures above are the outputs of the two branches before concat. Due to the lack of a real conversion implementation for the reorg layer (a Reshape layer in caffe), the two outputs cannot be concatenated, since their heights and widths differ. Thus, the current targets are the Reshape (reorg in darknet) and Concat layers. It's important to first try to fix/support/check the Reshape (reorg) layer implementation in pytorch-caffe-darknet-convert.
Same issue pytorch-caffe-darknet-convert/issues/24.
Now I can read out the values of the reorg layer from the darknet config file, as below:
block['type']:reorg
type(block):<class 'collections.OrderedDict'>
block:OrderedDict([('type', 'reorg'), ('stride', '2')])
block[type]: reorg
block[stride]: 2
OrderedDict is a subclass of dict. The main difference between them is that dict makes no ordering guarantee while OrderedDict preserves its insertion order.
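As a quick illustration, using the block values from the log above:

```python
from collections import OrderedDict

# A darknet cfg block, as printed by darknet2caffe.py above (values are strings).
block = OrderedDict([('type', 'reorg'), ('stride', '2')])

assert isinstance(block, dict)            # OrderedDict subclasses dict
assert list(block) == ['type', 'stride']  # keys come back in insertion order
print(block['type'], int(block['stride']))
```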
Next is to find the corresponding rule mapping darknet to caffe. Thus, it's necessary to clearly understand the meaning of the parameters in the darknet config file and the meaning of the corresponding Reshape layer parameters in caffe.
Content below is from caffe/reshape.md at master · BVLC/caffe:
Layer type: Reshape
Implementation: ./src/caffe/layers/reshape_layer.cpp
Input
Output
reshape_param
Sample
layer {
  name: "reshape"
  type: "Reshape"
  bottom: "input"
  top: "output"
  reshape_param {
    shape {
      dim: 0  # copy the dimension from below
      dim: 2
      dim: 3
      dim: -1 # infer it from the other dimensions
    }
  }
}
The Reshape
layer can be used to change the dimensions of its input, without changing its data. Just like the Flatten
layer, only the dimensions are changed; no data is copied in the process.
Output dimensions are specified by the ReshapeParam proto. Positive numbers are used directly, setting the corresponding dimension of the output blob. In addition, two special values are accepted for any of the target dimension values:
dim: 0 means "copy the respective dimension of the bottom layer". That is, if the bottom has 2 as its 1st dimension, the top will have 2 as its 1st dimension as well, given dim: 0 as the 1st target dimension.
dim: -1 stands for "infer this from the other dimensions". This behavior is similar to that of -1 in numpy or [] for MATLAB's reshape: this dimension is calculated to keep the overall element count the same as in the bottom layer. At most one -1 can be used in a reshape operation.
As another example, specifying reshape_param { shape { dim: 0 dim: -1 } } makes the layer behave in exactly the same way as the Flatten layer.
Parameters (ReshapeParameter reshape_param), from ./src/caffe/proto/caffe.proto:
message ReshapeParameter {
// Specify the output dimensions. If some of the dimensions are set to 0,
// the corresponding dimension from the bottom layer is used (unchanged).
// Exactly one dimension may be set to -1, in which case its value is
// inferred from the count of the bottom blob and the remaining dimensions.
// For example, suppose we want to reshape a 2D blob "input" with shape 2 x 8:
//
// layer {
// type: "Reshape" bottom: "input" top: "output"
// reshape_param { ... }
// }
//
// If "input" is 2D with shape 2 x 8, then the following reshape_param
// specifications are all equivalent, producing a 3D blob "output" with shape
// 2 x 2 x 4:
//
// reshape_param { shape { dim: 2 dim: 2 dim: 4 } }
// reshape_param { shape { dim: 0 dim: 2 dim: 4 } }
// reshape_param { shape { dim: 0 dim: 2 dim: -1 } }
// reshape_param { shape { dim: 0 dim:-1 dim: 4 } }
//
optional BlobShape shape = 1;
// axis and num_axes control the portion of the bottom blob's shape that are
// replaced by (included in) the reshape. By default (axis == 0 and
// num_axes == -1), the entire bottom blob shape is included in the reshape,
// and hence the shape field must specify the entire output shape.
//
// axis may be non-zero to retain some portion of the beginning of the input
// shape (and may be negative to index from the end; e.g., -1 to begin the
// reshape after the last axis, including nothing in the reshape,
// -2 to include only the last axis, etc.).
//
// For example, suppose "input" is a 2D blob with shape 2 x 8.
// Then the following ReshapeLayer specifications are all equivalent,
// producing a blob "output" with shape 2 x 2 x 4:
//
// reshape_param { shape { dim: 2 dim: 2 dim: 4 } }
// reshape_param { shape { dim: 2 dim: 4 } axis: 1 }
// reshape_param { shape { dim: 2 dim: 4 } axis: -3 }
//
// num_axes specifies the extent of the reshape.
// If num_axes >= 0 (and axis >= 0), the reshape will be performed only on
// input axes in the range [axis, axis+num_axes].
// num_axes may also be -1, the default, to include all remaining axes
// (starting from axis).
//
// For example, suppose "input" is a 2D blob with shape 2 x 8.
// Then the following ReshapeLayer specifications are equivalent,
// producing a blob "output" with shape 1 x 2 x 8.
//
// reshape_param { shape { dim: 1 dim: 2 dim: 8 } }
// reshape_param { shape { dim: 1 dim: 2 } num_axes: 1 }
// reshape_param { shape { dim: 1 } num_axes: 0 }
//
// On the other hand, these would produce output blob shape 2 x 1 x 8:
//
// reshape_param { shape { dim: 2 dim: 1 dim: 8 } }
// reshape_param { shape { dim: 1 } axis: 1 num_axes: 0 }
//
optional int32 axis = 2 [default = 0];
optional int32 num_axes = 3 [default = -1];
}
Now I have finished the conversion from the darknet model to caffe's using pytorch-caffe-darknet-convert, after adding support for the reorg layer and fixing some issues.
However, the reorg layer support has a small problem: you have to define the output dimension of the reorg layer yourself. More detailed info can be found in the code darknet2caffe.py of the repo pytorch-caffe-darknet-convert.
Thank you very much for your contribution. I used your code and generated the yolov2 network files and models normally, but in my test, the first batchnorm layer outputs NaN. Did you test this?
@maxadda I checked the correctness of the conversion, i.e. the caffemodel (conv params etc.) and prototxt generated by darknet2caffe.py. The conversion is okay, but I found some problems with the computation of the feature maps: the results from darknet and caffe are different.
More concretely, I checked the last convolutional result (feature map) and found their feature maps are totally different, not only in values but also in order of magnitude (darknet's values are all between 0 and 1, but caffe's are big 🤣 such as ±10 or bigger).
Thus, I'm now checking the feature maps layer by layer starting from the data input. Bless me 🤣
Besides, I found a key point: in darknet, batchnorm is a parameter of the convolutional layer in the cfg file, and if a layer has batchnorm then it will not have (convolutional) bias parameters (I found this in the darknet source code).
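For reference, here is a minimal Python sketch of the read order darknet uses for one convolutional layer's weights, based on my reading of load_convolutional_weights in darknet's parser; the function and field names here are my own:

```python
import io
import numpy as np

def load_conv_block(f, n_filters, weight_count, batch_normalize):
    """Read one convolutional layer's parameters from a darknet weight file.

    With batch_normalize set, the 'biases' slot holds the BN shift (beta)
    and no separate convolutional bias follows -- matching the point above.
    """
    rd = lambda n: np.frombuffer(f.read(4 * n), dtype=np.float32, count=n)
    params = {'biases': rd(n_filters)}
    if batch_normalize:
        params['scales'] = rd(n_filters)            # BN gamma
        params['rolling_mean'] = rd(n_filters)
        params['rolling_variance'] = rd(n_filters)
    params['weights'] = rd(weight_count)
    return params

# Demo on a synthetic buffer of 12 floats (2 filters, 4 conv weights, with BN):
demo = load_conv_block(io.BytesIO(np.arange(12, dtype=np.float32).tobytes()), 2, 4, True)
assert list(demo['weights']) == [8.0, 9.0, 10.0, 11.0]
```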
I also found a series of pre-processing steps in darknet. In their processing order, they are:
HWC to CHW
RGB to BGR
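These two steps can be sketched with NumPy on a toy image (darknet's actual resizing/scaling is omitted here):

```python
import numpy as np

# Toy 4x4 RGB image in HWC layout (as decoded by most image libraries).
img_hwc = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)

chw = np.transpose(img_hwc, (2, 0, 1))  # 1) HWC -> CHW
bgr_chw = chw[::-1]                     # 2) reverse channel axis: RGB -> BGR

# After both steps, channel 0 of the result is the original blue channel.
assert np.array_equal(bgr_chw[0], img_hwc[:, :, 2])
```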
Besides, the reorg layer of darknet has a different implementation from caffe's Reshape. The reorg layer of darknet is really odd. Its code is below:
void reorg_cpu(float *x, int w, int h, int c, int batch, int stride, int forward, float *out)
{
    int b,i,j,k;
    int out_c = c/(stride*stride);
    for(b = 0; b < batch; ++b){
        for(k = 0; k < c; ++k){
            for(j = 0; j < h; ++j){
                for(i = 0; i < w; ++i){
                    int in_index = i + w*(j + h*(k + c*b));
                    int c2 = k % out_c;
                    int offset = k / out_c;
                    int w2 = i*stride + offset % stride;
                    int h2 = j*stride + offset / stride;
                    int out_index = w2 + w*stride*(h2 + h*stride*(c2 + out_c*b));
                    if(forward) out[out_index] = x[in_index];
                    else out[in_index] = x[out_index];
                }
            }
        }
    }
}
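To see concretely why this is not a plain reshape, here is a direct NumPy transliteration of the C loop above, operating on flat arrays just like the original (a sketch for experimentation, not darknet's exact calling convention):

```python
import numpy as np

def reorg_cpu(x, w, h, c, batch, stride, forward):
    """Direct transliteration of darknet's reorg_cpu over a flat float array."""
    x = np.asarray(x, dtype=np.float32).ravel()
    out = np.empty_like(x)
    out_c = c // (stride * stride)
    for b in range(batch):
        for k in range(c):
            for j in range(h):
                for i in range(w):
                    in_index = i + w * (j + h * (k + c * b))
                    c2 = k % out_c
                    offset = k // out_c
                    w2 = i * stride + offset % stride
                    h2 = j * stride + offset // stride
                    out_index = w2 + w * stride * (h2 + h * stride * (c2 + out_c * b))
                    if forward:
                        out[out_index] = x[in_index]
                    else:
                        out[in_index] = x[out_index]
    return out

x = np.arange(16, dtype=np.float32)          # c=4, h=2, w=2, stride=2, batch=1
y = reorg_cpu(x, 2, 2, 4, 1, 2, forward=True)
assert not np.array_equal(y, x)              # it permutes elements, unlike Reshape
assert np.array_equal(reorg_cpu(y, 2, 2, 4, 1, 2, forward=False), x)  # invertible
```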
@maxadda
Currently, it seems there exist some parameters in the region layer (in src/region_layer.c of darknet); more concretely, they're the regression boxes' bias parameters (note: not convolutional biases). I wasn't sure whether these bias parameters are stored in the model (weight) file or not, which required exploring the darknet code more deeply. Afterwards, I found that the bias parameters of the region layer are the anchor values in the cfg file. In other words, the bias parameters are not in the model file but in the cfg file.
Lately I found the code below in darknet; it is used to read the bias parameters for detection box regression (and I'm sure it reads the biases from the cfg file):
char *a = option_find_str(options, "anchors", 0);
if(a){
    int len = strlen(a);
    int n = 1;
    int i;
    for(i = 0; i < len; ++i){
        if (a[i] == ',') ++n;
    }
    fprintf(stderr, "==== parse_region ====\n");
    for(i = 0; i < n; ++i){
        float bias = atof(a);
        fprintf(stderr, "%d\t%f\n", i, bias);
        l.biases[i] = bias;
        a = strchr(a, ',')+1;
    }
}
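The same parsing in Python is just a split (a sketch; the anchor values here are hypothetical):

```python
def parse_anchors(a):
    """Equivalent of the C snippet above: split the cfg's comma-separated
    'anchors' string into the region layer's bias values."""
    return [float(v) for v in a.split(',')]

# Hypothetical anchors string in yolo-voc style:
assert parse_anchors("1.08,1.19,3.42,4.41") == [1.08, 1.19, 3.42, 4.41]
```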
Besides, by reading the code, I found that whether this region layer has parameters depends on the training settings. The code above is from the function layer parse_region(list *options, size_params params) in parser.c of darknet.
I'm running another model named tiny-yolo-voc, as below:
$ ./darknet detect tiny-yolo-voc.cfg tiny-yolo-voc_final.weights data/warship.jpg
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
13 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
14 conv 75 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 75
15 detection
mask_scale: Using default '1.000000'
Besides, I ran another tiny YOLO
as below:
$ ./darknet detector test cfg/voc.data cfg/tiny-yolo-voc.cfg tiny-yolo-voc.weights data/dog.jpg
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
13 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
14 conv 125 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 125
15 detection
mask_scale: Using default '1.000000'
Loading weights from tiny-yolo-voc.weights...Done!
data/dog.jpg: Predicted in 8.385642 seconds.
car: 35%
car: 55%
dog: 78%
bicycle: 36%
We can see the important difference is the last pooling layer, whose stride=1 (so the spatial size stays 13x13). It's quite unusual.
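A 2x2 max pool with stride 1 keeps the spatial size: each output pixel is the max over a 2x2 window, with the bottom/right edge padded. A NumPy sketch (the exact padding convention darknet uses is my assumption here):

```python
import numpy as np

def maxpool_2x2_stride1(x):
    """2x2 max pooling, stride 1, output the same HxW size as the input."""
    h, w = x.shape
    p = np.pad(x, ((0, 1), (0, 1)), constant_values=-np.inf)  # pad bottom/right
    windows = [p[:h, :w], p[:h, 1:w + 1], p[1:h + 1, :w], p[1:h + 1, 1:w + 1]]
    return np.maximum.reduce(windows)

x = np.array([[1., 2.], [3., 4.]])
assert maxpool_2x2_stride1(x).shape == x.shape          # a 13x13 map stays 13x13
assert np.array_equal(maxpool_2x2_stride1(x), np.full((2, 2), 4.))
```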
@ysh329 First of all, thanks for the conversion tool. I used your code to convert tiny-yolo-voc.cfg and tiny-yolo-voc.weights to tiny-yolo-voc.prototxt and tiny-yolo-voc.caffemodel, and it is working fine. But when I try to convert yolo-voc.cfg and yolo-voc.weights to get yolo-voc.caffemodel, it is not working and throws the error below:
I0206 18:56:52.413803 23711 net.cpp:443] layer28-reshape -> layer28-reshape
F0206 18:56:52.413822 23711 reshape_layer.cpp:87] Check failed: top[0]->count() == bottom[0]->count() (165888 vs. 43264) output count must match input count
Check failure stack trace
The cfg file I used is from https://github.com/pjreddie/darknet/blob/master/cfg/yolo-voc.cfg and the weight file from https://pjreddie.com/media/files/yolo-voc.weights. I would really appreciate any kind of help on this. Thank you.
@YogeshShitole Did you redefine the output dimension of the reorg layer, as below, from this code?
# TODO: auto shape infer
shape['dim'] = [1, 2048, 9, 9]
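That TODO can be done mechanically: a reorg with stride s maps (C, H, W) to (C*s*s, H/s, W/s), so the output dims can be inferred from the bottom blob (the function name here is mine):

```python
def reorg_output_shape(n, c, h, w, stride):
    """Infer a reorg layer's output dims from its input dims."""
    assert h % stride == 0 and w % stride == 0
    return [n, c * stride * stride, h // stride, w // stride]

# Recovers the hard-coded value above from yolo-voc.2.0's layer 16 output (18x18x512):
assert reorg_output_shape(1, 512, 18, 18, 2) == [1, 2048, 9, 9]
```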
Hi @ysh329, thanks for the quick reply. I redefined the output dimension of the reorg layer from
shape['dim'] = [1, 2048, 9, 9]
to
shape['dim'] = [1, 64, 26, 26]
and also tried the auto shape infer option shape['dim'] = [1, -1, block['stride'], block['stride']],
but now the problem is at layer-29, the Concat layer: a shape mismatch between layer 25 and layer 28. Below is a snippet after I run the command python darknet2caffe.py yolo-voc.cfg yolo-voc.weights yolo-voc.prototxt yolo-voc.caffemodel
I0208 16:09:48.932379 14450 net.cpp:91] Creating Layer layer23-conv I0208 16:09:48.932384 14450 net.cpp:469] layer23-conv <- layer22-conv I0208 16:09:48.932389 14450 net.cpp:443] layer23-conv -> layer23-conv I0208 16:09:48.935953 14450 net.cpp:141] Setting up layer23-conv I0208 16:09:48.935984 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056) I0208 16:09:48.935991 14450 net.cpp:156] Memory required for data: 262699008 I0208 16:09:48.936002 14450 layer_factory.hpp:77] Creating layer layer23-bn I0208 16:09:48.936017 14450 net.cpp:91] Creating Layer layer23-bn I0208 16:09:48.936023 14450 net.cpp:469] layer23-bn <- layer23-conv I0208 16:09:48.936031 14450 net.cpp:430] layer23-bn -> layer23-conv (in-place) I0208 16:09:48.936053 14450 net.cpp:141] Setting up layer23-bn I0208 16:09:48.936060 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056) I0208 16:09:48.936065 14450 net.cpp:156] Memory required for data: 263391232 I0208 16:09:48.936074 14450 layer_factory.hpp:77] Creating layer layer23-scale I0208 16:09:48.936081 14450 net.cpp:91] Creating Layer layer23-scale I0208 16:09:48.936087 14450 net.cpp:469] layer23-scale <- layer23-conv I0208 16:09:48.936094 14450 net.cpp:430] layer23-scale -> layer23-conv (in-place) I0208 16:09:48.936110 14450 layer_factory.hpp:77] Creating layer layer23-scale I0208 16:09:48.936126 14450 net.cpp:141] Setting up layer23-scale I0208 16:09:48.936133 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056) I0208 16:09:48.936141 14450 net.cpp:156] Memory required for data: 264083456 I0208 16:09:48.936149 14450 layer_factory.hpp:77] Creating layer layer23-act I0208 16:09:48.936157 14450 net.cpp:91] Creating Layer layer23-act I0208 16:09:48.936163 14450 net.cpp:469] layer23-act <- layer23-conv I0208 16:09:48.936168 14450 net.cpp:430] layer23-act -> layer23-conv (in-place) I0208 16:09:48.936175 14450 net.cpp:141] Setting up layer23-act I0208 16:09:48.936182 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056) I0208 16:09:48.936187 14450 net.cpp:156] 
Memory required for data: 264775680 I0208 16:09:48.936192 14450 layer_factory.hpp:77] Creating layer layer24-conv I0208 16:09:48.936203 14450 net.cpp:91] Creating Layer layer24-conv I0208 16:09:48.936208 14450 net.cpp:469] layer24-conv <- layer23-conv I0208 16:09:48.936221 14450 net.cpp:443] layer24-conv -> layer24-conv I0208 16:09:48.942919 14450 net.cpp:141] Setting up layer24-conv I0208 16:09:48.942950 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056) I0208 16:09:48.942955 14450 net.cpp:156] Memory required for data: 265467904 I0208 16:09:48.942966 14450 layer_factory.hpp:77] Creating layer layer24-bn I0208 16:09:48.942979 14450 net.cpp:91] Creating Layer layer24-bn I0208 16:09:48.942986 14450 net.cpp:469] layer24-bn <- layer24-conv I0208 16:09:48.942994 14450 net.cpp:430] layer24-bn -> layer24-conv (in-place) I0208 16:09:48.943013 14450 net.cpp:141] Setting up layer24-bn I0208 16:09:48.943020 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056) I0208 16:09:48.943025 14450 net.cpp:156] Memory required for data: 266160128 I0208 16:09:48.943034 14450 layer_factory.hpp:77] Creating layer layer24-scale I0208 16:09:48.943044 14450 net.cpp:91] Creating Layer layer24-scale I0208 16:09:48.943049 14450 net.cpp:469] layer24-scale <- layer24-conv I0208 16:09:48.943055 14450 net.cpp:430] layer24-scale -> layer24-conv (in-place) I0208 16:09:48.943069 14450 layer_factory.hpp:77] Creating layer layer24-scale I0208 16:09:48.943086 14450 net.cpp:141] Setting up layer24-scale I0208 16:09:48.943094 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056) I0208 16:09:48.943099 14450 net.cpp:156] Memory required for data: 266852352 I0208 16:09:48.943104 14450 layer_factory.hpp:77] Creating layer layer24-act I0208 16:09:48.943114 14450 net.cpp:91] Creating Layer layer24-act I0208 16:09:48.943120 14450 net.cpp:469] layer24-act <- layer24-conv I0208 16:09:48.943125 14450 net.cpp:430] layer24-act -> layer24-conv (in-place) I0208 16:09:48.943131 14450 net.cpp:141] Setting up layer24-act 
I0208 16:09:48.943137 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056) I0208 16:09:48.943143 14450 net.cpp:156] Memory required for data: 267544576 I0208 16:09:48.943147 14450 layer_factory.hpp:77] Creating layer layer25-conv I0208 16:09:48.943156 14450 net.cpp:91] Creating Layer layer25-conv I0208 16:09:48.943162 14450 net.cpp:469] layer25-conv <- layer24-conv I0208 16:09:48.943169 14450 net.cpp:443] layer25-conv -> layer25-conv I0208 16:09:48.949640 14450 net.cpp:141] Setting up layer25-conv I0208 16:09:48.949666 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056) I0208 16:09:48.949671 14450 net.cpp:156] Memory required for data: 268236800 I0208 16:09:48.949681 14450 layer_factory.hpp:77] Creating layer layer25-bn I0208 16:09:48.949692 14450 net.cpp:91] Creating Layer layer25-bn I0208 16:09:48.949698 14450 net.cpp:469] layer25-bn <- layer25-conv I0208 16:09:48.949707 14450 net.cpp:430] layer25-bn -> layer25-conv (in-place) I0208 16:09:48.949726 14450 net.cpp:141] Setting up layer25-bn I0208 16:09:48.949733 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056) I0208 16:09:48.949738 14450 net.cpp:156] Memory required for data: 268929024 I0208 16:09:48.949745 14450 layer_factory.hpp:77] Creating layer layer25-scale I0208 16:09:48.949753 14450 net.cpp:91] Creating Layer layer25-scale I0208 16:09:48.949759 14450 net.cpp:469] layer25-scale <- layer25-conv I0208 16:09:48.949765 14450 net.cpp:430] layer25-scale -> layer25-conv (in-place) I0208 16:09:48.949779 14450 layer_factory.hpp:77] Creating layer layer25-scale I0208 16:09:48.949795 14450 net.cpp:141] Setting up layer25-scale I0208 16:09:48.949802 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056) I0208 16:09:48.949808 14450 net.cpp:156] Memory required for data: 269621248 I0208 16:09:48.949816 14450 layer_factory.hpp:77] Creating layer layer25-act I0208 16:09:48.949826 14450 net.cpp:91] Creating Layer layer25-act I0208 16:09:48.949831 14450 net.cpp:469] layer25-act <- layer25-conv I0208 16:09:48.949836 14450 
net.cpp:430] layer25-act -> layer25-conv (in-place) I0208 16:09:48.949846 14450 net.cpp:141] Setting up layer25-act I0208 16:09:48.949852 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056) I0208 16:09:48.949856 14450 net.cpp:156] Memory required for data: 270313472 I0208 16:09:48.949862 14450 layer_factory.hpp:77] Creating layer layer27-conv I0208 16:09:48.949870 14450 net.cpp:91] Creating Layer layer27-conv I0208 16:09:48.949875 14450 net.cpp:469] layer27-conv <- layer17-conv_layer17-act_0_split_1 I0208 16:09:48.949882 14450 net.cpp:443] layer27-conv -> layer27-conv I0208 16:09:48.949947 14450 net.cpp:141] Setting up layer27-conv I0208 16:09:48.949955 14450 net.cpp:148] Top shape: 1 64 26 26 (43264) I0208 16:09:48.949961 14450 net.cpp:156] Memory required for data: 270486528 I0208 16:09:48.949968 14450 layer_factory.hpp:77] Creating layer layer27-bn I0208 16:09:48.949975 14450 net.cpp:91] Creating Layer layer27-bn I0208 16:09:48.949982 14450 net.cpp:469] layer27-bn <- layer27-conv I0208 16:09:48.949990 14450 net.cpp:430] layer27-bn -> layer27-conv (in-place) I0208 16:09:48.950004 14450 net.cpp:141] Setting up layer27-bn I0208 16:09:48.950011 14450 net.cpp:148] Top shape: 1 64 26 26 (43264) I0208 16:09:48.950016 14450 net.cpp:156] Memory required for data: 270659584 I0208 16:09:48.950024 14450 layer_factory.hpp:77] Creating layer layer27-scale I0208 16:09:48.950032 14450 net.cpp:91] Creating Layer layer27-scale I0208 16:09:48.950037 14450 net.cpp:469] layer27-scale <- layer27-conv I0208 16:09:48.950044 14450 net.cpp:430] layer27-scale -> layer27-conv (in-place) I0208 16:09:48.950053 14450 layer_factory.hpp:77] Creating layer layer27-scale I0208 16:09:48.950070 14450 net.cpp:141] Setting up layer27-scale I0208 16:09:48.950078 14450 net.cpp:148] Top shape: 1 64 26 26 (43264) I0208 16:09:48.950083 14450 net.cpp:156] Memory required for data: 270832640 I0208 16:09:48.950090 14450 layer_factory.hpp:77] Creating layer layer27-act I0208 16:09:48.950098 14450 net.cpp:91] 
Creating Layer layer27-act I0208 16:09:48.950103 14450 net.cpp:469] layer27-act <- layer27-conv I0208 16:09:48.950109 14450 net.cpp:430] layer27-act -> layer27-conv (in-place) I0208 16:09:48.950115 14450 net.cpp:141] Setting up layer27-act I0208 16:09:48.950121 14450 net.cpp:148] Top shape: 1 64 26 26 (43264) I0208 16:09:48.950127 14450 net.cpp:156] Memory required for data: 271005696 I0208 16:09:48.950130 14450 layer_factory.hpp:77] Creating layer layer28-reshape I0208 16:09:48.950140 14450 net.cpp:91] Creating Layer layer28-reshape I0208 16:09:48.950145 14450 net.cpp:469] layer28-reshape <- layer27-conv I0208 16:09:48.950151 14450 net.cpp:443] layer28-reshape -> layer28-reshape I0208 16:09:48.950160 14450 net.cpp:141] Setting up layer28-reshape I0208 16:09:48.950168 14450 net.cpp:148] Top shape: 1 64 26 26 (43264) I0208 16:09:48.950172 14450 net.cpp:156] Memory required for data: 271178752 I0208 16:09:48.950177 14450 layer_factory.hpp:77] Creating layer layer29-concat I0208 16:09:48.950183 14450 net.cpp:91] Creating Layer layer29-concat I0208 16:09:48.950188 14450 net.cpp:469] layer29-concat <- layer28-reshape I0208 16:09:48.950192 14450 net.cpp:469] layer29-concat <- layer25-conv I0208 16:09:48.950199 14450 net.cpp:443] layer29-concat -> layer29-concat F0208 16:09:48.950209 14450 concat_layer.cpp:42] Check failed: top_shape[j] == bottom[i]->shape(j) (26 vs. 13) All inputs must have the same shape, except at concat_axis. Check failure stack trace: Aborted (core dumped)
@YogeshShitole Can you give the execution log from darknet? Besides, I implemented auto shape infer for the reorg layer here.
Hi @ysh329 here is my darknet2caffe_convert.log
block:OrderedDict([('type', 'route'), ('layers', '-9')])
block[type]: reorg
block[stride]: 2
============== reorg =========
reshape['top']: layer28-reshape
layer_id: 28
bottom: layer27-conv
block:OrderedDict([('type', 'route'), ('layers', '-1,-4')])
from_layer: ['-1', '-4']
prev_layer_id1: 28
prev_layer_id2: 25
layer_id: 29
concat_layer: OrderedDict([('name', 'layer29-concat'), ('type', 'Concat'), ('bottom', ['layer28-reshape', 'layer25-conv']), ('top', 'layer29-concat')])
I also tried your auto shape infer implementation, which is also not working with yolo-voc.cfg and yolo-voc.weights.
@YogeshShitole I mean darknet's execution log, which looks like the one below:
layer filters size input output
0 conv 32 3 x 3 / 1 288 x 288 x 3 -> 288 x 288 x 32
1 max 2 x 2 / 2 288 x 288 x 32 -> 144 x 144 x 32
2 conv 64 3 x 3 / 1 144 x 144 x 32 -> 144 x 144 x 64
3 max 2 x 2 / 2 144 x 144 x 64 -> 72 x 72 x 64
4 conv 128 3 x 3 / 1 72 x 72 x 64 -> 72 x 72 x 128
5 conv 64 1 x 1 / 1 72 x 72 x 128 -> 72 x 72 x 64
6 conv 128 3 x 3 / 1 72 x 72 x 64 -> 72 x 72 x 128
7 max 2 x 2 / 2 72 x 72 x 128 -> 36 x 36 x 128
8 conv 256 3 x 3 / 1 36 x 36 x 128 -> 36 x 36 x 256
9 conv 128 1 x 1 / 1 36 x 36 x 256 -> 36 x 36 x 128
10 conv 256 3 x 3 / 1 36 x 36 x 128 -> 36 x 36 x 256
11 max 2 x 2 / 2 36 x 36 x 256 -> 18 x 18 x 256
12 conv 512 3 x 3 / 1 18 x 18 x 256 -> 18 x 18 x 512
13 conv 256 1 x 1 / 1 18 x 18 x 512 -> 18 x 18 x 256
14 conv 512 3 x 3 / 1 18 x 18 x 256 -> 18 x 18 x 512
15 conv 256 1 x 1 / 1 18 x 18 x 512 -> 18 x 18 x 256
16 conv 512 3 x 3 / 1 18 x 18 x 256 -> 18 x 18 x 512
17 max 2 x 2 / 2 18 x 18 x 512 -> 9 x 9 x 512
18 conv 1024 3 x 3 / 1 9 x 9 x 512 -> 9 x 9 x1024
19 conv 512 1 x 1 / 1 9 x 9 x1024 -> 9 x 9 x 512
20 conv 1024 3 x 3 / 1 9 x 9 x 512 -> 9 x 9 x1024
21 conv 512 1 x 1 / 1 9 x 9 x1024 -> 9 x 9 x 512
22 conv 1024 3 x 3 / 1 9 x 9 x 512 -> 9 x 9 x1024
23 conv 1024 3 x 3 / 1 9 x 9 x1024 -> 9 x 9 x1024
24 conv 1024 3 x 3 / 1 9 x 9 x1024 -> 9 x 9 x1024
25 route 16
26 reorg / 2 18 x 18 x 512 -> 9 x 9 x2048
27 route 26 24
28 conv 1024 3 x 3 / 1 9 x 9 x3072 -> 9 x 9 x1024
29 conv 75 1 x 1 / 1 9 x 9 x1024 -> 9 x 9 x 75
30 detection
Or do you have the same execution log as mine above?
tiny-yolov2
layer filters size input output
0 conv 16 3 x 3 / 1 640 x 480 x 3 -> 640 x 480 x 16
1 max 2 x 2 / 2 640 x 480 x 16 -> 320 x 240 x 16
2 conv 32 3 x 3 / 1 320 x 240 x 16 -> 320 x 240 x 32
3 max 2 x 2 / 2 320 x 240 x 32 -> 160 x 120 x 32
4 conv 64 3 x 3 / 1 160 x 120 x 32 -> 160 x 120 x 64
5 max 2 x 2 / 2 160 x 120 x 64 -> 80 x 60 x 64
6 conv 128 3 x 3 / 1 80 x 60 x 64 -> 80 x 60 x 128
7 max 2 x 2 / 2 80 x 60 x 128 -> 40 x 30 x 128
8 conv 256 3 x 3 / 1 40 x 30 x 128 -> 40 x 30 x 256
9 max 2 x 2 / 2 40 x 30 x 256 -> 20 x 15 x 256
10 conv 512 3 x 3 / 1 20 x 15 x 256 -> 20 x 15 x 512
11 max 2 x 2 / 1 20 x 15 x 512 -> 20 x 15 x 512
12 conv 1024 3 x 3 / 1 20 x 15 x 512 -> 20 x 15 x1024
13 route 8
14 conv 64 1 x 1 / 1 40 x 30 x 256 -> 40 x 30 x 64
15 reorg / 2 40 x 30 x 64 -> 20 x 15 x 256
16 route 15 12
17 conv 1024 3 x 3 / 1 20 x 15 x1280 -> 20 x 15 x1024
18 conv 35 1 x 1 / 1 20 x 15 x1024 -> 20 x 15 x 35
19 detection
tiny-yolo-125
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
13 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
14 conv 125 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 125
15 detection
@ysh329 with darknet it executes perfectly; below is the darknet execution log:
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
25 route 16
26 conv 64 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 64
27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024
30 conv 125 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 125
31 detection
@YogeshShitole I found this bug is caused by the auto shape infer of the reorg layer in my code :rofl: . I'm fixing this bug now. :rofl:
@ysh329 Thank you 😊
@YogeshShitole hey, big guy. I fixed the bug in this branch. Feel free to have a try. :rofl:
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param {
    shape {
      dim: 1
      dim: 3
      dim: 288
      dim: 288
    }
  }
}
The third dim is the height of the input.
@YogeshShitole, you said it is working for you with tiny-yolo-voc.cfg and tiny-yolo-voc.weights using darknet2caffe.py.
But if I run python3 darknet2caffe.py ../darknet/cfg/tiny-yolo-voc.cfg ../darknet/weights/tiny-yolo-voc.weights
it gives the error: [libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 14:14: Expected integer. WARNING: Logging before InitGoogleLogging() is written to STDERR F0322 03:10:21.651065 22037 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: ../darknet/cfg/tiny-yolo-voc.prototxt Check failure stack trace: Aborted
Please help me solve it. As far as I know, only you guys have come up with a working model using reorg layers in caffe.
@nixnmtm try docker image of caffe-cpu
Ok, Thank you. It is working now.
Hi,
When I tried to convert yolov2 to the caffe version, I found the results of concat and route are different.
concat looks like it concatenates two flattened blobs.
route works in another way which I don't know exactly.
And the final results are totally wrong.
Please help me, thanks!
@s5plus1 Hi, route is a connection method similar to concat. A negative route value means: counting backwards from the current layer, connect to the n-th previous layer's output.
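In blob terms, the route in the yolo-voc log above ("27 route 26 24") simply concatenates the two referenced outputs along the channel axis, which is what caffe's Concat does once the spatial sizes match (shapes taken from the log; a sketch with zero-filled blobs):

```python
import numpy as np

reorg_out = np.zeros((1, 2048, 9, 9), dtype=np.float32)  # layer 26's output
conv_out = np.zeros((1, 1024, 9, 9), dtype=np.float32)   # layer 24's output

routed = np.concatenate([reorg_out, conv_out], axis=1)
assert routed.shape == (1, 3072, 9, 9)                   # matches layer 28's input
```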
@ysh329 Thanks for your quick reply.
After debugging, I found the difference is caused by the reorg layer.
The results of reorg in darknet and Reshape in caffe are different.
I flattened the reorg layer output and the Reshape layer output and compared them.
It seems that reorg is not equal to reshape? (correct me if I'm wrong, please) ...
@s5plus1 Yeah, they're different. For the implementation of reorg you need to refer to darknet's, and it's a really curious operation. Darknet's implementation of reorg is short but confusing; you can copy its code into your own caffe layer.
@ysh329 Got it! Thanks again!
I tried pytorch-caffe-darknet-convert but failed due to the lack of reorg layer support and an issue with the concat layer (of course, I created an issue to ask, but no one replied). I can think of several possible solutions; the first is to add the reorg layer and fix the concat layer issue in pytorch-caffe-darknet-convert.