ysh329 / deep-learning-model-convertor

The convertor/conversion of deep learning models for different deep learning frameworks/softwares.
https://github.com/ysh329/deep-learning-model-convertor

Convert darknet yolov2 model to caffe #24

Closed ysh329 closed 6 years ago

ysh329 commented 6 years ago

I tried pytorch-caffe-darknet-convert but failed due to the lack of a reorg layer and an issue with concat layer support (of course, I created an issue to ask, but no one replied). I can think of three possible solutions:

ysh329 commented 6 years ago

pytorch-caffe-darknet-convert

First, it's necessary to check (print) the network architecture (model config) by running darknet:

root@cde4ed5721f1:~/darknet# ./darknet detect yolo-voc.2.0.cfg yolo-voc_final.weights data/dog.jpg
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   288 x 288 x   3   ->   288 x 288 x  32
    1 max          2 x 2 / 2   288 x 288 x  32   ->   144 x 144 x  32
    2 conv     64  3 x 3 / 1   144 x 144 x  32   ->   144 x 144 x  64
    3 max          2 x 2 / 2   144 x 144 x  64   ->    72 x  72 x  64
    4 conv    128  3 x 3 / 1    72 x  72 x  64   ->    72 x  72 x 128
    5 conv     64  1 x 1 / 1    72 x  72 x 128   ->    72 x  72 x  64
    6 conv    128  3 x 3 / 1    72 x  72 x  64   ->    72 x  72 x 128
    7 max          2 x 2 / 2    72 x  72 x 128   ->    36 x  36 x 128
    8 conv    256  3 x 3 / 1    36 x  36 x 128   ->    36 x  36 x 256
    9 conv    128  1 x 1 / 1    36 x  36 x 256   ->    36 x  36 x 128
   10 conv    256  3 x 3 / 1    36 x  36 x 128   ->    36 x  36 x 256
   11 max          2 x 2 / 2    36 x  36 x 256   ->    18 x  18 x 256
   12 conv    512  3 x 3 / 1    18 x  18 x 256   ->    18 x  18 x 512
   13 conv    256  1 x 1 / 1    18 x  18 x 512   ->    18 x  18 x 256
   14 conv    512  3 x 3 / 1    18 x  18 x 256   ->    18 x  18 x 512
   15 conv    256  1 x 1 / 1    18 x  18 x 512   ->    18 x  18 x 256
   16 conv    512  3 x 3 / 1    18 x  18 x 256   ->    18 x  18 x 512
   17 max          2 x 2 / 2    18 x  18 x 512   ->     9 x   9 x 512
   18 conv   1024  3 x 3 / 1     9 x   9 x 512   ->     9 x   9 x1024
   19 conv    512  1 x 1 / 1     9 x   9 x1024   ->     9 x   9 x 512
   20 conv   1024  3 x 3 / 1     9 x   9 x 512   ->     9 x   9 x1024
   21 conv    512  1 x 1 / 1     9 x   9 x1024   ->     9 x   9 x 512
   22 conv   1024  3 x 3 / 1     9 x   9 x 512   ->     9 x   9 x1024
   23 conv   1024  3 x 3 / 1     9 x   9 x1024   ->     9 x   9 x1024
   24 conv   1024  3 x 3 / 1     9 x   9 x1024   ->     9 x   9 x1024
   25 route  16
   26 reorg              / 2    18 x  18 x 512   ->     9 x   9 x2048
   27 route  26 24
   28 conv   1024  3 x 3 / 1     9 x   9 x3072   ->     9 x   9 x1024
   29 conv     75  1 x 1 / 1     9 x   9 x1024   ->     9 x   9 x  75
   30 detection
mask_scale: Using default '1.000000'
Loading weights from yolo-voc_final.weights...Done!
data/dog.jpg: Predicted in 3.688960 seconds.

The failure of pytorch-caffe-darknet-convert is caused by the reorg layer, which is not handled, as shown below:

root@c5c10ada8bdb:/home/yuanshuai/code/pytorch-caffe-darknet-convert# python darknet2caffe.py ./models/yolo-voc.2.0.cfg ./models/yolo-voc_final.weights ./models/yolo-voc.2.0.prototxt ./models/yolo-voc_final.caffemodel
unknow layer type reorg 
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0115 12:06:52.210939   711 upgrade_proto.cpp:67] Attempting to upgrade input file specified using deprecated input fields: ./models/yolo-voc.2.0.prototxt
I0115 12:06:52.210985   711 upgrade_proto.cpp:70] Successfully upgraded file specified using deprecated input fields.
W0115 12:06:52.210990   711 upgrade_proto.cpp:72] Note that future Caffe releases will only support input layers and not input fields.
I0115 12:06:52.211549   711 net.cpp:51] Initializing net from parameters: 
name: "Darkent2Caffe"
.
.
.
.
.
.
I0115 09:30:08.703138   594 net.cpp:406] layer28-concat <- layer17-conv_layer17-act_0_split_1
I0115 09:30:08.703145   594 net.cpp:406] layer28-concat <- layer25-conv
I0115 09:30:08.703155   594 net.cpp:380] layer28-concat -> layer28-concat
F0115 09:30:08.703176   594 concat_layer.cpp:42] Check failed: top_shape[j] == bottom[i]->shape(j) (18 vs. 9) All inputs must have the same shape, except at concat_axis.
*** Check failure stack trace: ***
Aborted (core dumped)

PS: before this issue, I fixed another problem caused by the concat layer type name: I changed the original code to concat_layer['type'] = 'Concat' # modified by ysh 'concat' to 'Concat' at darknet2caffe.py:211.

Checking the execution log from darknet, we can find a reorg layer (#26), which reshapes the feature map routed from layer #16 from 18x18x512 to 9x9x2048. This is a key step for concatenation: an important prerequisite is that the spatial dimensions (width and height) of the feature maps must match.
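The shape arithmetic behind that step can be captured in a tiny helper (a sketch; the function name is mine, not from the converter):

```python
# hypothetical helper: the output shape of darknet's reorg layer with
# stride s -- channels are multiplied by s*s, width/height divided by s,
# so the element count is preserved
def reorg_output_shape(c, h, w, stride):
    assert h % stride == 0 and w % stride == 0
    return c * stride * stride, h // stride, w // stride

# layer #16's 18x18x512 map becomes 9x9x2048 after reorg /2, matching
# layer #24's 9x9 spatial size so the two branches can be concatenated
print(reorg_output_shape(512, 18, 18, 2))  # (2048, 9, 9)
```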

Note the important line in the conversion log: unknow layer type reorg (the first line of the execution log).

[images: feature-map outputs of the two branches before concat]

The pictures above are the outputs of the two branches before concat. Due to the lack of a real conversion implementation for the reorg layer (a Reshape layer in caffe), the two outputs cannot be concatenated with the same height and width.

Thus, the current targets are the reshape (reorg in darknet) and concat layers. The first step is to fix/support/check the reshape (reorg) layer implementation in pytorch-caffe-darknet-convert.

ysh329 commented 6 years ago

Same issue pytorch-caffe-darknet-convert/issues/24.

Now I can read out the values of the reorg layer from the darknet config file, as below:

block['type']:reorg
type(block):<class 'collections.OrderedDict'>
block:OrderedDict([('type', 'reorg'), ('stride', '2')])
block[type]: reorg
block[stride]: 2

OrderedDict is a subclass of dict. The only difference between them is that dict is unordered, while OrderedDict preserves its insertion order.
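A minimal demonstration of both points, mirroring the parsed block above:

```python
from collections import OrderedDict

# blocks parsed from the darknet cfg keep their key order, so the layer
# parameters come out in the order they were written in the file
block = OrderedDict()
block['type'] = 'reorg'
block['stride'] = '2'

print(isinstance(block, dict))  # True: OrderedDict subclasses dict
print(list(block.items()))      # [('type', 'reorg'), ('stride', '2')]
```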

The next step is to find the mapping rule from darknet to caffe. Thus, it's necessary to clearly understand the meaning of the parameters in the darknet config file and of the corresponding Reshape layer in caffe.

ysh329 commented 6 years ago

Content below is from caffe/reshape.md at master · BVLC/caffe:



Reshape Layer

The Reshape layer can be used to change the dimensions of its input, without changing its data. Just like the Flatten layer, only the dimensions are changed; no data is copied in the process.

Output dimensions are specified by the ReshapeParam proto. Positive numbers are used directly, setting the corresponding dimension of the output blob. In addition, two special values are accepted for any of the target dimension values:

* 0 means "copy the respective dimension of the bottom layer". That is, if the bottom has 2 as its 1st dimension, the top will have 2 as its 1st dimension as well, given dim: 0 as the 1st target dimension.
* -1 stands for "infer this from the other dimensions". This dimension is calculated to keep the overall element count the same as in the bottom layer.

As another example, specifying reshape_param { shape { dim: 0 dim: -1 } } makes the layer behave in exactly the same way as the Flatten layer.
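The two special values can be made concrete with a numpy sketch (not Caffe's code; `caffe_like_reshape` is an illustrative helper, and numpy's own -1 handles the inferred dimension):

```python
import numpy as np

def caffe_like_reshape(arr, dims):
    # dim == 0 -> copy the corresponding input dimension;
    # dim == -1 -> infer from the remaining element count
    out = [arr.shape[i] if d == 0 else d for i, d in enumerate(dims)]
    return arr.reshape(out)

x = np.zeros((2, 8))
print(caffe_like_reshape(x, [0, -1]).shape)     # (2, 8): same as Flatten
print(caffe_like_reshape(x, [0, 2, -1]).shape)  # (2, 2, 4)
```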

Parameters


more in caffe.proto

message ReshapeParameter {
  // Specify the output dimensions. If some of the dimensions are set to 0,
  // the corresponding dimension from the bottom layer is used (unchanged).
  // Exactly one dimension may be set to -1, in which case its value is
  // inferred from the count of the bottom blob and the remaining dimensions.
  // For example, suppose we want to reshape a 2D blob "input" with shape 2 x 8:
  //
  //   layer {
  //     type: "Reshape" bottom: "input" top: "output"
  //     reshape_param { ... }
  //   }
  //
  // If "input" is 2D with shape 2 x 8, then the following reshape_param
  // specifications are all equivalent, producing a 3D blob "output" with shape
  // 2 x 2 x 4:
  //
  //   reshape_param { shape { dim:  2  dim: 2  dim:  4 } }
  //   reshape_param { shape { dim:  0  dim: 2  dim:  4 } }
  //   reshape_param { shape { dim:  0  dim: 2  dim: -1 } }
  //   reshape_param { shape { dim:  0  dim:-1  dim:  4 } }
  //
  optional BlobShape shape = 1;

  // axis and num_axes control the portion of the bottom blob's shape that are
  // replaced by (included in) the reshape. By default (axis == 0 and
  // num_axes == -1), the entire bottom blob shape is included in the reshape,
  // and hence the shape field must specify the entire output shape.
  //
  // axis may be non-zero to retain some portion of the beginning of the input
  // shape (and may be negative to index from the end; e.g., -1 to begin the
  // reshape after the last axis, including nothing in the reshape,
  // -2 to include only the last axis, etc.).
  //
  // For example, suppose "input" is a 2D blob with shape 2 x 8.
  // Then the following ReshapeLayer specifications are all equivalent,
  // producing a blob "output" with shape 2 x 2 x 4:
  //
  //   reshape_param { shape { dim: 2  dim: 2  dim: 4 } }
  //   reshape_param { shape { dim: 2  dim: 4 } axis:  1 }
  //   reshape_param { shape { dim: 2  dim: 4 } axis: -3 }
  //
  // num_axes specifies the extent of the reshape.
  // If num_axes >= 0 (and axis >= 0), the reshape will be performed only on
  // input axes in the range [axis, axis+num_axes].
  // num_axes may also be -1, the default, to include all remaining axes
  // (starting from axis).
  //
  // For example, suppose "input" is a 2D blob with shape 2 x 8.
  // Then the following ReshapeLayer specifications are equivalent,
  // producing a blob "output" with shape 1 x 2 x 8.
  //
  //   reshape_param { shape { dim:  1  dim: 2  dim:  8 } }
  //   reshape_param { shape { dim:  1  dim: 2  }  num_axes: 1 }
  //   reshape_param { shape { dim:  1  }  num_axes: 0 }
  //
  // On the other hand, these would produce output blob shape 2 x 1 x 8:
  //
  //   reshape_param { shape { dim: 2  dim: 1  dim: 8  }  }
  //   reshape_param { shape { dim: 1 }  axis: 1  num_axes: 0 }
  //
  optional int32 axis = 2 [default = 0];
  optional int32 num_axes = 3 [default = -1];
}
ysh329 commented 6 years ago

Now I have finished the conversion from the darknet model to caffe's using pytorch-caffe-darknet-convert, after adding support for the reorg layer and fixing some issues.

However, the reorg layer support has a small limitation: you have to define the output dimensions of the reorg layer by hand. More detailed info can be found in darknet2caffe.py of the pytorch-caffe-darknet-convert repo.
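For reference, with the yolo-voc.2.0 network above (9x9 grid, 2048 channels after reorg), the Reshape layer written into the prototxt would need to look something like this (a sketch; the layer and blob names here are illustrative, not necessarily what the converter emits):

```
layer {
  name: "layer28-reshape"
  type: "Reshape"
  bottom: "layer27-conv"
  top: "layer28-reshape"
  reshape_param { shape { dim: 1 dim: 2048 dim: 9 dim: 9 } }
}
```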

maxadda commented 6 years ago

Thank you very much for your contribution. I used your code and it generates the yolov2 network files and models normally, but during testing the first batchnorm layer outputs NaN. Have you tested this?

ysh329 commented 6 years ago

@maxadda I checked the correctness of the conversion, i.e. the caffemodel (conv params etc.) and prototxt generated by darknet2caffe.py. The conversion itself is okay, but I found some problems with the feature-map computation: the results from darknet and caffe are different.

More concretely, I checked the last convolutional result (feature map) and found the feature maps are totally different, not only in values but also in order of magnitude (darknet's values are all between 0 and 1, but caffe's are large 🤣 such as ±10 or bigger).

Thus, I'm now checking the feature maps layer by layer starting from the data input. Bless me 🤣

Besides, I found a key point: in darknet, batchnorm is a parameter of the convolutional layer in the cfg file, and if a layer has batchnorm then it has no (convolutional) bias parameters (I found this in the darknet source code).
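A converter has to mirror this in how it walks the flat weight buffer. The sketch below is my reading of darknet's load_convolutional_weights (an assumption worth double-checking against parser.c): the per-filter biases come first, but with batch_normalize=1 they act as the BN shift rather than a conv bias, followed by the BN scales, rolling means and rolling variances, then the conv weights:

```python
import numpy as np

def load_conv_block(buf, start, n_filters, weight_count, batch_normalize):
    """Read one conv layer's parameters from a flat float32 buffer.

    Layout (assumed from darknet source): biases always first; if the
    layer is batch-normalized they serve as BN shift and are followed by
    scales, rolling_mean, rolling_variance; conv weights come last.
    There is never a separate conv bias when batchnorm is on.
    """
    params, pos = {}, start
    params['biases'] = buf[pos:pos + n_filters]; pos += n_filters
    if batch_normalize:
        for name in ('scales', 'rolling_mean', 'rolling_variance'):
            params[name] = buf[pos:pos + n_filters]; pos += n_filters
    params['weights'] = buf[pos:pos + weight_count]; pos += weight_count
    return params, pos

# toy buffer: 2 filters, 6 conv weights, batch-normalized
buf = np.arange(14, dtype=np.float32)
p, end = load_conv_block(buf, 0, 2, 6, batch_normalize=True)
print(end)  # 14 floats consumed: 4 * 2 BN params + 6 weights
```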

ysh329 commented 6 years ago

I found that darknet applies a series of pre-processing steps. In processing order, they are:

  1. read the image using OpenCV or darknet's own reader (assume OpenCV here)
  2. convert the image to float and normalize (divide each pixel value by 255), storing it as CHW instead of HWC
  3. swap the R and B channels
  4. resize the image keeping its aspect ratio (equal scaling)
  5. create a blank image with the model input shape, fill it with 0.5, then embed the equally-scaled resized image from step 4 into it
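The steps above can be sketched in numpy (a toy nearest-neighbor resize stands in for darknet's interpolation, and all function names here are mine):

```python
import numpy as np

def nn_resize(img, new_h, new_w):
    # toy nearest-neighbor resize standing in for darknet's resize
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]

def darknet_preprocess(img_bgr, net_w, net_h):
    img = img_bgr.astype(np.float32) / 255.0   # step 2: float + normalize
    img = img[:, :, ::-1]                      # step 3: swap B and R channels
    h, w = img.shape[:2]
    scale = min(net_w / w, net_h / h)          # step 4: keep aspect ratio
    new_w, new_h = int(w * scale), int(h * scale)
    img = nn_resize(img, new_h, new_w)
    canvas = np.full((net_h, net_w, 3), 0.5, dtype=np.float32)  # step 5
    top, left = (net_h - new_h) // 2, (net_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = img
    return canvas.transpose(2, 0, 1)           # step 2 (cont.): HWC -> CHW

out = darknet_preprocess(np.zeros((20, 10, 3), dtype=np.uint8), 16, 16)
print(out.shape)     # (3, 16, 16)
print(out[0, 0, 0])  # 0.5 -- the padding value at the border
```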

Besides, darknet's reorg layer has a different implementation from caffe's Reshape. darknet's reorg layer is really odd. Its code is below:

void reorg_cpu(float *x, int w, int h, int c, int batch, int stride, int forward, float *out)
{
    int b,i,j,k;
    int out_c = c/(stride*stride);

    for(b = 0; b < batch; ++b){
        for(k = 0; k < c; ++k){
            for(j = 0; j < h; ++j){
                for(i = 0; i < w; ++i){
                    int in_index  = i + w*(j + h*(k + c*b));
                    int c2 = k % out_c;
                    int offset = k / out_c;
                    int w2 = i*stride + offset % stride;
                    int h2 = j*stride + offset / stride;
                    int out_index = w2 + w*stride*(h2 + h*stride*(c2 + out_c*b));
                    if(forward) out[out_index] = x[in_index];
                    else out[in_index] = x[out_index];
                }
            }
        }
    }
}
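To see why a plain Reshape cannot reproduce this, here is a direct numpy transcription of reorg_cpu (batch fixed to 1 for brevity), compared against the identity that a pure Reshape performs on flat memory:

```python
import numpy as np

def reorg_cpu(x, w, h, c, stride, forward=True):
    # transcription of darknet's reorg_cpu above, batch = 1;
    # x is the flat (c*h*w,) array, like darknet's float* buffer
    out = np.empty_like(x)
    out_c = c // (stride * stride)
    for k in range(c):
        for j in range(h):
            for i in range(w):
                in_index = i + w * (j + h * k)
                c2 = k % out_c
                offset = k // out_c
                w2 = i * stride + offset % stride
                h2 = j * stride + offset // stride
                out_index = w2 + w * stride * (h2 + h * stride * c2)
                if forward:
                    out[out_index] = x[in_index]
                else:
                    out[in_index] = x[out_index]
    return out

x = np.arange(16.0)  # a 4x2x2 (c,h,w) blob, flattened
y = reorg_cpu(x, w=2, h=2, c=4, stride=2)
print(np.array_equal(y, x))  # False: reorg permutes the data in memory,
                             # while a Reshape leaves flat memory untouched
```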

@maxadda

ysh329 commented 6 years ago

Currently, I found there seem to be some parameters in the region layer (src/region_layer.c of darknet) — more concretely, the regression boxes' bias parameters (note: not convolutional biases) — but I wasn't sure whether these biases are stored in the model (weight) file, which required exploring the darknet code more deeply. Afterwards, I found that the bias parameters of the region layer are the anchor values in the cfg file. In other words, the bias parameters are not in the model file but in the cfg file.

Later I found this code in darknet; it reads the bias parameters of the detection-box regression (and confirms the biases are read from the cfg file):

    char *a = option_find_str(options, "anchors", 0);
    if(a){
        int len = strlen(a);
        int n = 1; 
        int i;
        for(i = 0; i < len; ++i){
            if (a[i] == ',') ++n;
        }
        fprintf(stderr, "==== parse_region ====\n");
        for(i = 0; i < n; ++i){
            float bias = atof(a);
            fprintf(stderr, "%d\t%f\n", i, bias);
            l.biases[i] = bias;
            a = strchr(a, ',')+1;
        }
    }
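The same parsing in Python, for clarity (the function name is mine; the anchor values below are illustrative, in yolo-voc cfg style):

```python
# Python rendering of the C loop above: the comma-separated "anchors"
# value in the cfg becomes the region layer's biases array
def parse_anchors(options):
    a = options.get('anchors')
    return [float(v) for v in a.split(',')] if a else []

biases = parse_anchors({'anchors': '1.3221, 1.73145, 3.19275, 4.00944'})
print(biases)  # [1.3221, 1.73145, 3.19275, 4.00944]
```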

Besides, by reading the code, I found that whether this region layer has parameters depends on the training settings. The code above is from the function layer parse_region(list *options, size_params params) in parser.c of darknet.

ysh329 commented 6 years ago

I ran another model, named tiny-yolo-voc, as below:

$ ./darknet detect tiny-yolo-voc.cfg tiny-yolo-voc_final.weights data/warship.jpg
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   13 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024
   14 conv     75  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x  75
   15 detection
mask_scale: Using default '1.000000'

Besides, I ran another tiny YOLO as below:

$ ./darknet detector test cfg/voc.data cfg/tiny-yolo-voc.cfg tiny-yolo-voc.weights data/dog.jpg
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   13 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024
   14 conv    125  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 125
   15 detection
mask_scale: Using default '1.000000'
Loading weights from tiny-yolo-voc.weights...Done!
data/dog.jpg: Predicted in 8.385642 seconds.
car: 35%
car: 55%
dog: 78%
bicycle: 36%

We can see that the important difference is the last max-pooling layer (#11), which has stride=1 with a 2x2 kernel, so the spatial size stays 13x13. That is quite unusual.
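This matters for conversion because the two frameworks compute pooling output sizes differently. A sketch of the two formulas (darknet's default padding of size-1 is my reading of maxpool_layer.c; Caffe's ceil-mode formula is from its PoolingLayer):

```python
import math

def caffe_pool_out(in_sz, k, s, pad=0):
    # Caffe PoolingLayer: ceil((in + 2*pad - k) / s) + 1
    return int(math.ceil((in_sz + 2 * pad - k) / s)) + 1

def darknet_pool_out(in_sz, k, s):
    # darknet maxpool: (in + padding - k) / s + 1, with padding
    # defaulting to k - 1 (an assumption from darknet's source)
    return (in_sz + (k - 1) - k) // s + 1

print(darknet_pool_out(13, 2, 1))  # 13: the 2x2/1 pool keeps 13x13
print(caffe_pool_out(13, 2, 1))    # 12: shrinks unless padding is added
```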

YogeshShitole commented 6 years ago

@ysh329 First of all, thanks for the conversion tool. I used your code to convert tiny-yolo-voc.cfg and tiny-yolo-voc.weights to tiny-yolo-voc.prototxt and tiny-yolo-voc.caffemodel, and it works fine. But when I try to convert yolo-voc.cfg and yolo-voc.weights to get yolo-voc.caffemodel, it does not work and throws the error below:

I0206 18:56:52.413803 23711 net.cpp:443] layer28-reshape -> layer28-reshape
F0206 18:56:52.413822 23711 reshape_layer.cpp:87] Check failed: top[0]->count() == bottom[0]->count() (165888 vs. 43264) output count must match input count
Check failure stack trace

The cfg file I used is from https://github.com/pjreddie/darknet/blob/master/cfg/yolo-voc.cfg and the weight file from https://pjreddie.com/media/files/yolo-voc.weights. I would really appreciate any kind of help on this. Thank you.

ysh329 commented 6 years ago

@YogeshShitole Do you redefine the output dimension of reorg layer as below from this code?

            # TODO: auto shape infer
            shape['dim'] = [1, 2048, 9, 9]
YogeshShitole commented 6 years ago

Hi @ysh329, thanks for the quick reply. I redefined the output dimension of the reorg layer from

        # TODO: auto shape infer
        shape['dim'] = [1, 2048, 9, 9]

to

        # TODO: auto shape infer
        shape['dim'] = [1, 64, 26, 26]

and also tried the auto-shape-infer option shape['dim'] = [1, -1, block['stride'], block['stride']].

But now the problem is at the layer-29 Concat layer: a shape mismatch between layer 25 and layer 28. Below is a snippet after I run the command python darknet2caffe.py yolo-voc.cfg yolo-voc.weights yolo-voc.prototxt yolo-voc.caffemodel:

I0208 16:09:48.932379 14450 net.cpp:91] Creating Layer layer23-conv
I0208 16:09:48.932384 14450 net.cpp:469] layer23-conv <- layer22-conv
I0208 16:09:48.932389 14450 net.cpp:443] layer23-conv -> layer23-conv
I0208 16:09:48.935953 14450 net.cpp:141] Setting up layer23-conv
I0208 16:09:48.935984 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056)
I0208 16:09:48.935991 14450 net.cpp:156] Memory required for data: 262699008
I0208 16:09:48.936002 14450 layer_factory.hpp:77] Creating layer layer23-bn
I0208 16:09:48.936017 14450 net.cpp:91] Creating Layer layer23-bn
I0208 16:09:48.936023 14450 net.cpp:469] layer23-bn <- layer23-conv
I0208 16:09:48.936031 14450 net.cpp:430] layer23-bn -> layer23-conv (in-place)
I0208 16:09:48.936053 14450 net.cpp:141] Setting up layer23-bn
I0208 16:09:48.936060 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056)
I0208 16:09:48.936065 14450 net.cpp:156] Memory required for data: 263391232
I0208 16:09:48.936074 14450 layer_factory.hpp:77] Creating layer layer23-scale
I0208 16:09:48.936081 14450 net.cpp:91] Creating Layer layer23-scale
I0208 16:09:48.936087 14450 net.cpp:469] layer23-scale <- layer23-conv
I0208 16:09:48.936094 14450 net.cpp:430] layer23-scale -> layer23-conv (in-place)
I0208 16:09:48.936110 14450 layer_factory.hpp:77] Creating layer layer23-scale
I0208 16:09:48.936126 14450 net.cpp:141] Setting up layer23-scale
I0208 16:09:48.936133 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056)
I0208 16:09:48.936141 14450 net.cpp:156] Memory required for data: 264083456
I0208 16:09:48.936149 14450 layer_factory.hpp:77] Creating layer layer23-act
I0208 16:09:48.936157 14450 net.cpp:91] Creating Layer layer23-act
I0208 16:09:48.936163 14450 net.cpp:469] layer23-act <- layer23-conv
I0208 16:09:48.936168 14450 net.cpp:430] layer23-act -> layer23-conv (in-place)
I0208 16:09:48.936175 14450 net.cpp:141] Setting up layer23-act
I0208 16:09:48.936182 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056)
I0208 16:09:48.936187 14450 net.cpp:156] Memory required for data: 264775680
I0208 16:09:48.936192 14450 layer_factory.hpp:77] Creating layer layer24-conv
I0208 16:09:48.936203 14450 net.cpp:91] Creating Layer layer24-conv
I0208 16:09:48.936208 14450 net.cpp:469] layer24-conv <- layer23-conv
I0208 16:09:48.936221 14450 net.cpp:443] layer24-conv -> layer24-conv
I0208 16:09:48.942919 14450 net.cpp:141] Setting up layer24-conv
I0208 16:09:48.942950 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056)
I0208 16:09:48.942955 14450 net.cpp:156] Memory required for data: 265467904
I0208 16:09:48.942966 14450 layer_factory.hpp:77] Creating layer layer24-bn
I0208 16:09:48.942979 14450 net.cpp:91] Creating Layer layer24-bn
I0208 16:09:48.942986 14450 net.cpp:469] layer24-bn <- layer24-conv
I0208 16:09:48.942994 14450 net.cpp:430] layer24-bn -> layer24-conv (in-place)
I0208 16:09:48.943013 14450 net.cpp:141] Setting up layer24-bn
I0208 16:09:48.943020 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056)
I0208 16:09:48.943025 14450 net.cpp:156] Memory required for data: 266160128
I0208 16:09:48.943034 14450 layer_factory.hpp:77] Creating layer layer24-scale
I0208 16:09:48.943044 14450 net.cpp:91] Creating Layer layer24-scale
I0208 16:09:48.943049 14450 net.cpp:469] layer24-scale <- layer24-conv
I0208 16:09:48.943055 14450 net.cpp:430] layer24-scale -> layer24-conv (in-place)
I0208 16:09:48.943069 14450 layer_factory.hpp:77] Creating layer layer24-scale
I0208 16:09:48.943086 14450 net.cpp:141] Setting up layer24-scale
I0208 16:09:48.943094 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056)
I0208 16:09:48.943099 14450 net.cpp:156] Memory required for data: 266852352
I0208 16:09:48.943104 14450 layer_factory.hpp:77] Creating layer layer24-act
I0208 16:09:48.943114 14450 net.cpp:91] Creating Layer layer24-act
I0208 16:09:48.943120 14450 net.cpp:469] layer24-act <- layer24-conv
I0208 16:09:48.943125 14450 net.cpp:430] layer24-act -> layer24-conv (in-place)
I0208 16:09:48.943131 14450 net.cpp:141] Setting up layer24-act
I0208 16:09:48.943137 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056)
I0208 16:09:48.943143 14450 net.cpp:156] Memory required for data: 267544576
I0208 16:09:48.943147 14450 layer_factory.hpp:77] Creating layer layer25-conv
I0208 16:09:48.943156 14450 net.cpp:91] Creating Layer layer25-conv
I0208 16:09:48.943162 14450 net.cpp:469] layer25-conv <- layer24-conv
I0208 16:09:48.943169 14450 net.cpp:443] layer25-conv -> layer25-conv
I0208 16:09:48.949640 14450 net.cpp:141] Setting up layer25-conv
I0208 16:09:48.949666 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056)
I0208 16:09:48.949671 14450 net.cpp:156] Memory required for data: 268236800
I0208 16:09:48.949681 14450 layer_factory.hpp:77] Creating layer layer25-bn
I0208 16:09:48.949692 14450 net.cpp:91] Creating Layer layer25-bn
I0208 16:09:48.949698 14450 net.cpp:469] layer25-bn <- layer25-conv
I0208 16:09:48.949707 14450 net.cpp:430] layer25-bn -> layer25-conv (in-place)
I0208 16:09:48.949726 14450 net.cpp:141] Setting up layer25-bn
I0208 16:09:48.949733 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056)
I0208 16:09:48.949738 14450 net.cpp:156] Memory required for data: 268929024
I0208 16:09:48.949745 14450 layer_factory.hpp:77] Creating layer layer25-scale
I0208 16:09:48.949753 14450 net.cpp:91] Creating Layer layer25-scale
I0208 16:09:48.949759 14450 net.cpp:469] layer25-scale <- layer25-conv
I0208 16:09:48.949765 14450 net.cpp:430] layer25-scale -> layer25-conv (in-place)
I0208 16:09:48.949779 14450 layer_factory.hpp:77] Creating layer layer25-scale
I0208 16:09:48.949795 14450 net.cpp:141] Setting up layer25-scale
I0208 16:09:48.949802 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056)
I0208 16:09:48.949808 14450 net.cpp:156] Memory required for data: 269621248
I0208 16:09:48.949816 14450 layer_factory.hpp:77] Creating layer layer25-act
I0208 16:09:48.949826 14450 net.cpp:91] Creating Layer layer25-act
I0208 16:09:48.949831 14450 net.cpp:469] layer25-act <- layer25-conv
I0208 16:09:48.949836 14450 net.cpp:430] layer25-act -> layer25-conv (in-place)
I0208 16:09:48.949846 14450 net.cpp:141] Setting up layer25-act
I0208 16:09:48.949852 14450 net.cpp:148] Top shape: 1 1024 13 13 (173056)
I0208 16:09:48.949856 14450 net.cpp:156] Memory required for data: 270313472
I0208 16:09:48.949862 14450 layer_factory.hpp:77] Creating layer layer27-conv
I0208 16:09:48.949870 14450 net.cpp:91] Creating Layer layer27-conv
I0208 16:09:48.949875 14450 net.cpp:469] layer27-conv <- layer17-conv_layer17-act_0_split_1
I0208 16:09:48.949882 14450 net.cpp:443] layer27-conv -> layer27-conv
I0208 16:09:48.949947 14450 net.cpp:141] Setting up layer27-conv
I0208 16:09:48.949955 14450 net.cpp:148] Top shape: 1 64 26 26 (43264)
I0208 16:09:48.949961 14450 net.cpp:156] Memory required for data: 270486528
I0208 16:09:48.949968 14450 layer_factory.hpp:77] Creating layer layer27-bn
I0208 16:09:48.949975 14450 net.cpp:91] Creating Layer layer27-bn
I0208 16:09:48.949982 14450 net.cpp:469] layer27-bn <- layer27-conv
I0208 16:09:48.949990 14450 net.cpp:430] layer27-bn -> layer27-conv (in-place)
I0208 16:09:48.950004 14450 net.cpp:141] Setting up layer27-bn
I0208 16:09:48.950011 14450 net.cpp:148] Top shape: 1 64 26 26 (43264)
I0208 16:09:48.950016 14450 net.cpp:156] Memory required for data: 270659584
I0208 16:09:48.950024 14450 layer_factory.hpp:77] Creating layer layer27-scale
I0208 16:09:48.950032 14450 net.cpp:91] Creating Layer layer27-scale
I0208 16:09:48.950037 14450 net.cpp:469] layer27-scale <- layer27-conv
I0208 16:09:48.950044 14450 net.cpp:430] layer27-scale -> layer27-conv (in-place)
I0208 16:09:48.950053 14450 layer_factory.hpp:77] Creating layer layer27-scale
I0208 16:09:48.950070 14450 net.cpp:141] Setting up layer27-scale
I0208 16:09:48.950078 14450 net.cpp:148] Top shape: 1 64 26 26 (43264)
I0208 16:09:48.950083 14450 net.cpp:156] Memory required for data: 270832640
I0208 16:09:48.950090 14450 layer_factory.hpp:77] Creating layer layer27-act
I0208 16:09:48.950098 14450 net.cpp:91] Creating Layer layer27-act
I0208 16:09:48.950103 14450 net.cpp:469] layer27-act <- layer27-conv
I0208 16:09:48.950109 14450 net.cpp:430] layer27-act -> layer27-conv (in-place)
I0208 16:09:48.950115 14450 net.cpp:141] Setting up layer27-act
I0208 16:09:48.950121 14450 net.cpp:148] Top shape: 1 64 26 26 (43264)
I0208 16:09:48.950127 14450 net.cpp:156] Memory required for data: 271005696
I0208 16:09:48.950130 14450 layer_factory.hpp:77] Creating layer layer28-reshape
I0208 16:09:48.950140 14450 net.cpp:91] Creating Layer layer28-reshape
I0208 16:09:48.950145 14450 net.cpp:469] layer28-reshape <- layer27-conv
I0208 16:09:48.950151 14450 net.cpp:443] layer28-reshape -> layer28-reshape
I0208 16:09:48.950160 14450 net.cpp:141] Setting up layer28-reshape
I0208 16:09:48.950168 14450 net.cpp:148] Top shape: 1 64 26 26 (43264)
I0208 16:09:48.950172 14450 net.cpp:156] Memory required for data: 271178752
I0208 16:09:48.950177 14450 layer_factory.hpp:77] Creating layer layer29-concat
I0208 16:09:48.950183 14450 net.cpp:91] Creating Layer layer29-concat
I0208 16:09:48.950188 14450 net.cpp:469] layer29-concat <- layer28-reshape
I0208 16:09:48.950192 14450 net.cpp:469] layer29-concat <- layer25-conv
I0208 16:09:48.950199 14450 net.cpp:443] layer29-concat -> layer29-concat
F0208 16:09:48.950209 14450 concat_layer.cpp:42] Check failed: top_shape[j] == bottom[i]->shape(j) (26 vs. 13) All inputs must have the same shape, except at concat_axis.
Check failure stack trace:
Aborted (core dumped)

ysh329 commented 6 years ago

@YogeshShitole Can you share the execution log from darknet? Besides, I implemented auto shape infer for the reorg layer here.

YogeshShitole commented 6 years ago

Hi @ysh329 here is my darknet2caffe_convert.log

block:OrderedDict([('type', 'route'), ('layers', '-9')])
block[type]: reorg
block[stride]: 2
============== reorg =========
reshape['top']: layer28-reshape
layer_id: 28
bottom: layer27-conv
block:OrderedDict([('type', 'route'), ('layers', '-1,-4')])
from_layer: ['-1', '-4']
prev_layer_id1: 28
prev_layer_id2: 25
layer_id: 29
concat_layer: OrderedDict([('name', 'layer29-concat'), ('type', 'Concat'), ('bottom', ['layer28-reshape', 'layer25-conv']), ('top', 'layer29-concat')])

I also tried your auto-shape-infer implementation, which is also not working with yolo-voc.cfg and yolo-voc.weights.

ysh329 commented 6 years ago

@YogeshShitole I mean darknet's execution log, like the one below:

layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   288 x 288 x   3   ->   288 x 288 x  32
    1 max          2 x 2 / 2   288 x 288 x  32   ->   144 x 144 x  32
    2 conv     64  3 x 3 / 1   144 x 144 x  32   ->   144 x 144 x  64
    3 max          2 x 2 / 2   144 x 144 x  64   ->    72 x  72 x  64
    4 conv    128  3 x 3 / 1    72 x  72 x  64   ->    72 x  72 x 128
    5 conv     64  1 x 1 / 1    72 x  72 x 128   ->    72 x  72 x  64
    6 conv    128  3 x 3 / 1    72 x  72 x  64   ->    72 x  72 x 128
    7 max          2 x 2 / 2    72 x  72 x 128   ->    36 x  36 x 128
    8 conv    256  3 x 3 / 1    36 x  36 x 128   ->    36 x  36 x 256
    9 conv    128  1 x 1 / 1    36 x  36 x 256   ->    36 x  36 x 128
   10 conv    256  3 x 3 / 1    36 x  36 x 128   ->    36 x  36 x 256
   11 max          2 x 2 / 2    36 x  36 x 256   ->    18 x  18 x 256
   12 conv    512  3 x 3 / 1    18 x  18 x 256   ->    18 x  18 x 512
   13 conv    256  1 x 1 / 1    18 x  18 x 512   ->    18 x  18 x 256
   14 conv    512  3 x 3 / 1    18 x  18 x 256   ->    18 x  18 x 512
   15 conv    256  1 x 1 / 1    18 x  18 x 512   ->    18 x  18 x 256
   16 conv    512  3 x 3 / 1    18 x  18 x 256   ->    18 x  18 x 512
   17 max          2 x 2 / 2    18 x  18 x 512   ->     9 x   9 x 512
   18 conv   1024  3 x 3 / 1     9 x   9 x 512   ->     9 x   9 x1024
   19 conv    512  1 x 1 / 1     9 x   9 x1024   ->     9 x   9 x 512
   20 conv   1024  3 x 3 / 1     9 x   9 x 512   ->     9 x   9 x1024
   21 conv    512  1 x 1 / 1     9 x   9 x1024   ->     9 x   9 x 512
   22 conv   1024  3 x 3 / 1     9 x   9 x 512   ->     9 x   9 x1024
   23 conv   1024  3 x 3 / 1     9 x   9 x1024   ->     9 x   9 x1024
   24 conv   1024  3 x 3 / 1     9 x   9 x1024   ->     9 x   9 x1024
   25 route  16
   26 reorg              / 2    18 x  18 x 512   ->     9 x   9 x2048
   27 route  26 24
   28 conv   1024  3 x 3 / 1     9 x   9 x3072   ->     9 x   9 x1024
   29 conv     75  1 x 1 / 1     9 x   9 x1024   ->     9 x   9 x  75
   30 detection

Or do you have the same execution log as mine above?

ysh329 commented 6 years ago

tiny-yolov2

layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   640 x 480 x   3   ->   640 x 480 x  16
    1 max          2 x 2 / 2   640 x 480 x  16   ->   320 x 240 x  16
    2 conv     32  3 x 3 / 1   320 x 240 x  16   ->   320 x 240 x  32
    3 max          2 x 2 / 2   320 x 240 x  32   ->   160 x 120 x  32
    4 conv     64  3 x 3 / 1   160 x 120 x  32   ->   160 x 120 x  64
    5 max          2 x 2 / 2   160 x 120 x  64   ->    80 x  60 x  64
    6 conv    128  3 x 3 / 1    80 x  60 x  64   ->    80 x  60 x 128
    7 max          2 x 2 / 2    80 x  60 x 128   ->    40 x  30 x 128
    8 conv    256  3 x 3 / 1    40 x  30 x 128   ->    40 x  30 x 256
    9 max          2 x 2 / 2    40 x  30 x 256   ->    20 x  15 x 256
   10 conv    512  3 x 3 / 1    20 x  15 x 256   ->    20 x  15 x 512
   11 max          2 x 2 / 1    20 x  15 x 512   ->    20 x  15 x 512
   12 conv   1024  3 x 3 / 1    20 x  15 x 512   ->    20 x  15 x1024
   13 route  8
   14 conv     64  1 x 1 / 1    40 x  30 x 256   ->    40 x  30 x  64
   15 reorg              / 2    40 x  30 x  64   ->    20 x  15 x 256
   16 route  15 12
   17 conv   1024  3 x 3 / 1    20 x  15 x1280   ->    20 x  15 x1024
   18 conv     35  1 x 1 / 1    20 x  15 x1024   ->    20 x  15 x  35
   19 detection

tiny-yolo-125

layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   13 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024
   14 conv    125  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 125
   15 detection
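
As a side note, the final conv filter counts in these logs follow YOLOv2's detection-head formula: `filters = num_anchors * (num_classes + 5)`, where the 5 comes from 4 box coordinates plus 1 objectness score per anchor. A quick sanity check against the logs above:

```python
def yolov2_head_filters(num_anchors, num_classes):
    # Each anchor predicts 4 box coordinates + 1 objectness score,
    # plus one confidence per class.
    return num_anchors * (num_classes + 5)

# 5 anchors, 20 VOC classes -> the 125 filters in the tiny-yolo-125 log
print(yolov2_head_filters(5, 20))  # 125
# 5 anchors, 2 classes -> the 35 filters in the tiny-yolov2 log
print(yolov2_head_filters(5, 2))   # 35
```
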
YogeshShitole commented 6 years ago

@ysh329 With darknet it executes perfectly; below is the darknet execution log:

layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32
    1 max          2 x 2 / 2   416 x 416 x  32   ->   208 x 208 x  32
    2 conv     64  3 x 3 / 1   208 x 208 x  32   ->   208 x 208 x  64
    3 max          2 x 2 / 2   208 x 208 x  64   ->   104 x 104 x  64
    4 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128
    5 conv     64  1 x 1 / 1   104 x 104 x 128   ->   104 x 104 x  64
    6 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128
    7 max          2 x 2 / 2   104 x 104 x 128   ->    52 x  52 x 128
    8 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
    9 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
   10 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
   11 max          2 x 2 / 2    52 x  52 x 256   ->    26 x  26 x 256
   12 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   13 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   14 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   15 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   16 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   17 max          2 x 2 / 2    26 x  26 x 512   ->    13 x  13 x 512
   18 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   19 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
   20 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   21 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
   22 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   23 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024
   24 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024
   25 route  16
   26 conv     64  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x  64
   27 reorg              / 2    26 x  26 x  64   ->    13 x  13 x 256
   28 route  27 24
   29 conv   1024  3 x 3 / 1    13 x  13 x1280   ->    13 x  13 x1024
   30 conv    125  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 125
   31 detection
ysh329 commented 6 years ago

@YogeshShitole I found that this bug is caused by the automatic shape inference for the reorg layer in my code :rofl: . I'm fixing it now. :rofl:

YogeshShitole commented 6 years ago

@ysh329 Thank you 😊

ysh329 commented 6 years ago

@YogeshShitole Hey, I fixed the bug in this branch. Feel free to give it a try. :rofl:

ysh329 commented 6 years ago
layer { 
    name: "data" 
    type: "Input" 
    top: "data" 
    input_param { 
        shape { 
            dim: 1 
            dim: 3 
            dim: 288 
            dim: 288 
         }  
    } 
} 

The third dim is the height of the input.
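
To make the dim ordering explicit: Caffe `Input` blobs use NCHW layout, so the four `dim` entries in the prototxt above map to batch, channels, height, width. A minimal NumPy sketch of the matching blob:

```python
import numpy as np

# NCHW layout, matching the prototxt dims above: 1 x 3 x 288 x 288
n, c, h, w = 1, 3, 288, 288          # batch, channels, height, width
data = np.zeros((n, c, h, w), dtype=np.float32)

print(data.shape[2])  # 288 -> the third dim is the input height
```
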

nixnmtm commented 6 years ago

@YogeshShitole, you said it is working for you with tiny-yolo-voc.cfg and tiny-yolo-voc.weights using darknet2caffe.py.

But when I run python3 darknet2caffe.py ../darknet/cfg/tiny-yolo-voc.cfg ../darknet/weights/tiny-yolo-voc.weights

it gives this error:

[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 14:14: Expected integer.
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0322 03:10:21.651065 22037 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: ../darknet/cfg/tiny-yolo-voc.prototxt
Check failure stack trace:
Aborted

Please help me to solve it. As far as I know, only you guys have come up with a working model using reorg layers in Caffe.

ysh329 commented 6 years ago

@nixnmtm Try the caffe-cpu Docker image.

nixnmtm commented 6 years ago

Ok, Thank you. It is working now.

joeysu commented 5 years ago

Hi,

When I tried to convert the yolov2 model to Caffe, I found that the results of concat and route are different.

concat looks like it concatenates two flattened blobs.

route seems to work in another way that I don't understand exactly.

And the final results are totally wrong.

Please help me, thanks!

ysh329 commented 5 years ago

@s5plus1 Hi, route is a concatenation-style connection, similar to concat. Each route value is a layer index: a non-negative value refers to that layer's output directly (e.g. route 16 above), while a negative value counts backwards, so route -n connects to the n-th layer before the current one.
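
In tensor terms, route stacks the referenced layers' outputs along the channel axis, which is what Caffe's Concat layer does with axis=1 on NCHW blobs. A sketch using the shapes from the "route 27 24" line in the log above (zero-filled stand-ins, not real activations):

```python
import numpy as np

# Stand-ins for the outputs of layers 27 and 24 in the log, CHW layout.
feat_a = np.zeros((256, 13, 13), dtype=np.float32)    # reorg output (layer 27)
feat_b = np.zeros((1024, 13, 13), dtype=np.float32)   # conv output (layer 24)

# "route 27 24": concatenate along the channel axis, 256 + 1024 = 1280.
routed = np.concatenate([feat_a, feat_b], axis=0)

print(routed.shape)  # (1280, 13, 13), matching layer 28's input in the log
```
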

joeysu commented 5 years ago

@ysh329 Thanks for your quick reply.

After debugging, I found that the difference is caused by the reorg layer.

The results of reorg in darknet and reshape in Caffe are different.

I flattened the reorg layer output and the reshape layer output and compared them.

It seems that reorg is not equal to reshape? (please correct me if I'm wrong) ...

reorg

image

reshape

image

ysh329 commented 5 years ago

@s5plus1 Yeah, they're different. For the implementation of reorg you need to refer to darknet's; it's a really curious operation. Darknet's implementation of reorg is compact but confusing, and the safest option is to copy its code into a custom layer in Caffe.
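
To make the mismatch concrete, here is a generic space-to-depth sketch in NumPy. This is only an illustration of why a plain reshape cannot reproduce a reorg-style operation: any stride-2 reorg must gather pixels from a 2x2 neighborhood into channels, which reorders elements, while reshape keeps them in flat memory order. (Darknet's actual reorg_cpu uses its own channel ordering, which may differ from this sketch too; hence the advice to copy its code.)

```python
import numpy as np

def space_to_depth(x, stride=2):
    """Generic space-to-depth: (c, h, w) -> (c*stride**2, h//stride, w//stride).

    Illustrative only; not guaranteed to match darknet's exact channel order.
    """
    c, h, w = x.shape
    x = x.reshape(c, h // stride, stride, w // stride, stride)
    x = x.transpose(0, 2, 4, 1, 3)           # pull the stride offsets into channels
    return x.reshape(c * stride * stride, h // stride, w // stride)

x = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)

s2d = space_to_depth(x, 2)
naive = x.reshape(8, 2, 2)                    # plain reshape, as Caffe's Reshape does

print(np.array_equal(s2d, naive))  # False: the element order differs
```
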

joeysu commented 5 years ago

@ysh329 Got it! Thanks again!