thtrieu / darkflow

Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices
GNU General Public License v3.0

Using pretrained convolutional weights #14

zayfod closed this issue 7 years ago

zayfod commented 7 years ago

The Yolo training examples use convolutional weights pretrained on ImageNet (http://pjreddie.com/darknet/yolov1/). These seem to be created using the Darknet partial command, which saves only the weights of the first N network layers (e.g. 24).

I wonder whether there is a way to use pretrained CNN weights with flow in a similar fashion?

Should I train the whole network from scratch? What is the downside?

I tried with the --load option like this:

./flow --train --model mymodel --load ../darknet/darknet.conv.weights --trainer adam --gpu 0.8

flow does not seem to behave as I expected, though. It loads configs/mymodel.cfg, but then it also tries to load configs/darknet.conv.cfg.

I tried to cheat and rename darknet.conv.weights to yolo-tiny.weights so that yolo-tiny.cfg is picked up, but I end up with an error message like this (which makes sense):

Error: Configuration suggests a bigger size than ./bin/yolo-tiny.weights actually is.

I wonder whether I can save the pretrained Darknet weights in a checkpoint somehow?...

It seems that #5 is related, but not the same use case.

thtrieu commented 7 years ago

Here is what I know about extraction.conv.weights:

  1. Although the .cfg file contains a fully connected layer, it is not saved in .weights. That means if you follow the .cfg all the way to the end while reading .weights, there will not be enough bytes in .weights.
  2. If you follow the .cfg file excluding all layers other than conv layers, then there are more bytes than expected!

So what happened? A little more investigation shows that the excess bytes in the second case amount to exactly three times the total size of all the biases. This is a dead giveaway that batch normalization (three additional parameter vectors: scale, mean, variance, each the same size as the biases) is present, although the .cfg does not specify it.

So the fix is simple: when the configuration name contains .conv., follow only the convolutional layers and add batch normalization to each of them, even if the .cfg does not say so. I did this in the new commit.
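
To make the layout concrete, here is a minimal reader sketch under exactly that assumption. The function and the shape list are mine, not darkflow's; it assumes the older darknet header of four int32s, followed per conv layer by the biases, the three batch-normalize vectors, and then the kernel:

import numpy as np

def read_conv_weights(path, conv_shapes):
    # conv_shapes: one (n_filters, in_channels, kernel_size) tuple per conv
    # layer, taken from the .cfg
    with open(path, 'rb') as f:
        np.fromfile(f, dtype=np.int32, count=4)  # header: major, minor, revision, seen
        layers = []
        for n, c, size in conv_shapes:
            biases = np.fromfile(f, dtype=np.float32, count=n)
            scales = np.fromfile(f, dtype=np.float32, count=n)  # batch normalize: scale
            mean = np.fromfile(f, dtype=np.float32, count=n)    # batch normalize: rolling mean
            var = np.fromfile(f, dtype=np.float32, count=n)     # batch normalize: rolling variance
            kernel = np.fromfile(f, dtype=np.float32, count=n * c * size * size)
            layers.append((biases, scales, mean, var, kernel.reshape(n, c, size, size)))
        return layers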

P/S1: quite a mess of inconsistency from darknet

P/S2: The remaining problem is still how to get darktf to work with batch normalization.
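
For what it's worth, a minimal inference-time sketch of the batch-normalization part, reusing the arrays from the reader above (the wiring is mine, not darktf's; it uses the stored rolling statistics, so it covers inference only, not training):

import tensorflow as tf

def conv_batch_norm(x, kernel, biases, scales, mean, var, eps=1e-5):
    # darknet stores kernels as (n, c, size, size); tf.nn.conv2d expects (size, size, c, n)
    k = tf.constant(kernel.transpose(2, 3, 1, 0))
    y = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='SAME')
    y = (y - mean) / tf.sqrt(var + eps)  # normalize with the rolling statistics
    return y * scales + biases           # then scale and shift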

thtrieu commented 7 years ago

BTW, I cannot find the darknet.conv.cfg; can you point me to it? Matching .cfg files with .weights files is one hell of a problem. I need to look at all of them specifically.

zayfod commented 7 years ago

I cannot. Until you brought it up, I was not aware that there are special .conv.cfg configurations.

All Darknet training examples use the regular configuration (.cfg) with pretrained CNN weights (.conv.weights) - http://pjreddie.com/darknet/yolov1/

Here is my understanding of how the smaller Darknet .conv.weights files are generated.

  1. The Darknet partial command is used:

./darknet partial cfg/extraction.cfg extraction.weights extraction.conv.weights 24

  2. This results in a call to partial() in darknet.c:
void partial(char *cfgfile, char *weightfile, char *outfile, int max)
{
    gpu_index = -1;
    network net = parse_network_cfg(cfgfile);
    if(weightfile){
        load_weights_upto(&net, weightfile, max);
    }
    *net.seen = 0;
    save_weights_upto(net, outfile, max);
}
  3. Here is what save_weights_upto() in parser.c looks like (simplified!):
void save_weights_upto(network net, char *filename, int cutoff)
{
    fprintf(stderr, "Saving weights to %s\n", filename);
    FILE *fp = fopen(filename, "w");
...
    for(i = 0; i < net.n && i < cutoff; ++i){
        layer l = net.layers[i];
        if(l.type == CONVOLUTIONAL){
            save_convolutional_weights(l, fp);
        } if(l.type == CONNECTED){
            save_connected_weights(l, fp);
        } if(l.type == BATCHNORM){
            save_batchnorm_weights(l, fp);
        } if(l.type == RNN){
            save_connected_weights(*(l.input_layer), fp);
            save_connected_weights(*(l.self_layer), fp);
            save_connected_weights(*(l.output_layer), fp);
        } if(l.type == GRU){
            save_connected_weights(*(l.input_z_layer), fp);
            save_connected_weights(*(l.input_r_layer), fp);
            save_connected_weights(*(l.input_h_layer), fp);
            save_connected_weights(*(l.state_z_layer), fp);
            save_connected_weights(*(l.state_r_layer), fp);
            save_connected_weights(*(l.state_h_layer), fp);
        } if(l.type == CRNN){
            save_convolutional_weights(*(l.input_layer), fp);
            save_convolutional_weights(*(l.self_layer), fp);
            save_convolutional_weights(*(l.output_layer), fp);
        } if(l.type == LOCAL){
            int locations = l.out_w*l.out_h;
            int size = l.size*l.size*l.c*l.n*locations;
            fwrite(l.biases, sizeof(float), l.outputs, fp);
            fwrite(l.weights, sizeof(float), size, fp);
        }
    }
    fclose(fp);
}

So, definitely, only certain types of layers are saved, and only the first N layers, as specified on the command line (24 in the example above).

It seems possible to implement a similar command in flow that would save partial Darknet .conv.weights files or partial checkpoint files.
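
If it helps, here is a rough sketch of what the saving half could look like (a hypothetical helper, mirroring partial() above, including resetting seen to 0 in the header):

import numpy as np

def save_conv_weights_upto(path, conv_layers, cutoff):
    # conv_layers: (biases, scales, mean, var, kernel) numpy arrays per conv layer
    with open(path, 'wb') as f:
        # header: major, minor, revision, seen (reset to 0, as partial() does)
        np.array([0, 1, 0, 0], dtype=np.int32).tofile(f)
        for layer in conv_layers[:cutoff]:
            for arr in layer:
                np.asarray(arr, dtype=np.float32).tofile(f)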

Does this make sense to you? If it does, I'd be happy to give it a try...

thtrieu commented 7 years ago

I don't really see the appeal of doing so. Could you elaborate? Since .weights files are for darknet users, what we are trying to achieve in darktf is a tensorflow implementation compatible with darknet. tensorflow has its own ecosystem of checkpoints, graph defs, etc., so I don't really see how reverting back to a .weights file would help.

Anyway, I am trying to address the issue you raised in the first post, namely that the following command does not work:

./flow --train --model mymodel --load ../darknet/darknet.conv.weights --trainer adam --gpu 0.8

I figured out the reason is that .conv.weights are saved in a special way, and solved it in a new commit. However, that only solves the first part of the question (being unable to load the .weights file); the second part, flowing the loaded weights properly, is still a work in progress, mainly because of the batch-normalization layer.

Nevertheless, a big thanks for your contribution. Truly helpful.

P/S: Now I see your point. My answer would be that it is easier to selectively load from the full weights file than to selectively save and then load everything that was saved. Selective loading is already achieved with my current code, though only to a certain extent: all leading identical layers are reused. I.e.:

./flow --model yolo-4c --load ./bin/yolo-tiny.weights
./flow --model yolo-4c --load ./backup/yolo-tiny-2c

will cycle through the layers of 4c and tiny in pairs (i.e. the 1st layer of 4c with the 1st layer of tiny, then the 2nd with the 2nd, then the 3rd, ...), and as long as a pair is identical (kernel size, padding size, etc.), the weights of that layer are reused from yolo-tiny in yolo-4c. The cycle terminates at the first mismatch.

With this, however, you still cannot specify a number at which the loading should stop, as you can in darknet. Adding this would be easy with an additional option --cutoff: say, --model yolo-4c --load ./bin/yolo-tiny.weights --cutoff 24 would only cycle through the first 24 pairs of layers of the two configs. If this suggestion sounds okay to you, I'll do it, since as the author I understand the loader class better.
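
Roughly, the loading cycle with the proposed --cutoff would look like this (layer descriptions and names here are illustrative, not the actual loader code):

def reuse_weights(new_layers, old_layers, old_weights, cutoff=None):
    # new_layers / old_layers: comparable layer descriptions,
    # e.g. (type, kernel_size, pad, n_filters); old_weights: one entry per old layer
    reused = {}
    for i, (new, old) in enumerate(zip(new_layers, old_layers)):
        if cutoff is not None and i >= cutoff:
            break  # --cutoff: stop after the first `cutoff` pairs
        if new != old:
            break  # the first mismatch terminates the cycle
        reused[i] = old_weights[i]  # identical pair: reuse the old layer's weights
    return reused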