Finetuning error - Githubissues

junedgar commented 7 years ago

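Hello! Thank you for making these models public! When I fine-tune with your Baidu Cloud models, whether inceptionV4 or inceptionResnetV2, I always run into the following error:

Check failed: target_blobs.size() == source_layer.blobs_size() (2 vs. 1) Incompatible number of blobs for layer conv1_3x3_s2

Could you give me some advice on how to solve this?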

shicai commented 7 years ago

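Pay attention to bias_term in the conv layers. When it is not set, it defaults to true, but the model requires it to be false. If you do not set bias_term: false, you get this error.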
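The rule behind this reply can be written down as a small sketch. It is not Caffe code; the function names are made up for illustration. It only models the fact that a Caffe Convolution layer holds one blob for the weights plus one more for the bias when bias_term is true, and that the failing check compares this count with the number of blobs stored in the caffemodel.

```python
def conv_blob_count(bias_term=True):
    """Number of learnable blobs in a Caffe Convolution layer.

    One blob for the weights, plus one for the bias when bias_term is true.
    bias_term defaults to true when the field is omitted from the prototxt.
    (Hypothetical helper, not part of Caffe.)
    """
    return 2 if bias_term else 1


def blobs_compatible(target_bias_term, source_blob_count):
    """Model of the failing check: target blob count must equal the source's."""
    return conv_blob_count(target_bias_term) == source_blob_count


# Per the error message, the released weights store 1 blob for conv1_3x3_s2.
print(blobs_compatible(target_bias_term=True, source_blob_count=1))   # False
print(blobs_compatible(target_bias_term=False, source_blob_count=1))  # True
```

With bias_term left at its default, the training net expects 2 blobs while the weights file provides 1, which is the "(2 vs. 1)" in the error.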

junedgar commented 7 years ago

@shicai Thanks for the suggestion. I added bias_term: false to the convolution layer, as shown below, but now I get a new error message:

convolution_param {
  num_output: 32
  pad: 0
  kernel_size: 3
  stride: 2
  bias_term: false
  weight_filler {
    type: "xavier"
    std: 0.01
  }
  bias_filler {
    type: "constant"
    value: 0.2
  }
}

I0626 14:45:46.157280 65202 net.cpp:157] Top shape: 64 32 149 149 (45467648)
I0626 14:45:46.157291 65202 net.cpp:165] Memory required for data: 250530816
F0626 14:45:46.157311 65202 net.cpp:169] Check failed: param_size <= num_param_blobs (2 vs. 1) Too many params specified for layer conv1_3x3_s2

I googled it, and the advice I found says to remove bias_term instead. (link)

junedgar commented 7 years ago

@shicai When I followed the suggestion at that link and commented out the param blocks, as shown below, the original error came back:

layer {
  name: "conv1_3x3_s2"
  type: "Convolution"
  bottom: "data"
  top: "conv1_3x3_s2"
  #param {
  #  lr_mult: 1
  #  decay_mult: 1
  #}
  #param {
  #  lr_mult: 2
  #  decay_mult: 0
  #}
  convolution_param {
    num_output: 32
    pad: 0
    kernel_size: 3
    stride: 2
    bias_term: false
    weight_filler {
      type: "xavier"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}

Check failed: target_blobs.size() == source_layer.blobs_size() (2 vs. 1) Incompatible number of blobs for layer conv2_3x3_s1

soeaver commented 7 years ago

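@junedgar You set bias_term: false, but you still specified an initializer for the bias, bias_filler; remove that part as well. Also, the model download on Baidu Cloud comes with a deploy.prototxt. Be sure to base your changes on that configuration file, not on the one on GitHub; otherwise you will very likely hit errors or fail to reproduce the accuracy.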

shicai commented 7 years ago

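This is the correct way to write it. You only have a weight and no bias, so you need just one param block; and with no bias, there is naturally no bias_filler either.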

layer {
  name: "conv1_3x3_s2"
  type: "Convolution"
  bottom: "data"
  top: "conv1_3x3_s2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  convolution_param {
    num_output: 32
    pad: 0
    kernel_size: 3
    stride: 2
    bias_term: false
    weight_filler {
      type: "xavier"
    }
  }
}
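Since the same fix has to be applied to every convolution layer, a quick scan of the prototxt can save a few rounds of trial and error. The following is a minimal sketch using only regular expressions and brace matching, not Caffe's protobuf parser. The function names are hypothetical, and it assumes that every Convolution layer in the released model is bias-free, as this thread suggests; check that against the deploy.prototxt shipped with the weights.

```python
import re


def split_layers(prototxt):
    """Return the text of every `layer { ... }` block, found by brace matching."""
    layers = []
    for match in re.finditer(r"\blayer\s*\{", prototxt):
        depth, pos = 1, match.end()
        while pos < len(prototxt) and depth > 0:
            if prototxt[pos] == "{":
                depth += 1
            elif prototxt[pos] == "}":
                depth -= 1
            pos += 1
        layers.append(prototxt[match.start():pos])
    return layers


def check_conv_layers(prototxt):
    """List (layer_name, problem) pairs for Convolution layers."""
    text = re.sub(r"#.*", "", prototxt)  # strip comments such as `#param {`
    problems = []
    for layer in split_layers(text):
        if not re.search(r'type:\s*"Convolution"', layer):
            continue
        found = re.search(r'name:\s*"([^"]+)"', layer)
        name = found.group(1) if found else "<unnamed>"
        if not re.search(r"bias_term:\s*false", layer):
            problems.append((name, "bias_term is not false, so a bias blob is created"))
            continue
        if len(re.findall(r"\bparam\s*\{", layer)) > 1:
            problems.append((name, "more than one param block for a weight-only layer"))
        if re.search(r"\bbias_filler\s*\{", layer):
            problems.append((name, "bias_filler given although there is no bias"))
    return problems


# Illustrative input: a layer that repeats the mistakes discussed above.
SAMPLE = """
layer {
  name: "conv1_3x3_s2"
  type: "Convolution"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 32
    bias_term: false
    bias_filler { type: "constant" value: 0.2 }
  }
}
"""

for layer_name, problem in check_conv_layers(SAMPLE):
    print(layer_name, "->", problem)
```

To check a real file, read it with `open(path).read()` and pass the text to `check_conv_layers`. The bias_filler warning follows the advice given in this thread; the sketch makes no claim about whether Caffe itself rejects it.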

junedgar commented 7 years ago

@soeaver OK, thanks for the guidance.

junedgar commented 7 years ago

@shicai I made the change the way you suggested and still get the same error. Thanks for your answer; I will keep investigating on my end.

Attempting to upgrade batch norm layers using deprecated params: inceptionV4/inception_v4.caffemodel
I0626 15:52:19.389828 76449 upgrade_proto.cpp:80] Successfully upgraded batch norm layers using deprecated params.
I0626 15:52:19.389855 76449 net.cpp:761] Ignoring source layer input
F0626 15:52:19.389892 76449 net.cpp:767] Check failed: target_blobs.size() == source_layer.blobs_size() (2 vs. 1) Incompatible number of blobs for layer conv2_3x3_s1

shicai commented 7 years ago

Last time it was conv1_3x3_s2. This time it is the same error, but for conv2_3x3_s1. That is a different layer; didn't you notice?

junedgar commented 7 years ago

@shicai I see it now. Sorry, that was careless of me, and sorry for the trouble! Thanks for your help!

xizi commented 7 years ago

@junedgar Could you send me a copy of your corrected train_val.prototxt and solver.prototxt? I just ran into this problem too. Thanks~😁

junedgar commented 7 years ago

@xizi I am busy during the day. I will find some time tonight to make the changes, and if I can get it running I will email you a copy.

xizi commented 7 years ago

@junedgar Thank you~ My e-mail: shengxili.xidian@gmail.com

xizi commented 7 years ago

@soeaver Could you send me a copy of your corrected train_val.prototxt and solver.prototxt? I just ran into this problem too. Thanks~😁

junedgar commented 7 years ago

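@soeaver I compared the layers in train_val and deploy and found that they seem to differ somewhat, as shown in the screenshot, so the pretrained model's parameters never match the training network (inceptionV4):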

Cannot copy param 0 weights from layer 'inception_b1_1x7_2'; shape mismatch. Source param shape is 224 192 1 7 (301056); target param shape is 192 192 1 7 (258048). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
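The two shapes in this message can be compared axis by axis to see which setting disagrees. A minimal sketch follows; the helper name is made up, and the axis labels assume Caffe's usual convolution weight layout of (num_output, channels per group, kernel_h, kernel_w).

```python
from math import prod

# Assumed axis order of a Caffe Convolution weight blob.
AXES = ("num_output", "channels_per_group", "kernel_h", "kernel_w")


def compare_conv_weights(source_shape, target_shape):
    """Return both element counts and the axes on which the shapes differ."""
    diffs = [
        f"{axis}: source={s}, target={t}"
        for axis, s, t in zip(AXES, source_shape, target_shape)
        if s != t
    ]
    return prod(source_shape), prod(target_shape), diffs


# Shapes copied from the error message for inception_b1_1x7_2.
src_count, tgt_count, diffs = compare_conv_weights((224, 192, 1, 7), (192, 192, 1, 7))
print(src_count, tgt_count)  # 301056 258048
print(diffs)                 # ['num_output: source=224, target=192']
```

Only the first axis differs, so the saved weights were produced with num_output: 224 for this layer while the training prototxt declares 192. That is consistent with the train_val file not matching the deploy.prototxt shipped with the weights.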

soeaver commented 7 years ago

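@junedgar Also, the model download on Baidu Cloud comes with a deploy.prototxt. Be sure to base your changes on that configuration file, not on the one on GitHub; otherwise you will very likely hit errors or fail to reproduce the accuracy.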

matakk commented 7 years ago

@junedgar @xizi @shicai Hi all. I am also trying to modify the parameters to fine-tune inception v3, but I ran into:

F0815 13:22:46.454238 7220 net.cpp:767] Check failed: target_blobs.size() == source_layer.blobs_size() (0 vs. 2) Incompatible number of blobs for layer conv4_3x3_reduce_scale
Check failure stack trace:

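I have fixed things up to this point and do not know which parameter to change next. I am not yet familiar with the model. Is there a complete set of training and deployment prototxt files somewhere? Could you also share the ones you trained with? Thanks!!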
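When reading these failures it helps to know which number is which. In the check `target_blobs.size() == source_layer.blobs_size()`, the first number in "(A vs. B)" belongs to the target, the layer built from your prototxt, and the second to the source, the layer stored in the caffemodel. The sketch below pulls those fields out of a log line; the function name and the returned keys are made up for illustration.

```python
import re

# Matches the blob-count check reported by Caffe's net.cpp in this thread.
CHECK_RE = re.compile(
    r"target_blobs\.size\(\) == source_layer\.blobs_size\(\) "
    r"\((\d+) vs\. (\d+)\).*?for layer (\S+)"
)


def explain_blob_mismatch(log_line):
    """Return the layer name and both blob counts, or None if the line differs."""
    match = CHECK_RE.search(log_line)
    if match is None:
        return None
    return {
        "layer": match.group(3),
        "target_blobs": int(match.group(1)),  # from your train_val.prototxt
        "source_blobs": int(match.group(2)),  # from the .caffemodel
    }


line = (
    "F0815 13:22:46.454238 7220 net.cpp:767] Check failed: "
    "target_blobs.size() == source_layer.blobs_size() (0 vs. 2) "
    "Incompatible number of blobs for layer conv4_3x3_reduce_scale"
)
print(explain_blob_mismatch(line))
```

For the line above, the prototxt layer named conv4_3x3_reduce_scale declares no learnable blobs while the weights file stores two for it, so it is the definition of that layer in the prototxt that should be compared against the deploy.prototxt shipped with the model.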

junedgar commented 7 years ago

@matakk I am using InceptionV2-ResNet; you can refer to this code. For how to turn train_val.prototxt into deploy.prototxt, see this article. The modifications for inception-v3/4 follow the same principle.

soeaver / caffe-model

Finetuning error #19