sanghoon / pva-faster-rcnn

Demo code for PVANet
https://arxiv.org/abs/1611.08588

Training pvanet-faster rcnn on VOC pascal data #1

Closed rajiv235 closed 7 years ago

rajiv235 commented 7 years ago

I am trying to train Faster R-CNN with PVANet on the Pascal VOC dataset using https://github.com/sanghoon/pva-faster-rcnn. However, I get a dimension mismatch error at conv4_1/incep (a Concat layer). Below is a snippet of the Caffe log from the training run:

```
I0923 18:58:11.487720 26638 net.cpp:157] Top shape: 1 48 38 63 (114912)
I0923 18:58:11.487735 26638 net.cpp:165] Memory required for data: 579115860
I0923 18:58:11.487751 26638 layer_factory.hpp:77] Creating layer conv4_1/incep/2_0/relu
I0923 18:58:11.487768 26638 net.cpp:100] Creating Layer conv4_1/incep/2_0/relu
I0923 18:58:11.487782 26638 net.cpp:434] conv4_1/incep/2_0/relu <- conv4_1/incep/2_0
I0923 18:58:11.487794 26638 net.cpp:395] conv4_1/incep/2_0/relu -> conv4_1/incep/2_0 (in-place)
I0923 18:58:11.487808 26638 net.cpp:150] Setting up conv4_1/incep/2_0/relu
I0923 18:58:11.487826 26638 net.cpp:157] Top shape: 1 48 38 63 (114912)
I0923 18:58:11.487836 26638 net.cpp:165] Memory required for data: 579575508
I0923 18:58:11.487851 26638 layer_factory.hpp:77] Creating layer conv4_1/incep/2_1/conv
I0923 18:58:11.487869 26638 net.cpp:100] Creating Layer conv4_1/incep/2_1/conv
I0923 18:58:11.487886 26638 net.cpp:434] conv4_1/incep/2_1/conv <- conv4_1/incep/2_0
I0923 18:58:11.487902 26638 net.cpp:408] conv4_1/incep/2_1/conv -> conv4_1/incep/2_1
I0923 18:58:11.488270 26638 net.cpp:150] Setting up conv4_1/incep/2_1/conv
I0923 18:58:11.488294 26638 net.cpp:157] Top shape: 1 48 38 63 (114912)
I0923 18:58:11.488306 26638 net.cpp:165] Memory required for data: 580035156
I0923 18:58:11.488320 26638 layer_factory.hpp:77] Creating layer conv4_1/incep/2_1/relu
I0923 18:58:11.488342 26638 net.cpp:100] Creating Layer conv4_1/incep/2_1/relu
I0923 18:58:11.488358 26638 net.cpp:434] conv4_1/incep/2_1/relu <- conv4_1/incep/2_1
I0923 18:58:11.488371 26638 net.cpp:395] conv4_1/incep/2_1/relu -> conv4_1/incep/2_1 (in-place)
I0923 18:58:11.488390 26638 net.cpp:150] Setting up conv4_1/incep/2_1/relu
I0923 18:58:11.488404 26638 net.cpp:157] Top shape: 1 48 38 63 (114912)
I0923 18:58:11.488415 26638 net.cpp:165] Memory required for data: 580494804
I0923 18:58:11.488425 26638 layer_factory.hpp:77] Creating layer conv4_1/incep/pool
I0923 18:58:11.488440 26638 net.cpp:100] Creating Layer conv4_1/incep/pool
I0923 18:58:11.488456 26638 net.cpp:434] conv4_1/incep/pool <- conv3_4_conv3_4_0_split_3
I0923 18:58:11.488471 26638 net.cpp:408] conv4_1/incep/pool -> conv4_1/incep/pool
I0923 18:58:11.488524 26638 net.cpp:150] Setting up conv4_1/incep/pool
I0923 18:58:11.488546 26638 net.cpp:157] Top shape: 1 128 37 62 (293632)
I0923 18:58:11.488559 26638 net.cpp:165] Memory required for data: 581669332
I0923 18:58:11.488574 26638 layer_factory.hpp:77] Creating layer conv4_1/incep/poolproj/conv
I0923 18:58:11.488592 26638 net.cpp:100] Creating Layer conv4_1/incep/poolproj/conv
I0923 18:58:11.488610 26638 net.cpp:434] conv4_1/incep/poolproj/conv <- conv4_1/incep/pool
I0923 18:58:11.488623 26638 net.cpp:408] conv4_1/incep/poolproj/conv -> conv4_1/incep/poolproj
I0923 18:58:11.488951 26638 net.cpp:150] Setting up conv4_1/incep/poolproj/conv
I0923 18:58:11.488976 26638 net.cpp:157] Top shape: 1 128 37 62 (293632)
I0923 18:58:11.488986 26638 net.cpp:165] Memory required for data: 582843860
I0923 18:58:11.489001 26638 layer_factory.hpp:77] Creating layer conv4_1/incep/poolproj/relu
I0923 18:58:11.489019 26638 net.cpp:100] Creating Layer conv4_1/incep/poolproj/relu
I0923 18:58:11.489032 26638 net.cpp:434] conv4_1/incep/poolproj/relu <- conv4_1/incep/poolproj
I0923 18:58:11.489045 26638 net.cpp:395] conv4_1/incep/poolproj/relu -> conv4_1/incep/poolproj (in-place)
I0923 18:58:11.489065 26638 net.cpp:150] Setting up conv4_1/incep/poolproj/relu
I0923 18:58:11.489079 26638 net.cpp:157] Top shape: 1 128 37 62 (293632)
I0923 18:58:11.489089 26638 net.cpp:165] Memory required for data: 584018388
I0923 18:58:11.489100 26638 layer_factory.hpp:77] Creating layer conv4_1/incep
I0923 18:58:11.489112 26638 net.cpp:100] Creating Layer conv4_1/incep
I0923 18:58:11.489140 26638 net.cpp:434] conv4_1/incep <- conv4_1/incep/0
I0923 18:58:11.489151 26638 net.cpp:434] conv4_1/incep <- conv4_1/incep/1_0
I0923 18:58:11.489168 26638 net.cpp:434] conv4_1/incep <- conv4_1/incep/2_1
I0923 18:58:11.489179 26638 net.cpp:434] conv4_1/incep <- conv4_1/incep/poolproj
I0923 18:58:11.489197 26638 net.cpp:408] conv4_1/incep -> conv4_1/incep
F0923 18:58:11.489228 26638 concat_layer.cpp:42] Check failed: top_shape[j] == bottom[i]->shape(j) (38 vs. 37) All inputs must have the same shape, except at concat_axis
```

It looks like the output dimensions of the pooling layer and the conv layers differ by 1 (38 vs. 37). Any help would be appreciated.

Thank you

sanghoon commented 7 years ago

Hi, for PVANet the input image size should be a multiple of 32. You should be able to train a model by running the training script with `--set TRAIN.SCALE_MULTIPLE_OF 32`, or by adding the following lines to your configuration YAML:

```yaml
TRAIN:
  SCALE_MULTIPLE_OF: 32
```
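For context, the 38-vs-37 mismatch comes from a Caffe rounding quirk: convolution layers round their output size down, while pooling layers round it up, so the conv and pool branches of the Inception block disagree whenever the incoming feature map has an odd size. A minimal Python sketch of the effect (the kernel/stride/pad values and the assumption that conv3_4 sits at 1/8 of the input resolution are illustrative, not read from the prototxt):

```python
import math

def conv_out(size, kernel=3, stride=2, pad=1):
    # Caffe convolution layers round the output size DOWN
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=3, stride=2, pad=0):
    # Caffe pooling layers round the output size UP
    return math.ceil((size + 2 * pad - kernel) / stride) + 1

# The default 600-pixel scale gives a 75-row map at 1/8 resolution,
# reproducing the log above:
print(conv_out(75), pool_out(75))  # 38 37 -> Concat check fails
# A 608-pixel input (a multiple of 32) gives a 76-row map instead:
print(conv_out(76), pool_out(76))  # 38 38 -> shapes match
```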
rajiv235 commented 7 years ago

Thanks @sanghoon. I tried that, but I still hit the same issue. Can you please share your cfg file and train.prototxt? I appreciate your help.

sanghoon commented 7 years ago

Oh, sorry about that. I'll upload a sample training prototxt and cfg. Please give me a couple of days, since I haven't tried training a network on this exact code.

rajiv235 commented 7 years ago

No worries. Thanks for the help. :)

sanghoon commented 7 years ago

Hi, I've found that you should set `TRAIN.SCALES` to [608,] and `TRAIN.MAX_SIZE` to 1024. You can try with these params.
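Combined with the earlier setting, the config snippet would look something like this (a sketch, not a verified config file; note both values are multiples of 32):

```yaml
TRAIN:
  SCALES: [608]
  MAX_SIZE: 1024
  SCALE_MULTIPLE_OF: 32
```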

I'll upload a sample config when it's ready.

rajiv235 commented 7 years ago

@sanghoon Thanks, that solved the issue. However, I ran into a different one when initializing the weights with the ImageNet-trained model (FULL):

```
I0928 22:12:01.968021 11742 net.cpp:761] Ignoring source layer data
I0928 22:12:01.968085 11742 net.cpp:761] Ignoring source layer label_data_1_split
I0928 22:12:01.971196 11742 net.cpp:761] Ignoring source layer pool5
F0928 22:12:02.137482 11742 net.cpp:774] Cannot copy param 0 weights from layer 'fc6'; shape mismatch. Source param shape is 4096 13824 (56623104); target param shape is 4096 18432 (75497472). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
```

It would be very helpful if you could share the training prototxt as well. Thanks in advance!

jay2002 commented 7 years ago

Hi @rajiv235, can you please share your train.prototxt and cfg file?

rajiv235 commented 7 years ago

Sure. Let me know if you find any errors. Thanks.

train.zip

jay2002 commented 7 years ago

@rajiv235 Thanks for your kind help! There seems to be a small problem with the bias term, so I set it to false. By the way, if you want to use the pretrained weights for fc6, fc7, etc., you could modify the number of channels of convf (which is the concat of convf_rpn and convf_2) to 384.

sanghoon commented 7 years ago

Hi @rajiv235, @jay2002,

After checking the models we've uploaded and reading your discussion, I've found the problem.

I'll upload modified prototxt and sample command in a couple of hours.

sanghoon commented 7 years ago

I've updated the sample prototxts. Please refer to 4799b2889fecdccd823559a7f0300bc68d69aefd. However, I haven't had enough time to try training a network and investigate the outcome, so these prototxts may still have some bugs.

rajiv235 commented 7 years ago

Thanks @sanghoon. I tried running the example_finetune solver but unfortunately ran into the same 'fc6' shape-mismatch error. Could I be using the wrong pretrained model? I am using the ImageNet-trained model (FULL) as the pretrained model. Thanks.

sanghoon commented 7 years ago

Hi @rajiv235, the thing is that the last convolution layer in our ImageNet pre-trained model generates a 6×6×384 feature map, so the following fc6 layer gets a 6×6×384 input.

However, our final Faster R-CNN model has an fc6 layer with a 6×6×512 input. To run example_finetune, you should load weights from 'full/test.model', whose fc6 layer takes a 6×6×512 input (or you can rename the 'fc6' layer so that its params are re-initialized).
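For reference, these feature-map sizes line up exactly with the numbers in the shape-mismatch log quoted above:

```python
# Worked arithmetic for the 'fc6' shape-mismatch error:
imagenet_fc6_inputs = 6 * 6 * 384  # = 13824 -> source param shape 4096 x 13824
detector_fc6_inputs = 6 * 6 * 512  # = 18432 -> target param shape 4096 x 18432
```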

If you want to train a new model from the ImageNet pre-trained model, I recommend trying the prototxts in 'example_train'.

rajiv235 commented 7 years ago

Thanks @sanghoon. I tried fine-tuning but ran into an out-of-memory error. I tried changing the batch size, but still no luck. I am currently using a 4 GB K20 GPU on AWS. Should that much memory be enough?

Thanks

sanghoon commented 7 years ago

Hi @rajiv235,

With our training code (a slightly old version of Caffe), 5-6 GB is required for Faster R-CNN fine-tuning. If you want to train or fine-tune a model with less memory, I suggest lowering the configuration variable 'TRAIN.MAX_SIZE' (please don't forget to keep it a multiple of 32).
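For example, something like this (736 is only an illustrative value; any smaller multiple of 32 behaves the same way):

```yaml
TRAIN:
  SCALES: [608]
  MAX_SIZE: 736   # reduced from 1024 to save GPU memory; still a multiple of 32
  SCALE_MULTIPLE_OF: 32
```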

rajiv235 commented 7 years ago

Thanks @sanghoon. I have moved to a K80 GPU instance and was able to start training. We'll see how it goes. I am closing this issue. Thanks for the help!