xmyqsh / FPN

Feature Pyramid Network
155 stars · 58 forks

VGG based FPN model #8

Open · helxsz opened this issue 7 years ago

helxsz commented 7 years ago

I want to use FPN with the VGG model since I only have two GTX 980s with 8 GB of memory. Do you have any plans to share a VGG-based model?

leighton613 commented 7 years ago

Hi, I'm using VGG as the base model. I think you just need to change the backbone from ResNet to VGG (except the fc layers); the rest can remain the same.

helxsz commented 7 years ago

I experimented with the model as below:


        with tf.variable_scope('vgg16'):

            (self.feed('data').conv(3, 3, 64, 1, 1,  name='conv1_1')
                              .conv(3, 3, 64, 1, 1,  name='conv1_2')
                              .max_pool(2, 2, 2, 2, padding='VALID',name='pool1'))

            (self.feed('pool1').conv(3, 3, 128, 1, 1,  name='conv2_1')
                                .conv(3, 3, 128, 1, 1,  name='conv2_2')
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool2'))

            (self.feed('pool2').conv(3, 3, 256, 1, 1,  name='conv3_1')
                                .conv(3, 3, 256, 1, 1,  name='conv3_2')
                                .conv(3, 3, 256, 1, 1,  name='conv3_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool3'))                

            (self.feed('pool3').conv(3, 3, 512, 1, 1,  name='conv4_1')
                                .conv(3, 3, 512, 1, 1,  name='conv4_2')
                                .conv(3, 3, 512, 1, 1,  name='conv4_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool4'))

            (self.feed('pool4').conv(3, 3, 512, 1, 1,  name='conv5_1')
                                .conv(3, 3, 512, 1, 1,  name='conv5_2')
                                .conv(3, 3, 512, 1, 1,  name='conv5_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool5'))

        with tf.variable_scope('Top-Down'):

            # Top-Down
            (self.feed('conv5_3') # C5
                 .conv(1, 1, 256, 1, 1, biased=True, relu=False, name='P5'))

            (self.feed('P5')
                 .max_pool(2, 2, 2, 2, padding='VALID',name='P6'))

            (self.feed('conv4_3') # C4
                 .conv(1, 1, 256, 1, 1, biased=True, relu=False, name='C4_lateral'))

            (self.feed('P5',
                       'C4_lateral')
                 .upbilinear(name='C5_topdown'))

            (self.feed('C5_topdown',
                       'C4_lateral')
                 .add(name='P4_pre')
                 .conv(3, 3, 256, 1, 1, biased=True, relu=False, name='P4'))

            (self.feed('conv3_3') #C3
                 .conv(1, 1, 256, 1, 1, biased=True, relu=False, name='C3_lateral'))

            (self.feed('P4',
                       'C3_lateral')
                 .upbilinear(name='C4_topdown'))

            (self.feed('C4_topdown',
                       'C3_lateral')
                 .add(name='P3_pre')
                 .conv(3, 3, 256, 1, 1, biased=True, relu=False, name='P3'))

            (self.feed('conv2_2') #C2
                 .conv(1, 1, 256, 1, 1, biased=True, relu=False, name='C2_lateral'))

            (self.feed('P3',
                       'C2_lateral')
                 .upbilinear(name='C3_topdown'))

            (self.feed('C3_topdown',
                       'C2_lateral')
                 .add(name='P2_pre')
                 .conv(3, 3, 256, 1, 1, biased=True, relu=False, name='P2'))

During training, it outputs the following:

Loading pretrained model weights from data/pretrain_model/VGG_imagenet.npy
ignore conv5_1 weights
ignore conv5_1 biases
ignore fc6 weights
ignore fc6 biases
ignore conv5_3 weights
ignore conv5_3 biases
ignore fc7 weights
ignore fc7 biases
ignore fc8 weights
ignore fc8 biases
ignore conv5_2 weights
ignore conv5_2 biases
ignore conv4_1 weights
ignore conv4_1 biases
ignore conv4_2 weights
ignore conv4_2 biases
ignore conv4_3 weights
ignore conv4_3 biases
ignore conv3_3 weights
ignore conv3_3 biases
ignore conv3_2 weights
ignore conv3_2 biases
ignore conv3_1 weights
ignore conv3_1 biases
ignore conv1_1 weights
ignore conv1_1 biases
ignore conv1_2 weights
ignore conv1_2 biases
ignore conv2_2 weights
ignore conv2_2 biases
ignore conv2_1 weights
ignore conv2_1 biases
[... the same block of "ignore ..." messages repeats four more times ...]
2017-08-18 18:33:21.690196: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.61GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-08-18 18:33:23.668972: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 1.51G (1627206400 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-08-18 18:33:25.641768: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.61GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-08-18 18:33:27.112044: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.61GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
cudaCheckError() failed in ROIPoolForward: invalid device function
2017-08-18 18:33:29.665586: E tensorflow/stream_executor/stream.cc:289] Error recording event in stream: error recording CUDA event on stream 0x6581d40: CUDA_ERROR_DEINITIALIZED; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2017-08-18 18:33:29.665625: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_DEINITIALIZED
2017-08-18 18:33:29.665637: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
cudaCheckError() failed in ROIPoolForward: invalid device function
Command terminated by signal 6
19.19user 2.58system 0:21.46elapsed 101%CPU (0avgtext+0avgdata 3051440maxresident)k
0inputs+840outputs (0major+378698minor)pagefaults 0swaps

I have two GTX 970 GPUs to work with, so why am I still running into memory issues? I wonder whether the problem is in the code?

xmyqsh commented 7 years ago

@helxsz First of all, you haven't used the same variable scopes as my ResNet50-based FPN. Also, this code does not support multi-GPU training, so you can only use the 4 GB (effectively ~3.5 GB) of a single card for the VGG16-based FPN. You could decrease RPN_BATCH_SIZE and BATCH_SIZE, as well as the image SCALE, just to verify that training runs.
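
As a reference, a minimal sketch of that kind of override, assuming a py-faster-rcnn style cfg module (the import path, option names, and values below are illustrative; check this repo's config file for the actual keys and defaults):

    # Hypothetical memory-saving overrides, assuming a py-faster-rcnn style cfg object;
    # the exact attribute names and defaults may differ in this repository.
    from fast_rcnn.config import cfg

    cfg.TRAIN.BATCH_SIZE = 64        # fewer RoIs per image for the detection head
    cfg.TRAIN.RPN_BATCH_SIZE = 128   # fewer anchors sampled per image for the RPN loss
    cfg.TRAIN.SCALES = (400,)        # smaller shorter side -> smaller feature maps -> less memory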

Also, for issues like cudaCheckError() failed in ROIPoolForward: invalid device function, see what I already mentioned in #7.

For memory-hungry users, MXNet seems a good choice, and it also supports multi-GPU training.

helxsz commented 7 years ago

@xmyqsh, thanks. I will upgrade to a 1080 Ti, but for the moment VGG16 is the only option I can experiment with, so I will reduce RPN_BATCH_SIZE and BATCH_SIZE.

Regarding your point about not using the same variable scopes as your ResNet50 model: since the code is changing the backbone from ResNet50 to VGG16, is that actually a problem?

Secondly, when working on this, should I write the backbone this way:

        with tf.variable_scope('vgg16'):
            (self.feed('data').conv(3, 3, 64, 1, 1,  name='conv1_1')
                              .conv(3, 3, 64, 1, 1,  name='conv1_2')
                              .max_pool(2, 2, 2, 2, padding='VALID',name='pool1'))
            (self.feed('pool1').conv(3, 3, 128, 1, 1,  name='conv2_1')
                                .conv(3, 3, 128, 1, 1,  name='conv2_2')
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool2'))
            (self.feed('pool2').conv(3, 3, 256, 1, 1,  name='conv3_1')
                                .conv(3, 3, 256, 1, 1,  name='conv3_2')
                                .conv(3, 3, 256, 1, 1,  name='conv3_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool3'))                
            (self.feed('pool3').conv(3, 3, 512, 1, 1,  name='conv4_1')
                                .conv(3, 3, 512, 1, 1,  name='conv4_2')
                                .conv(3, 3, 512, 1, 1,  name='conv4_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool4'))
            (self.feed('pool4').conv(3, 3, 512, 1, 1,  name='conv5_1')
                                .conv(3, 3, 512, 1, 1,  name='conv5_2')
                                .conv(3, 3, 512, 1, 1,  name='conv5_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool5'))

or should I write it this way:

        with tf.variable_scope('vgg16'):

            (self.feed('data').conv(3, 3, 64, 1, 1,  name='conv1_1')
                              .conv(3, 3, 64, 1, 1,  name='conv1_2')
                              .max_pool(2, 2, 2, 2, padding='VALID',name='pool1')

                                .conv(3, 3, 128, 1, 1,  name='conv2_1')
                                .conv(3, 3, 128, 1, 1,  name='conv2_2')
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool2')

                                .conv(3, 3, 256, 1, 1,  name='conv3_1')
                                .conv(3, 3, 256, 1, 1,  name='conv3_2')
                                .conv(3, 3, 256, 1, 1,  name='conv3_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool3')

                                .conv(3, 3, 512, 1, 1,  name='conv4_1')
                                .conv(3, 3, 512, 1, 1,  name='conv4_2')
                                .conv(3, 3, 512, 1, 1,  name='conv4_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool4')

                               .conv(3, 3, 512, 1, 1,  name='conv5_1')
                                .conv(3, 3, 512, 1, 1,  name='conv5_2')
                                .conv(3, 3, 512, 1, 1,  name='conv5_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool5'))

Lastly, is there anything I am missing? When loading the VGG16 model, the weights and biases of all the layers are ignored:

ignore conv3_3 biases
ignore conv3_2 weights

xmyqsh commented 7 years ago

@helxsz First, the base model script should match your pretrained model. In addition, you should wrap the base model script in additional variable scopes.
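
(For context, the "ignore ..." messages come from a caffe-tensorflow style .npy loader that looks each saved layer up by name with tf.get_variable; the sketch below is an illustration of that pattern, not this repo's exact load() code. If the graph's variables were created under a scope prefix the loader does not open, the lookup fails and the layer is skipped.)

    # Illustrative sketch of a caffe-tensorflow style .npy loader (not this repo's exact code).
    import numpy as np
    import tensorflow as tf

    def load_npy(data_path, session, ignore_missing=True):
        data_dict = np.load(data_path).item()   # {'conv1_1': {'weights': ..., 'biases': ...}, ...}
        for key in data_dict:
            with tf.variable_scope(key, reuse=True):
                for subkey in data_dict[key]:
                    try:
                        # Raises ValueError if the variable was created under another
                        # scope prefix, e.g. 'vgg16/conv1_1/weights' instead of 'conv1_1/weights'.
                        var = tf.get_variable(subkey)
                        session.run(var.assign(data_dict[key][subkey]))
                    except ValueError:
                        if not ignore_missing:
                            raise
                        print('ignore {} {}'.format(key, subkey))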

Say your original base model script is:

            (self.feed('data').conv(3, 3, 64, 1, 1,  name='conv1_1')
                              .conv(3, 3, 64, 1, 1,  name='conv1_2')
                              .max_pool(2, 2, 2, 2, padding='VALID',name='pool1'))
            (self.feed('pool1').conv(3, 3, 128, 1, 1,  name='conv2_1')
                                .conv(3, 3, 128, 1, 1,  name='conv2_2')
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool2'))
            (self.feed('pool2').conv(3, 3, 256, 1, 1,  name='conv3_1')
                                .conv(3, 3, 256, 1, 1,  name='conv3_2')
                                .conv(3, 3, 256, 1, 1,  name='conv3_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool3'))                
            (self.feed('pool3').conv(3, 3, 512, 1, 1,  name='conv4_1')
                                .conv(3, 3, 512, 1, 1,  name='conv4_2')
                                .conv(3, 3, 512, 1, 1,  name='conv4_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool4'))
            (self.feed('pool4').conv(3, 3, 512, 1, 1,  name='conv5_1')
                                .conv(3, 3, 512, 1, 1,  name='conv5_2')
                                .conv(3, 3, 512, 1, 1,  name='conv5_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool5'))

Then your base end in FPN should be:

with tf.variable_scope('res1_2'):
            (self.feed('data').conv(3, 3, 64, 1, 1,  name='conv1_1')
                              .conv(3, 3, 64, 1, 1,  name='conv1_2')
                              .max_pool(2, 2, 2, 2, padding='VALID',name='pool1'))
            (self.feed('pool1').conv(3, 3, 128, 1, 1,  name='conv2_1')
                                .conv(3, 3, 128, 1, 1,  name='conv2_2')
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool2'))

with tf.variable_scope('res3_5'):
            (self.feed('pool2').conv(3, 3, 256, 1, 1,  name='conv3_1')
                                .conv(3, 3, 256, 1, 1,  name='conv3_2')
                                .conv(3, 3, 256, 1, 1,  name='conv3_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool3'))                
            (self.feed('pool3').conv(3, 3, 512, 1, 1,  name='conv4_1')
                                .conv(3, 3, 512, 1, 1,  name='conv4_2')
                                .conv(3, 3, 512, 1, 1,  name='conv4_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool4'))
            (self.feed('pool4').conv(3, 3, 512, 1, 1,  name='conv5_1')
                                .conv(3, 3, 512, 1, 1,  name='conv5_2')
                                .conv(3, 3, 512, 1, 1,  name='conv5_3')    
                                .max_pool(2, 2, 2, 2, padding='VALID',name='pool5'))

Also, you could train your ImageNet pretrained model with the second version directly.

helxsz commented 7 years ago

@xmyqsh, thanks for the answer. I am currently moving from Caffe to TensorFlow, so there are a few points I don't understand well yet.

An additional question: why do you use variable scope 'res1_2' for the first two blocks and 'res3_5' for the rest? For VGG16, wouldn't it be simpler to use a single variable scope 'vgg16' for the whole base model script?

xmyqsh commented 7 years ago

@helxsz It is my way of freezing the first two blocks, or all five blocks, so they are not trainable. Others may prefer to set the first two blocks' parameters as untrainable directly in end-to-end training.

My way may seem clumsy, but it is convenient for changing layers' trainability via scope collections when alt-opt training is needed within a single net.

Otherwise, you would have to build four different nets for alt-opt training, stop training after the former phase, and reload the model from the former phase into the latter phase, which is what py-faster-rcnn does.

In my implementation, there is no reloading of the model or the net from phase to phase; my alt-opt training works just like end-to-end training.

You could refer to my alt-opt training scripts for more details.
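
To make the scope-collection idea above concrete, here is a minimal sketch of freezing blocks by scope (not this repo's exact training code; the scope names come from the snippets above, and total_loss is a placeholder for the combined loss tensor):

    import tensorflow as tf

    def build_train_op(total_loss, lr=0.001):
        """total_loss: the combined RPN + Fast R-CNN loss tensor, built elsewhere."""
        # Train only the variables collected from the scopes we care about;
        # everything under 'res1_2' is left out of var_list and therefore frozen.
        trainable = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='res3_5')
        trainable += tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='Top-Down')
        opt = tf.train.MomentumOptimizer(learning_rate=lr, momentum=0.9)
        return opt.minimize(total_loss, var_list=trainable)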

helxsz commented 7 years ago

Oh, thanks, I see what you mean.

xmyqsh commented 7 years ago

@helxsz @leighton613 How are your VGG16-based FPN results? I don't think VGG16 is appropriate as FPN's base end.

leighton613 commented 7 years ago

@xmyqsh For end-to-end training, the two R-CNN losses are not converging. Then I tried to freeze the RPN and only update the R-CNN head plus conv4 and conv5, but that has still failed so far ;( I presume that adding the element-wise add layer somehow causes the non-convergence (element-wise adding does not make much sense to me intuitively). I'll try alternating training later... or could ResNet fix this problem? I'm using VGG because I was playing with a VGG-based Faster R-CNN. Any particular reason you don't think it's a good choice for FPN?

xmyqsh commented 7 years ago

@leighton613 I don't think a VGG-based FPN can train good P2~P6 layers; have a look at the difference between the two networks. The stride op in VGG sits at the end of each block, which, I think, cannot produce good P5 and P6.

I have gotten good end-to-end training results with the ResNet50-based FPN, and further improvement could be achieved by improving the pretrained model, as well as by training the Fast R-CNN part separately with a larger img_pre_batch.

Now to answer your questions above:

  1. You should confirm your RPN has converged before freezing it. I'm not sure whether VGG can converge well.
  2. The element-wise add layer is widely used by Kaiming He in his networks, e.g. the ResNet series and FPN. I think the element-wise add is appropriate for FPN's P layers, which merge the fine and coarse feature maps while accounting for both resolution and receptive field. A concat layer may be another choice (sketched below); its performance could be equal to or better than the element-wise add, though it needs more memory. You could give it a try.
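
A minimal sketch of the two merge options from item 2, written with plain TensorFlow ops rather than this repo's chained-layer API (the argument names are placeholders for the upsampled coarser map and its 256-channel lateral connection):

    import tensorflow as tf

    def merge_add(p_topdown, c_lateral):
        # FPN's choice: an element-wise sum keeps the merged map at 256 channels.
        return tf.add(p_topdown, c_lateral)

    def merge_concat(p_topdown, c_lateral):
        # Alternative: concatenating along the channel axis doubles the depth to 512,
        # so the following 3x3 conv and its activations cost roughly twice the memory.
        return tf.concat([p_topdown, c_lateral], axis=3)
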
leighton613 commented 7 years ago

@xmyqsh I didn't implement P6 or use pool5 for P5, for the reasons you mentioned. I totally agree that the large stride in the feature map is one of my major concerns.

For VGG, there is a paper, RON, which was also accepted at CVPR this year; it does research similar to FPN but uses VGG as the base. However, FPN does a (much) better job, at least on the COCO benchmark, though VGG works, I think. Of course, never mind ResNet101: VGG16's (w/o fc) capacity can hardly compete with ResNet50's. I wonder (but really don't know) whether the residual design per se makes it a better option for detection. I'll switch to a ResNet base model later.

I tried concatenating the two feature maps and also deconv without the lateral connection, but didn't observe a significant improvement over Faster R-CNN. (Interestingly, the element-wise add layer makes convergence worse even with only P5 and P4.) But all of these experiments were conducted end-to-end; I'll keep your advice in mind and try alternating training in the next few days. I hope it works.

xmyqsh commented 7 years ago

@leighton613 Good paper! And as you said, VGG16's (w/o fc) capacity is indeed smaller than ResNet50's.

For VGG16:
(3 + 64)*3*3*64 + (64 + 128)*3*3*128 + (128 + 256*3)*3*3*256 + (256 + 512*3)*3*3*512 + 512*4*3*3*512 = 20018880

For ResNet50:
3*7*7*64 + (64*256) + (64*64 + 64*3*3*64 + 64*256) + (256*64 + 64*3*3*64 + 64*256)*2 + (128*512) + (256*128 + 128*3*3*128 + 128*512) + (512*128 + 128*3*3*128 + 128*512)*3 + (256*1024) + (512*256 + 256*3*3*256 + 256*1024) + (1024*256 + 256*3*3*256 + 256*1024)*5 + (512*2048) + (1024*512 + 512*3*3*512 + 512*2048) + (2048*512 + 512*3*3*512 + 512*2048)*3 = 26535104
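
For anyone who wants to check the arithmetic, the two expressions evaluate exactly as stated:

    # Reproduces the conv-layer parameter counts exactly as written above.
    vgg16 = ((3 + 64)*3*3*64 + (64 + 128)*3*3*128 + (128 + 256*3)*3*3*256
             + (256 + 512*3)*3*3*512 + 512*4*3*3*512)

    resnet50 = (3*7*7*64 + 64*256
                + (64*64 + 64*3*3*64 + 64*256) + (256*64 + 64*3*3*64 + 64*256)*2
                + 128*512 + (256*128 + 128*3*3*128 + 128*512)
                + (512*128 + 128*3*3*128 + 128*512)*3
                + 256*1024 + (512*256 + 256*3*3*256 + 256*1024)
                + (1024*256 + 256*3*3*256 + 256*1024)*5
                + 512*2048 + (1024*512 + 512*3*3*512 + 512*2048)
                + (2048*512 + 512*3*3*512 + 512*2048)*3)

    print(vgg16, resnet50)  # 20018880 26535104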