rykov8 / ssd_keras

Port of Single Shot MultiBox Detector to Keras
MIT License
1.1k stars 553 forks source link

[Question] Which parameters to change when increasing image size? #36

Open fferroni opened 7 years ago

fferroni commented 7 years ago

Hi,

I am trying to replicate the same model but for the 500x500 version in the paper. Apart from the input image_shape, and the priors, what other parameters need to be changed?

I am getting an error like this when running fit_generator

InvalidArgumentError: Incompatible shapes: [16,7308,4] vs. [16,28461,4]
     [[Node: sub_1 = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](strided_slice_20, strided_slice_21)]]
     [[Node: mul_11/_375 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_5128_mul_11", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

I tried seeing if there is anything hard-coded in the SSD300 object or generators, but perhaps I do not see it. Your advice would be appreciated! :)

For the new priors, I am using https://gist.github.com/codingPingjun/aa54be7993ca6b2d484cccf5a2c5c3d4 but with 500x500 size.

rykov8 commented 7 years ago

@fferroni could you, please, provide more information about the error? E.g. what is your model.summary() and what do you get after BBoxUtility.assign_boxes(...)? I mean, what tensor has shape [16,7308,4] and what - [16,28461,4]?

fferroni commented 7 years ago

Right. The shape of the y where y = bboxutility.assign_boxes(y) is (7308,14). whereas (None, 20073, 14) is the size of the final layer if I use input_shape = (500,500,3). I am a bit confused though because as far as I can tell, the bboxutility is using the tuple input_shape, which I have modified accordingly...

lantuzi commented 7 years ago

I have the same problem with changing image size, no matter which prior_box to use. Could please help and give hint @rykov8 . I think it is related to number of prior_box thanks in advance.

oarriaga commented 7 years ago

@lantuzi @fferroni One problem with changing the image size is that you will not be able to load the pre-trained weights. Also you would need to change the ground truth prior boxes, the assign_boxes method in the bounding box utility function, and the PriorBoxes layers. It might be easier to resize your images using any image_resize function to 300 x 300.

NegatioN commented 7 years ago

I believe the biggest problem for having a flexible PriorBox implementation is mostly related to this line in the python implementation which should really be like this including the lines below it.

Does anyone know how to implement this (so it works with the rest of the class) with the matrix math that's being used in PriorBox? Or have anyone seen any similar implementations? @oarriaga @rykov8 Do you have any pointers?

rykov8 commented 7 years ago

@NegatioN I don't see any problem here. My code assumes, that there can be only one possible min_size for prior, that is true for author's architectures. So, in their Caffe code min_sizes_.size() is always 0. PriorBox layer is not implemented from scratch, it is based on the original one, but I replaced cycles, as they are slow in python.

@fferroni @lantuzi I believe, your problem is that you use old priors to initialize BBoxUtility class. After you change the net architecture, you need to take priors, generated by the new net and pass it to the BBoxUtility during initialization.

NegatioN commented 7 years ago

@rykov8 yes, it works perfectly in the implementation you have now. Thanks for making it :) Based on @oarriaga 's comment and what I've read so far, it seems like it's not very easily modifiable?

min_sizes_.size() is always 0? So you're saying it becomes 1 because of one iteration of this line? Wouldn't it often be 1(min_size.size()) + 1(max_size.size()) = numpriors? Only the upper-most SSD prediction layer in your implementation has one of those set, but not the other. Maybe that's captured by your *2 here?

I'm trying to reimplement the most recent version of this paper with some basis in parts of the code you have written, as I don't understand the PriorBox layers 100%. However, as far as I understand it now the number of priorboxes I generate in the network should depend on three variables: conv_layer shape, num_max_size, num_min_size.

How would you go about scaling the number of PriorBoxes to fit the 8732 params required for the most recent version of the paper without allowing the implementation to depend on all of those three variables? Since it in your implementation is locked to conv_layer_x conv_layer_y 1 it doesn't seem to be flexible enough? And then after this the cascading effect of having to change the matrix math implementation would follow.

Maybe I've missed something glaringly obvious, in which case I would love if you (or anyone else) pointed it out :)

oarriaga commented 7 years ago

Hello @NegatioN, I am currently also working on implementing the "recent" version of SSD, as well as trying to incorporate smaller models. As far as I can tell from the original SSD code the authors do not seem to follow explicitly equation 4 from the latest version of the paper in their implementation. However, it is mentioned in their lastest version, that every PriorBox layer contains a scale sk (calculated using eq. 4) which is used along with the aspect_ratios, and the equations found below eq. 4 for the centers, widths and heights of the prior boxes in order to construct all the prior boxes for every layer. Therefore if I understand correctly your question

How can I scale the number of PriorBoxes to fit the 8732 params required for the most recent version of the paper without allowing the implementation to depend on all of those three variables?

We should probably follow eq. 4 for the scales, along with the aspect ratios and the equations for the box widths and heights and consequently it does depend on all the variables you mention.

fferroni commented 7 years ago

@rykov8 Thanks, I use this script https://gist.github.com/codingPingjun/aa54be7993ca6b2d484cccf5a2c5c3d4 to generate new prior boxes with a new image_width, img_height. Probably the box_configs need changing too?

@oarriaga Yes, but the objects I am trying to detect are quite small with 300x300 :( I think you can use most of the pre-trained weights, particularly the convolutional filters. Looking at the PriorBoxLayers, I don't see how this needs to be modified. I agree that new ground truth boxes need to be used - perhaps the link I attached can be a starting point?

oarriaga commented 7 years ago

@fferroni if you use the first layers of a pre-trained network and manage to connect it to the model here with PriorBoxes layers then yes, you shouldn't change the PriorBox layer. The PriorBox layer would only need to be changed when we change the prior_boxes.

ahrnbom commented 7 years ago

I am also trying to run SSD500, and I'm facing similar problems. I don't think previous responses here are very clear about exactly what needs to change to get it to work. As fferroni states, just changing the input resolution to 500x500 changes the output dimension to 20073. First of all, is this correct? As in, is it possible to make changes elsewhere (in the prior boxes fed into the BBoxUtility constructor, for example) to make the ground truth also be of dimension 20073? Or is the output dimension wrong, and something (the PriorBox layer?) needs to be changed to get back to the original dimension of 7308?

Edit: After studying the original Caffe implementations a little bit, comparing SSD512 and SSD300 (they might have abandoned SSD500), it seems a significant network change is obviously needed. I mean SSD512 has seven layers from where they do detection, not six like SSD300. So yeah, it's quite a lot of work to get this up and running.

Assuming I do change the network to match that of the caffe implementation of SSD512, what about the PriorBox layer would have to be changed? Isn't it sufficient to just add the extra PriorBox layer in the network and update the min_size and max_size of the PriorBox layers, and then making a new prior_boxes object accordingly?

fferroni commented 7 years ago

Well, instinctively I would say that one needs to change the prior box fed into the bbox utility. Increasing the size of the input image for a fixed convolutipnal stack yields output larger tensor. This means more prior boxes. I guess the benefit of having a larger size is indeed more prior boxes to correctly pinpoint an objects location. The script i attached to create a new prior boxes pickle (wich I found elsewhere) does not yield a larger tensor when increasing the input size so I think this needs to be changed accordingly. I was planning on doing this this weekend but maybe you can also try

On 10 Mar 2017 14:41, "ahrnbom" notifications@github.com wrote:

I am also trying to run SSD500, and I'm facing similar problems. I don't think previous responses here are very clear about exactly what needs to change to get it to work. As fferroni states, just changing the input resolution to 500x500 changes the output dimension to 20073. First of all, is this correct? As in, is it possible to make changes elsewhere (in the prior boxes fed into the BBoxUtility constructor, for example) to make the ground truth also be of dimension 20073? Or is the output dimension wrong, and something (the PriorBox layer?) needs to be changed to get back to the original dimension of 7308? One of oarriaga's previous answers seem to imply the latter, but I'm not sure I interpret it correctly.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rykov8/ssd_keras/issues/36#issuecomment-285671391, or mute the thread https://github.com/notifications/unsubscribe-auth/APkjEum0fe-ELk9s7ZYECq1MC2sPYEXEks5rkVMWgaJpZM4L-Ij3 .

ahrnbom commented 7 years ago

I believe I have made the network model for SSD512. I can at least build this model, so it seems to work, seemingly without modifying the PriorBox layer class. So that's promising. Feel free to use this. It contains a main program that just builds the model and creates a visualization, showing that at the very least the input/output sizes match and the model is possible to build. ssd512.zip

As far as I can see, all that would remain is to create a new pickle object.

EDIT: I created a new pickle object! I used the following script (modified from the link above in this thread) and it seems to work, as far as I can tell. create_prior_box.zip

tachim commented 7 years ago

no need for a new pickle file see https://github.com/rykov8/ssd_keras/issues/52

fferroni commented 7 years ago

@ahrnbom Awesome thanks! Did you based the box_configs on the Cafe implementation?

m274d commented 7 years ago

@ahrnbom Thank you a lot! I have a question about your model

at line 155 of ssd512, you define num_priors = 4, but the original ssd300 has num_priors = 3. I am wondering whether it is correct or not. Here is the program slice:

    net['conv9_2_mbox_priorbox'] = priorbox(net['conv9_2'])
    # Prediction from pool6
    num_priors = 4
    x = Dense(num_priors * 4, name='pool6_mbox_loc_flat')(net['pool6'])

I am not sure whether I got it correctly. If you can offer me a reference about the net structure, it will help a lot!

Best