tianzhi0549 / FCOS

FCOS: Fully Convolutional One-Stage Object Detection (ICCV'19)
https://arxiv.org/abs/1904.01355

About the target computing #264

Open ricky40403 opened 4 years ago

ricky40403 commented 4 years ago

https://github.com/tianzhi0549/FCOS/blob/35331f706d21cc22e53376436bd434893d1a8ed2/fcos_core/modeling/rpn/fcos/loss.py#L224

Sorry for asking a stupid question. As I understand it, the targets are computed like this:

  1. expand object_sizes_of_interest to all levels' locations (expanded_object_sizes_of_interest)
  2. run compute_targets_for_locations() to get [batch, [levels of targets]]
  3. split the returned per-image targets into a level-first layout [level, targets of all images per level]
  4. flatten (a rough sketch of steps 1 and 3-4 follows this list)
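
For concreteness, here is a simplified sketch of steps 1 and 3-4, assuming the per-image labels are already concatenated over all levels' locations. The synthetic tensors and names such as points_per_level and num_points are my own stand-ins, not the repo's exact code:

```python
import torch

INF = 100000000

# Toy stand-ins (my own illustration): 3 FPN levels with N_l locations each,
# and per-image class labels already concatenated over all levels' locations.
points_per_level = [torch.zeros(n, 2) for n in (100, 25, 9)]            # (N_l, 2)
num_points = [len(p) for p in points_per_level]
labels = [torch.randint(0, 80, (sum(num_points),)) for _ in range(2)]   # 2 images

# Step 1: expand each level's size-of-interest range to all of its locations.
object_sizes_of_interest = [[-1, 64], [64, 128], [128, INF]]
expanded = [
    pts.new_tensor(object_sizes_of_interest[l])[None].expand(len(pts), -1)
    for l, pts in enumerate(points_per_level)
]
expanded_object_sizes_of_interest = torch.cat(expanded, dim=0)          # (sum_l N_l, 2)

# Step 2 (not shown): compute_targets_for_locations() would produce `labels`
# (and reg_targets) per image over the concatenation of all levels' locations.

# Steps 3-4: split each image's targets back into levels, regroup level-first, flatten.
labels_level_first = [
    torch.cat([labels_per_im.split(num_points, dim=0)[level]
               for labels_per_im in labels], dim=0).reshape(-1)
    for level in range(len(points_per_level))
]
```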

My question is: why not compute the targets level by level and then flatten? For example:

```python
# for class logits
box_cls_flatten = []
level_first = []
for level in range(len(box_cls)):
    # predictions
    box_cls_flatten.append(
        box_cls[level].permute(0, 2, 3, 1).reshape(-1, num_classes))

    # targets, computed per level from that level's locations and the ground truth,
    # similar to compute_targets_for_locations() but for one level per loop iteration
    target_cls_per_level = get_level_targets(level)
    level_first.append(target_cls_per_level.reshape(-1))
```

Would it be more efficient, or am I misunderstanding something?

tianzhi0549 commented 4 years ago

@ricky40403 I don't think it will be more efficient because the levels of an object are determined by the distances to all the possible locations. So you have to compute the distances to all levels before assigning objects to levels.
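
For context, the paper's range-based assignment checks, for each location, the maximum of its (l, t, r, b) regression distances against that level's size-of-interest interval. Below is a rough single-level sketch of that rule; the function name and signature are my own, not the repo's API:

```python
import torch

def assign_by_max_distance(locations, size_range, gt_boxes):
    """For one FPN level: locations (N, 2) as (x, y) image coordinates,
    size_range = (m_lo, m_hi) for this level, gt_boxes (M, 4) as (x1, y1, x2, y2).
    Returns an (N, M) bool mask of (location, box) pairs this level may regress."""
    xs, ys = locations[:, 0], locations[:, 1]
    l = xs[:, None] - gt_boxes[None, :, 0]
    t = ys[:, None] - gt_boxes[None, :, 1]
    r = gt_boxes[None, :, 2] - xs[:, None]
    b = gt_boxes[None, :, 3] - ys[:, None]
    reg_targets = torch.stack([l, t, r, b], dim=2)        # (N, M, 4) distances
    max_dist = reg_targets.max(dim=2)[0]                   # max(l*, t*, r*, b*)
    inside_box = reg_targets.min(dim=2)[0] > 0             # location must lie inside the box
    in_range = (max_dist >= size_range[0]) & (max_dist <= size_range[1])
    return inside_box & in_range
```

Since every (location, box) pair has to pass this check, the repo computes the distances for all levels' locations together and only afterwards splits the result into the level-first layout.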

ricky40403 commented 4 years ago

Hi, Tianzhi0549: Thanks for the reply. It seems that the targets could still be computed level by level with the same assignment policy applied per level. I was just not sure whether operations such as the expanded_object_sizes_of_interest part and the post-processing after compute_targets_for_locations would become inefficient because of the extra for-loop usage.

Though the method I mentioned seems to need O(levels * num_targets) work to process all the predictions and targets. :sweat_smile:

Thanks, Ricky

XULU42 commented 4 years ago

Hi, Tianzhi0549 and ricky40403: I am confused about why we should use a level-first layout instead of an image-first layout. We can concatenate the predictions of all levels into one tensor, and after that there is no "level" concept anymore. So the question is: is there any reason we must use the level-first layout? Looking forward to your reply, thanks.

ricky40403 commented 4 years ago

Hi, I think that as long as the targets are matched to the right locations, it should be fine with both the image-first and the level-first layout. But the size-of-interest range is tied to a particular level of the feature maps, so with a level-first layout the target positions can be derived directly from each level's feature map; an image-first layout feels more like the pre-set anchors in SSD. (See the small sketch at the end of this comment.)

In short, I think the layout follows the formulation in the paper. As for the definitive reason, we should wait for the author to explain. :sweat_smile: :sweat_smile:
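
As an aside, here is a toy check of the point above: the classification loss is identical under either layout as long as predictions and targets are flattened in the same order. The shapes and the use of plain cross-entropy are my own simplification; FCOS itself uses sigmoid focal loss:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 80
# Toy stand-ins (my own illustration): 2 images, 2 levels of per-location logits/labels.
logits = [torch.randn(2, 100, num_classes), torch.randn(2, 25, num_classes)]   # (B, N_l, C)
labels = [torch.randint(0, num_classes, (2, 100)), torch.randint(0, num_classes, (2, 25))]

# Level-first: for each level, flatten over (batch, locations), then concat levels.
lf_logits = torch.cat([x.reshape(-1, num_classes) for x in logits])
lf_labels = torch.cat([y.reshape(-1) for y in labels])

# Image-first: for each image, concat its levels, then concat images.
if_logits = torch.cat([torch.cat([x[b] for x in logits]) for b in range(2)])
if_labels = torch.cat([torch.cat([y[b] for y in labels]) for b in range(2)])

# Same set of (prediction, target) pairs, just in a different order -> identical mean loss.
assert torch.allclose(F.cross_entropy(lf_logits, lf_labels),
                      F.cross_entropy(if_logits, if_labels))
```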