impiga / Plain-DETR

[ICCV2023] DETR Doesn’t Need Multi-Scale or Locality Design
MIT License

Question about flatten in HungarianMatcher #18

Closed wsh9352 closed 10 months ago

wsh9352 commented 10 months ago

In `models/matcher.py`, `HungarianMatcher.forward` contains:

out_delta = outputs["pred_deltas"].flatten(0, 1)
out_bbox_old = outputs["pred_boxes_old"].flatten(0, 1)

Predictions from different images in the batch are mixed together by flattening the first two dims, and the deltas are later computed against the flattened ground truth. As a beginner, may I ask whether this is some kind of trick, or whether it could be the cause of the performance drop when a larger batch size is used?

impiga commented 10 months ago

Thanks for your interest!

This part of the code is a bit confusing, but I believe it is correct: the operations on lines 119 and 123 together recover the correct per-image cost matrices. https://github.com/impiga/Plain-DETR/blob/6ad930bb85f5d10417ebe979780132a9a466a8e0/models/matcher.py#L114-L124

I hope the following comments can help.

# Final cost matrix
# Shape: [batch_size * num_queries, sum(num_target_boxes)]
# This matrix contains cost between all predictions and targets in a batch, including those not in the same image.
# The cost between predictions and targets in different images are not used (see below).
C = (
    self.cost_bbox * cost_bbox
    + self.cost_class * cost_class
    + self.cost_giou * cost_giou
)
# Shape: [batch_size, num_queries, sum(num_target_boxes)]
C = C.view(bs, num_queries, -1).cpu()

# Number of target boxes per image; sum(sizes) == sum(num_target_boxes)
sizes = [len(v["boxes"]) for v in targets]
indices = [
    # C.split(sizes, -1) splits C along the target dim into one tensor per image,
    # each of shape [batch_size, num_queries, num_target_boxes_i].
    # Indexing with c[i] then keeps only the i-th image's predictions, so the
    # assignment only sees costs between predictions and targets of the same image.
    linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))
]
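To make this concrete, here is a small standalone sketch (toy shapes and an arbitrary `arange` cost matrix, not the real matcher costs) showing that the `view` + `split` + `c[i]` combination extracts exactly the per-image blocks of the big cost matrix, so cross-image costs are never fed to the Hungarian solver:

```python
import torch
from scipy.optimize import linear_sum_assignment

bs, num_queries = 2, 3
sizes = [2, 1]  # targets per image; sum(sizes) == 3

# Flattened cost matrix as the matcher builds it:
# rows = all predictions in the batch, cols = all targets in the batch.
C = torch.arange(bs * num_queries * sum(sizes), dtype=torch.float32)
C = C.view(bs * num_queries, sum(sizes))

# Reshape so dim 0 indexes images again.
C = C.view(bs, num_queries, -1)

# Split the target dim by image, then pick the matching image index:
# c[i] has shape [num_queries, sizes[i]] -- predictions of image i
# against targets of image i only.
blocks = [c[i] for i, c in enumerate(C.split(sizes, -1))]
indices = [linear_sum_assignment(b) for b in blocks]

for i, (row_ind, col_ind) in enumerate(indices):
    print(f"image {i}: pred indices {row_ind}, target indices {col_ind}")
```

You can check that `blocks[0]` equals `C[0][:, :sizes[0]]` and `blocks[1]` equals `C[1][:, sizes[0]:]`, i.e. the diagonal blocks of the batched cost matrix; the off-diagonal (cross-image) entries are computed but simply never used.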