Closed by becauseofAI 5 years ago
Both are wrong. Read the code or the paper. You have 3 outputs. Centers are (N,128,128,80).
@see-- Centers are (N,128,128,80). W and H of centers are (N,128,128,2). Offsets of centers are (N,128,128,2).
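To make the three shapes above concrete, here is a minimal sketch that derives them from the numbers in the question (input 512, R=4, C=80). The function name `centernet_head_shapes` and the dictionary keys are hypothetical, chosen for illustration only; they are not taken from the repository. The channels-last layout follows the shapes quoted in this thread.

```python
def centernet_head_shapes(batch, input_size=512, stride=4, num_classes=80):
    """Return the shape of each of the three output heads (channels-last).

    Hypothetical helper for illustration; names are not from the CenterNet code.
    """
    out = input_size // stride  # 512 // 4 = 128
    return {
        "heatmap": (batch, out, out, num_classes),  # one center heatmap per class
        "size": (batch, out, out, 2),               # width and height, shared by all classes
        "offset": (batch, out, out, 2),             # sub-pixel center offset, shared by all classes
    }


shapes = centernet_head_shapes(batch=1)
for name, shape in shapes.items():
    print(name, shape)
```

The point is that these are three separate tensors, not one tensor with extra trailing dimensions.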
So only one box can be predicted when objects of the same or different categories have overlapping centers?
For example, a cat and an elephant coincide at the center, but their sizes differ greatly. When the centers of a cat and an elephant overlap, can the model predict only one of them, either the elephant or the cat, and not both at the same time?
The paper has some great sections answering your questions. In short: you are right. But the important part is that it rarely happens. You get far fewer collisions/lost boxes with CenterNet than with any other approach.
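A rough way to see when such a collision occurs: two boxes collide if their centers fall into the same cell of the 128x128 output grid after dividing by the stride. The sketch below is only an illustration under that assumption. The helper `find_center_collisions` and the example boxes are made up, not from the paper or the code.

```python
from collections import defaultdict


def find_center_collisions(boxes, stride=4):
    """Group boxes by the output-grid cell their center falls into.

    boxes: list of (label, x1, y1, x2, y2) in input-image pixels.
    Returns only the cells that contain more than one box center.
    Hypothetical helper for illustration.
    """
    cells = defaultdict(list)
    for label, x1, y1, x2, y2 in boxes:
        cx = int(((x1 + x2) / 2) // stride)  # column in the output grid
        cy = int(((y1 + y2) / 2) // stride)  # row in the output grid
        cells[(cx, cy)].append(label)
    return {cell: labels for cell, labels in cells.items() if len(labels) > 1}


# Made-up boxes on a 512x512 input: the cat and the elephant share a center.
boxes = [
    ("cat", 236, 236, 276, 276),
    ("elephant", 56, 106, 456, 406),
    ("dog", 10, 10, 50, 50),
]
collisions = find_center_collisions(boxes)
print(collisions)
```

Since the size and offset maps have only 2 channels per cell, a cell that collects two centers can store just one width/height pair, which is the lost-box case discussed above.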
Which of the following tensor shapes of the output features is correct? Take batch=N, input=512, R=4, C=80 (COCO) as an example: (N,128,128,80,2,2) or (N,128,128,80+2+2)?