Open becauseofAI opened 5 years ago
The center heatmap is (N, 128, 128, 80), the W and H of the centers are (N, 128, 128, 2), and the offsets of the centers are (N, 128, 128, 2).
So can only one target box be predicted when the centers of objects, of the same or different categories, overlap?
For example, a cat and an elephant share the same center, but their sizes differ greatly. At that shared center, can the model predict only one of the two categories, elephant or cat, and not both at the same time?
Which of the following output tensor shapes is correct? Taking batch = N, input = 512, R = 4, C = 80 (COCO) as an example: (N, 128, 128, 80, 2, 2) or (N, 128, 128, 80 + 2 + 2)?
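To make the two candidates concrete, here is a small sketch in plain Python that only does shape arithmetic. It assumes the three outputs listed above are separate channels-last tensors that could be concatenated along the last axis, and that the size and offset outputs are shared across classes. The helper names (head_shapes, concat_last_axis, num_elements) are made up for illustration and do not come from any library or from the model's code.

```python
# Shape arithmetic only: no tensors are allocated.
# Assumption: channels-last layout and three separate per-pixel outputs.

def head_shapes(batch, input_size, stride, num_classes):
    """Return (heatmap, size, offset) shapes for a square input."""
    side = input_size // stride                  # 512 // 4 = 128
    heatmap = (batch, side, side, num_classes)   # one center score per class
    size = (batch, side, side, 2)                # width and height
    offset = (batch, side, side, 2)              # x and y offset of the center
    return heatmap, size, offset


def concat_last_axis(*shapes):
    """Shape that results from concatenating tensors along the last axis."""
    lead = shapes[0][:-1]
    assert all(s[:-1] == lead for s in shapes), "leading dims must match"
    return lead + (sum(s[-1] for s in shapes),)


def num_elements(shape):
    """Total number of values a tensor of this shape would hold."""
    n = 1
    for d in shape:
        n *= d
    return n


N = 1  # example batch size
hm, wh, off = head_shapes(N, input_size=512, stride=4, num_classes=80)

stacked = concat_last_axis(hm, wh, off)   # candidate (N, 128, 128, 80 + 2 + 2)
per_class = (N, 128, 128, 80, 2, 2)       # candidate (N, 128, 128, 80, 2, 2)

print("stacked  :", stacked, num_elements(stacked))
print("per_class:", per_class, num_elements(per_class))
```

Under the first candidate, each location carries a single (W, H) pair and a single offset pair no matter how many of the 80 class channels fire there, which is exactly the cat-and-elephant situation asked about above.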