What are the shapes of the output feature maps for regression?

The centers are (N, 128, 128, 80), the width and height of the centers are (N, 128, 128, 2), and the offsets of the centers are (N, 128, 128, 2).

So when the centers of two objects overlap, whether the objects belong to the same category or to different categories, can only one target box be predicted?

For example, suppose a cat and an elephant share the same center but differ greatly in size. Can the overlapping center then predict only one of them, either the elephant or the cat, and not both at the same time?
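To make the concern concrete, here is a minimal NumPy sketch of the three outputs with the shapes listed above. It assumes the width/height output is shared across categories, which is what the (N, 128, 128, 2) shape suggests. The class indices, the center cell, and the box sizes are made-up values for illustration, not taken from any real model or dataset.

```python
import numpy as np

# Shapes taken from the question; everything else below is hypothetical.
N, H, W, C = 1, 128, 128, 80
CAT, ELEPHANT = 15, 20          # assumed class indices, for illustration only

centers = np.zeros((N, H, W, C), dtype=np.float32)  # one channel per category
wh = np.zeros((N, H, W, 2), dtype=np.float32)       # one (w, h) pair per cell
offset = np.zeros((N, H, W, 2), dtype=np.float32)   # one (dx, dy) pair per cell

cy, cx = 64, 64  # both objects fall on the same center cell

# The category channels are independent, so both classes can be marked here.
centers[0, cy, cx, CAT] = 1.0
centers[0, cy, cx, ELEPHANT] = 1.0

# The size output has a single slot per cell, so the second write
# replaces the first one.
wh[0, cy, cx] = (10.0, 8.0)     # cat size (made-up numbers)
wh[0, cy, cx] = (90.0, 70.0)    # elephant size overwrites the cat size

classes_at_center = np.flatnonzero(centers[0, cy, cx] > 0.5)
size_at_center = wh[0, cy, cx]
print(classes_at_center, size_at_center)
```

Under these assumptions, the cell can hold a score for both categories, but only one width/height pair and one offset pair, so the two objects could not get different box sizes from the same cell.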