Hello, could you please explain the shape of the output of neck()? It is 384,248,216. Does this mean that each voxel (248 in y and 216 in x) has 384 associated features?
See Figure 2 (Network overview) in the paper. After the backbone extracts features from the input, the feature map has shape (batch_size, 6C, W/2, H/2). C is 64, so the channel dimension is 6 * 64 = 384.
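To make the arithmetic concrete, here is a minimal sketch in plain Python. The helper `neck_output_shape` is hypothetical and not part of the repository, and the input sizes 496 and 432 are assumptions chosen only so that halving them gives the 248 and 216 from the question.

```python
def neck_output_shape(batch_size, c, w, h):
    """Hypothetical helper: returns (batch_size, 6C, W/2, H/2)."""
    return (batch_size, 6 * c, w // 2, h // 2)


# C = 64 comes from the reply; 496 and 432 are assumed input sizes.
shape = neck_output_shape(batch_size=1, c=64, w=496, h=432)
print(shape)  # (1, 384, 248, 216)

# The second entry is the channel dimension: 384 values per spatial location.
channels = shape[1]
print(channels)  # 384
```

Read this way, the 384 is the number of feature channels stored at each of the 248 x 216 spatial locations of the feature map.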