sgrvinod / a-PyTorch-Tutorial-to-Object-Detection

SSD: Single Shot MultiBox Detector | a PyTorch Tutorial to Object Detection
MIT License

Bounding Box explanation #31

Closed slmatrix closed 3 years ago

slmatrix commented 4 years ago

But pixel values are next to useless if we don't know the actual dimensions of the image.

Pixel values and their representation as fractions of the image's dimensions are equivalent; that is, they carry the same amount of information.
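
A minimal sketch of the conversion in both directions, assuming boxes are stored as (x_min, y_min, x_max, y_max) and that the image width and height are known. The function names and the sample values are hypothetical and only serve as illustration.

```python
def to_fractional(box, width, height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to fractions of the image size."""
    x_min, y_min, x_max, y_max = box
    return (x_min / width, y_min / height, x_max / width, y_max / height)


def to_pixels(box, width, height):
    """Convert a fractional box back to pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (x_min * width, y_min * height, x_max * width, y_max * height)


# Hypothetical example: a box inside a 500 x 375 image.
pixel_box = (48, 240, 195, 371)
frac_box = to_fractional(pixel_box, width=500, height=375)
restored_box = to_pixels(frac_box, width=500, height=375)
```

Note that the round trip only works because the same width and height are supplied to both functions.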

slmatrix commented 4 years ago

(..) provided that the parameters of the fully connected network (..)

It should say "layer", not "network".

slmatrix commented 4 years ago

conv6 will use 1024 filters, each with dimensions 1, 1, 1024

It should say conv7, and shouldn't the dimensions be 2, 2, 1024, because the previous layer is decimated to 3, 3, 512?
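
The filter shapes under discussion can be checked with a short sketch. It assumes the fully connected layers are viewed as filter banks of shape (filters, height, width, channels) and subsampled by keeping every m-th index along a dimension. The step sizes below are an assumption about the tutorial's decimation, and `decimated_shape` is a hypothetical helper, not a function from the repository.

```python
def decimated_shape(shape, steps):
    """Shape after keeping every m-th index along each dimension (None keeps all)."""
    return tuple(
        dim if m is None else len(range(0, dim, m))
        for dim, m in zip(shape, steps)
    )


# fc6 viewed as a convolution: 4096 filters over a 7 x 7 x 512 input.
conv6_shape = decimated_shape((4096, 7, 7, 512), (4, 3, 3, None))

# fc7 viewed as a convolution: 4096 filters over a 1 x 1 x 4096 input.
conv7_shape = decimated_shape((4096, 1, 1, 4096), (4, None, None, 4))

print(conv6_shape)  # (1024, 3, 3, 512)
print(conv7_shape)  # (1024, 1, 1, 1024)
```

Under these assumptions, the channel depth of each filter follows the number of filters in the preceding layer, while its spatial size follows that layer's own kernel.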

AnhPC03 commented 4 years ago

@slmatrix FM10 is 3x3(x256), so why is FM11 1x1(x256)? I think it should be 2x2(x256), because it uses 2x2 max pooling with the mathematical ceiling function.

slmatrix commented 3 years ago

@AnhPC03, there is no max pooling. The auxiliary layers downsample their spatial dimensions through convolutions with a padding of zero, i.e. no padding. An unpadded convolution shrinks its output, which is why padding is typically applied in conv layers.
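
The arithmetic can be sketched with the standard output-size formulas. This assumes the last auxiliary convolution uses a 3x3 kernel, stride 1 and padding 0, which is my reading of the tutorial. The helper names are hypothetical.

```python
import math


def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out_size_ceil(size, kernel, stride):
    """Spatial output size of an unpadded pooling layer that rounds up."""
    return math.ceil((size - kernel) / stride) + 1


# 3x3 feature map, 3x3 kernel, no padding: the map collapses to 1x1.
unpadded = conv_out_size(3, kernel=3, stride=1, padding=0)

# The same convolution with padding 1 would keep the map at 3x3.
padded = conv_out_size(3, kernel=3, stride=1, padding=1)

# 2x2 max pooling with stride 2 and ceiling would give 2x2 instead.
pooled = pool_out_size_ceil(3, kernel=2, stride=2)

print(unpadded, padded, pooled)  # 1 3 2
```

So a 2x2 result would follow from ceiling-mode pooling, whereas the unpadded 3x3 convolution gives 1x1.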

Closing this issue.