sgrvinod / a-PyTorch-Tutorial-to-Object-Detection

SSD: Single Shot MultiBox Detector | a PyTorch Tutorial to Object Detection
MIT License

PredictionConvolutions separates locs and classes, doesn't follow the paper? #87

Open hrbigelow opened 2 years ago

hrbigelow commented 2 years ago

I noticed that in the paper each prediction convolution is formulated with output channels = n_boxes * (n_classes + 4), but in the code each feature-map level is split into two separate convolutions, one for box locations and one for class scores:

        # Localization prediction convolutions (predict 4 box offsets per box)
        self.loc_conv4_3 = nn.Conv2d(512, n_boxes['conv4_3'] * 4, kernel_size=3, padding=1)
        self.loc_conv7 = nn.Conv2d(1024, n_boxes['conv7'] * 4, kernel_size=3, padding=1)
        self.loc_conv8_2 = nn.Conv2d(512, n_boxes['conv8_2'] * 4, kernel_size=3, padding=1)
        self.loc_conv9_2 = nn.Conv2d(256, n_boxes['conv9_2'] * 4, kernel_size=3, padding=1)
        self.loc_conv10_2 = nn.Conv2d(256, n_boxes['conv10_2'] * 4, kernel_size=3, padding=1)
        self.loc_conv11_2 = nn.Conv2d(256, n_boxes['conv11_2'] * 4, kernel_size=3, padding=1)

        # Class prediction convolutions (predict classes in localization boxes)
        self.cl_conv4_3 = nn.Conv2d(512, n_boxes['conv4_3'] * n_classes, kernel_size=3, padding=1)
        self.cl_conv7 = nn.Conv2d(1024, n_boxes['conv7'] * n_classes, kernel_size=3, padding=1)
        self.cl_conv8_2 = nn.Conv2d(512, n_boxes['conv8_2'] * n_classes, kernel_size=3, padding=1)
        self.cl_conv9_2 = nn.Conv2d(256, n_boxes['conv9_2'] * n_classes, kernel_size=3, padding=1)
        self.cl_conv10_2 = nn.Conv2d(256, n_boxes['conv10_2'] * n_classes, kernel_size=3, padding=1)
        self.cl_conv11_2 = nn.Conv2d(256, n_boxes['conv11_2'] * n_classes, kernel_size=3, padding=1)...

But I believe that, if it were implemented as in the paper, it would be:

        self.conv4_3 = nn.Conv2d(512, n_boxes['conv4_3'] * (4 + n_classes), kernel_size=3, padding=1)
        self.conv7 = nn.Conv2d(1024, n_boxes['conv7'] * (4 + n_classes), kernel_size=3, padding=1)
        self.conv8_2 = nn.Conv2d(512, n_boxes['conv8_2'] * (4 + n_classes), kernel_size=3, padding=1)
        self.conv9_2 = nn.Conv2d(256, n_boxes['conv9_2'] * (4 + n_classes), kernel_size=3, padding=1)
        self.conv10_2 = nn.Conv2d(256, n_boxes['conv10_2'] * (4 + n_classes), kernel_size=3, padding=1)
        self.conv11_2 = nn.Conv2d(256, n_boxes['conv11_2'] * (4 + n_classes), kernel_size=3, padding=1)

Did you try it the original way, or was this an intentional choice for some reason?

Thank you!
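
For reference, here is a minimal sketch of how the two formulations relate. The sizes (`in_channels`, `n_boxes`, `n_classes`, and the 5x5 feature map) are hypothetical values chosen for illustration, not the repository's configuration. It copies the weights of a separate localization head and class head into one fused convolution and checks that the fused output matches the concatenated outputs. Note one assumption: the fused layer here stacks all localization channels first and all class channels after, which may differ from the per-box channel ordering in the paper; that difference is only a permutation of output channels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes for illustration only (not the repository's settings).
in_channels, n_boxes, n_classes = 16, 4, 21

# Separate heads, in the style of the repository code quoted above.
loc_conv = nn.Conv2d(in_channels, n_boxes * 4, kernel_size=3, padding=1)
cl_conv = nn.Conv2d(in_channels, n_boxes * n_classes, kernel_size=3, padding=1)

# One fused head with n_boxes * (4 + n_classes) output channels.
fused_conv = nn.Conv2d(in_channels, n_boxes * (4 + n_classes), kernel_size=3, padding=1)

# Stack the separate filters along the output-channel dimension:
# localization channels first, then class channels.
with torch.no_grad():
    fused_conv.weight.copy_(torch.cat([loc_conv.weight, cl_conv.weight], dim=0))
    fused_conv.bias.copy_(torch.cat([loc_conv.bias, cl_conv.bias], dim=0))

# Random stand-in for a feature map of shape (batch, channels, height, width).
x = torch.randn(2, in_channels, 5, 5)
with torch.no_grad():
    fused_out = fused_conv(x)
    separate_out = torch.cat([loc_conv(x), cl_conv(x)], dim=1)

# Count learnable parameters in both variants.
fused_params = sum(p.numel() for p in fused_conv.parameters())
separate_params = sum(p.numel() for m in (loc_conv, cl_conv) for p in m.parameters())

print(torch.allclose(fused_out, separate_out, atol=1e-4))
print(fused_params == separate_params)
```

Because each output channel of a convolution is computed independently from the same input, the sketch suggests that splitting the head into two layers does not change what it can represent or its parameter count. Whether training behaves identically in practice (for example with per-layer initialization) is not something this check establishes.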