qfgaohao / pytorch-ssd

MobileNetV1, MobileNetV2, VGG based SSD/SSD-lite implementation in Pytorch 1.0 / Pytorch 0.4. Out-of-box support for retraining on Open Images dataset. ONNX and Caffe2 support. Experiment Ideas like CoordConv.
https://medium.com/@smallfishbigsea/understand-ssd-and-implement-your-own-caa3232cd6ad
MIT License

evaluation of ssd_mobilenetv1_coco #32

Open kamauz opened 5 years ago

kamauz commented 5 years ago

I'm trying to import ssd_mobilenetv1_coco_2018, converting it from TensorFlow (.pb) to PyTorch (.pth). After the conversion, I wanted to evaluate it on webcam input, but I noticed a mismatch between some layer settings in the SSD class and the pretrained model, specifically in the last extra conv layers, classification_headers, and regression_headers.

I had to edit your code in create_mobilenetv1_ssd like this:


from torch.nn import Conv2d, ModuleList, ReLU, Sequential
extras = ModuleList([
        Sequential(
            Conv2d(in_channels=1024, out_channels=256, kernel_size=1),
            ReLU(),
            Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=2, padding=1),
            ReLU()
        ),
        Sequential(
            Conv2d(in_channels=512, out_channels=128, kernel_size=1),
            ReLU(),
            Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2, padding=1),
            ReLU()
        ),
        Sequential(
            Conv2d(in_channels=256, out_channels=128, kernel_size=1),
            ReLU(),
            Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2, padding=1),
            ReLU()
        ),
        Sequential(
            Conv2d(in_channels=256, out_channels=64, kernel_size=1),
            ReLU(),
            Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2, padding=1),
            ReLU()
        )
    ])

    regression_headers = ModuleList([
        Conv2d(in_channels=512, out_channels=3 * 4, kernel_size=1, padding=1),
        Conv2d(in_channels=1024, out_channels=6 * 4, kernel_size=1, padding=1),
        Conv2d(in_channels=512, out_channels=6 * 4, kernel_size=1, padding=1),
        Conv2d(in_channels=256, out_channels=6 * 4, kernel_size=1, padding=1),
        Conv2d(in_channels=256, out_channels=6 * 4, kernel_size=1, padding=1),
        Conv2d(in_channels=128, out_channels=6 * 4, kernel_size=1, padding=1), 
    ])

    classification_headers = ModuleList([
        Conv2d(in_channels=512, out_channels=3 * num_classes, kernel_size=1, padding=1),
        Conv2d(in_channels=1024, out_channels=6 * num_classes, kernel_size=1, padding=1),
        Conv2d(in_channels=512, out_channels=6 * num_classes, kernel_size=1, padding=1),
        Conv2d(in_channels=256, out_channels=6 * num_classes, kernel_size=1, padding=1),
        Conv2d(in_channels=256, out_channels=6 * num_classes, kernel_size=1, padding=1),
        Conv2d(in_channels=128, out_channels=6 * num_classes, kernel_size=1, padding=1), 
    ])

This caused the execution of run_converted_pytorch_ssd_live_demo.py to crash with this error: RuntimeError: The size of tensor a (2781) must match the size of tensor b (3000) at non-singleton dimension 1

Is it possible that the SSD MobileNet architecture has been modified over time, so that some new adjustments have to be made to keep the code correct? Or am I just missing something?

Thanks

qfgaohao commented 5 years ago

hi @kamauz , the priors/anchors your model needs and the way the detection-head branches are taken off the backbone might be different.
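To make the branching concrete, here is a minimal sketch of how SSD-style heads are taken off feature maps and concatenated; the channel counts and feature-map sizes are illustrative assumptions, not the repo's exact values. A priors/locations size mismatch like the one above surfaces exactly at this concatenated tensor:

```python
import torch
from torch import nn

# Two illustrative feature maps branched to regression heads; shapes are
# assumptions, not the exact ones in this repo.
feature_maps = [torch.randn(1, 512, 19, 19), torch.randn(1, 256, 10, 10)]
headers = nn.ModuleList([
    nn.Conv2d(512, 6 * 4, kernel_size=3, padding=1),  # 6 boxes/location, 4 offsets each
    nn.Conv2d(256, 6 * 4, kernel_size=3, padding=1),
])

locations = []
for fm, head in zip(feature_maps, headers):
    out = head(fm)                              # (1, 24, H, W)
    out = out.permute(0, 2, 3, 1).contiguous()  # (1, H, W, 24)
    locations.append(out.view(1, -1, 4))        # (1, H*W*6, 4)
locations = torch.cat(locations, dim=1)

# The second dimension must equal the number of generated priors, otherwise
# decoding the boxes fails with a size-mismatch error like the one reported.
print(locations.shape)  # torch.Size([1, 2766, 4]) = (19*19 + 10*10) * 6 boxes
```

If any level's feature-map size or boxes-per-location differs from what the prior generator assumed, the total here drifts away from the prior count.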

notabigfish commented 5 years ago

@kamauz Hi, did you find a solution? I also changed the network structure and ran into the same problem as you.

notabigfish commented 5 years ago

@qfgaohao Hi, in this situation, should I change the SSDSpec parameters in the config file? Thank you!

kamauz commented 5 years ago

@qfgaohao I abandoned this repository a while ago because of this problem. I remember trying to change SSDSpec, but I couldn't make it work. I don't rule out that the solution was there, though. It's tricky.

qfgaohao commented 5 years ago

@kamauz @notabigfish you can also change the number of channels of the extra layers so that the network output is consistent with the generated anchors.
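A small pure-Python sketch of the arithmetic behind this suggestion (the numbers are illustrative for a single 19x19 level with 6 boxes per location, not taken from a specific config): a regression header's out_channels encodes boxes-per-location times 4 offsets, so both it and the feature-map size must agree with what the prior generator assumes.

```python
# Illustrative numbers for one detection level; treat them as assumptions.
fm_size = 19                 # spatial size of the feature map at this level
boxes_per_location = 6       # must match the SSDSpec aspect-ratio setup
out_channels = boxes_per_location * 4   # regression header: 4 offsets per box

# Boxes this level contributes to the concatenated locations tensor:
boxes_this_level = fm_size * fm_size * (out_channels // 4)
print(boxes_this_level)  # 2166
```

Changing the extra layers moves the feature-map sizes, and changing the headers' out_channels moves boxes-per-location; either way, the per-level products must still sum to the prior count.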

notabigfish commented 5 years ago

Thank you for the answers!! Actually, similar to @kamauz , I deleted all the BatchNorm layers after the pointwise conv layers and got locations with size [..., 1434] but priors with size [..., 3000]. The reason is that the first feature map in vision/ssd/ssd.py line 58 has size [1, 576, 10, 10], but it should be [1, 576, 19, 19]. So I changed line 14 in vision/ssd/config/mobilenetv1_ssd_config.py to

SSDSpec(10, 16, SSDBoxSizes(60, 105), [2, 3])

Then the error that @kamauz mentioned is fixed. However, a new error shows up:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 41 and 47 in dimension 0 at /pytorch/aten/src/TH/generic/THTensor.cpp:711

It turns out that after changing the configuration, some labels have size torch.Size([0]).

I did not change any convolution layers, only deleted the BatchNorm layers. So in theory the output channels should be unchanged, right? Or maybe I'm missing something? Thank you!! @qfgaohao
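For what it's worth, the 3000 in the size mismatch falls straight out of the spec list. A quick sketch reproduces it; the feature-map sizes and the boxes-per-location formula below assume the default six-level MobileNetV1 300x300 config, so treat them as assumptions:

```python
# (feature_map_size, number_of_extra_aspect_ratios) per level, assumed
# defaults for the MobileNetV1-SSD 300x300 config.
specs = [(19, 2), (10, 2), (5, 2), (3, 2), (2, 2), (1, 2)]

def priors_per_location(num_extra_ratios):
    # 1 small box + 1 big box + 2 boxes per extra aspect ratio
    return 2 + 2 * num_extra_ratios

total = sum(fm * fm * priors_per_location(r) for fm, r in specs)
print(total)  # 3000
```

Recomputing this total after any SSDSpec edit is a cheap way to check whether the new spec list still matches what the network actually outputs.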

kamauz commented 4 years ago

@notabigfish Are you in the same situation I was, i.e. with a .pth file obtained by converting the official TensorFlow model? By the way, months ago I assumed that maybe Google's developers experimented with a version different from the official SSD paper. I don't know whether removing the BatchNorm layers is a good idea. It may be a matter of implementation choices that we can't know unless we can see how they actually trained the network. Keep me updated if you find a solution.