yangyanli / DO-Conv

Depthwise Over-parameterized Convolutional Layer
MIT License

Inquiry about non-decreasing loss when using DO-Conv in VGG #22

Closed pikeyang closed 1 year ago

pikeyang commented 1 year ago

Hello,

I have been experimenting with replacing the standard convolutions in the VGG model with DO-Conv (Depthwise Over-parameterized Convolution). However, the loss does not decrease during training.

Could you please provide some insights on why this might be happening? I have double-checked my implementation and ensured that the replacement was done correctly, but I am still unable to identify the cause of this issue.

Thank you for your assistance.

jinming0912 commented 1 year ago

First, please check that every Conv2d has been replaced with DOConv :D Also, may I ask which dataset you are experimenting on?

pikeyang commented 1 year ago

I'm training on CIFAR-100 and using DOConv2d from do_conv_pytorch_1_10.py.

code

import torch
from torch import nn
from torchvision import models
from do_conv_pytorch_1_10 import DOConv2d

# load model
model = models.vgg16(weights=None, num_classes=100)

# replace every Conv2d in the feature extractor except the first two
idx = 0
for i, layer in enumerate(model.features):
    if isinstance(layer, nn.Conv2d):
        idx += 1
        if idx <= 2:
            continue
        in_channels = layer.in_channels
        out_channels = layer.out_channels
        kernel_size = layer.kernel_size[0]
        stride = layer.stride[0]
        padding = layer.padding[0]
        dilation = layer.dilation[0]
        model.features[i] = DOConv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups=in_channels)

dataset

from torchvision import datasets, transforms

train_data = datasets.CIFAR100(root='./data', train=True, download=True,
                               transform=transforms.Compose([
                                   transforms.Resize(64),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.48,0.4593,0.4155),(0.2774,0.2794,0.2794))
                               ]))

setting

learning_rate = 1e-3
epochs = 50
batch_size = 128

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

result

Epoch 4
-------------------------------
loss: 4.638688  [  128/50000]
loss: 4.578317  [12928/50000]
loss: 4.617495  [25728/50000]
loss: 4.629223  [38528/50000]
Test Error: 
 Accuracy: 1.0%, Avg loss: 4.605709 

Epoch 5
-------------------------------
loss: 4.607046  [  128/50000]
loss: 4.612252  [12928/50000]
loss: 4.619832  [25728/50000]
loss: 4.616201  [38528/50000]
Test Error: 
 Accuracy: 1.0%, Avg loss: 4.605627 

Epoch 6
-------------------------------
loss: 4.623777  [  128/50000]
loss: 4.612648  [12928/50000]
loss: 4.607874  [25728/50000]
loss: 4.600666  [38528/50000]
Test Error: 
 Accuracy: 1.0%, Avg loss: 4.605511 

Epoch 7
-------------------------------
loss: 4.615308  [  128/50000]
loss: 4.610726  [12928/50000]
loss: 4.609306  [25728/50000]
loss: 4.620149  [38528/50000]
Test Error: 
 Accuracy: 1.0%, Avg loss: 4.605463
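
As a side note on reading the log above: the average test loss of about 4.6055 sits very close to ln(100), which is the cross-entropy of a uniform guess over the 100 CIFAR-100 classes, and 1.0% accuracy is what random guessing would give. The short check below is only arithmetic to illustrate that; it says nothing about the cause.

```python
import math

num_classes = 100  # CIFAR-100 has 100 classes

# Cross-entropy of a uniform prediction: -log(1 / num_classes) = log(num_classes)
chance_loss = math.log(num_classes)
chance_accuracy = 1.0 / num_classes

print(f"chance-level loss:     {chance_loss:.4f}")      # 4.6052
print(f"chance-level accuracy: {chance_accuracy:.1%}")  # 1.0%
```

A loss that stays pinned at this value across epochs suggests the network output carries essentially no class information yet, which is consistent with the advice to get a plain baseline training first.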

pikeyang commented 1 year ago

I was wondering if you could provide an example code snippet for replacing the convolutions in a classic CNN network structure with DO-Conv. It would be greatly appreciated.

jinming0912 commented 1 year ago

From your code, it looks like you are not actually using DOConv. First make sure the network trains properly without DOConv (i.e., the baseline), and then replace Conv2d in the model with DOConv. We provide an example: please refer to sample_pt.py, lines 224 to 248, which shows how to do the replacement in PyTorch.
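
For readers following along, here is a minimal sketch of the "train the baseline, then swap the layers" workflow. It is not the code from sample_pt.py. The helper name `replace_conv2d` and the `make_conv` factory are hypothetical, and the demo uses a plain `nn.Conv2d` as a stand-in so the snippet runs without the DO-Conv repository. To use DOConv2d, the factory would construct it instead, ideally with keyword arguments checked against the signature in the definition file.

```python
import torch
from torch import nn


def replace_conv2d(module: nn.Module, make_conv) -> int:
    """Recursively swap every nn.Conv2d under `module`; return how many were swapped.

    `make_conv` is a caller-supplied factory (hypothetical) that receives the old
    layer and returns its replacement.
    """
    swapped = 0
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Conv2d):
            setattr(module, name, make_conv(child))
            swapped += 1
        else:
            swapped += replace_conv2d(child, make_conv)
    return swapped


def standin_factory(old: nn.Conv2d) -> nn.Module:
    # Stand-in only: builds a fresh nn.Conv2d with the same hyperparameters,
    # including `groups`, so the replaced network keeps the same structure.
    return nn.Conv2d(
        old.in_channels, old.out_channels,
        kernel_size=old.kernel_size, stride=old.stride,
        padding=old.padding, dilation=old.dilation,
        groups=old.groups, bias=old.bias is not None,
    )


# Tiny demo network with a nested block.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU()),
)
n_swapped = replace_conv2d(model, standin_factory)
out = model(torch.zeros(1, 3, 32, 32))
print(n_swapped, tuple(out.shape))
```

Copying `groups` from the original layer, rather than hard-coding it, keeps the replacement a drop-in for whatever kind of convolution the baseline used.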

pikeyang commented 1 year ago

image

This example code does not set groups, but the note on the groups parameter lists DO-DConv (groups=in_channels) and DO-GConv (otherwise).

jinming0912 commented 1 year ago

We have declared it in the DOConv definition file as follows: "Note that the groups parameter switches between DO-Conv (groups=1), DO-DConv (groups=in_channels), DO-GConv (otherwise)." The details need to be set according to the network layer you are using. You may also want to take a look at our paper, in the section on DO-DConv and DO-GConv, to help you understand how to set it up :D
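
The rule quoted above can be written out as a small function. This is only an illustration of the quoted note. `doconv_variant` is a hypothetical helper, not part of the DO-Conv code, and the divisibility check reflects the usual requirement for grouped convolutions.

```python
def doconv_variant(in_channels: int, groups: int) -> str:
    """Name the variant implied by `groups`, following the quoted note (hypothetical helper)."""
    if groups < 1 or in_channels % groups != 0:
        raise ValueError("in_channels must be divisible by groups")
    if groups == 1:
        return "DO-Conv"    # over-parameterized standard convolution
    if groups == in_channels:
        return "DO-DConv"   # over-parameterized depthwise convolution
    return "DO-GConv"       # over-parameterized group convolution


print(doconv_variant(64, 1))   # DO-Conv
print(doconv_variant(64, 64))  # DO-DConv
print(doconv_variant(64, 4))   # DO-GConv
```

By this rule, the replacement loop posted earlier in the thread, which passes groups=in_channels, would turn every replaced layer into DO-DConv rather than DO-Conv.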

pikeyang commented 1 year ago

Thank you for your patient reply.

pikeyang commented 1 year ago

Can you provide the code to get the experimental results in Table 1?

jinming0912 commented 1 year ago

We can't find our code for the CIFAR datasets right now, but you can start from any open-source code as a baseline. If you can reproduce the baseline's performance, replacing Conv2d directly with DOConv should give our results. Our experimental code for ImageNet is from GluonCV (https://cv.gluon.ai/contents.html), and we also provide a GluonCV implementation of DOConv, so there too you can directly replace Conv2d with DOConv to get our results.

pikeyang commented 1 year ago

Thanks!