shrubb / box-convolutions

PyTorch code for the "Deep Neural Networks with Box Convolutions" paper
Apache License 2.0

Implementation in VGG #19

Closed Flock1 closed 5 years ago

Flock1 commented 5 years ago

Hey,

I am trying to use box convolution in HED (Holistically-Nested Edge Detection), which uses the VGG architecture. Here's the architecture with a box convolution layer added:

class HED(nn.Module):
    def __init__(self):
        super(HED, self).__init__()

        # conv1
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            BoxConv2d(1, 64, 5, 5),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1),
            #BoxConv2d(1, 64, 28, 28),
            nn.ReLU(inplace=True),
        )

        # conv2
        self.conv2 = nn.Sequential(
            nn.MaxPool2d(2, stride=2, ceil_mode=True),  # 1/2
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1),
            nn.ReLU(inplace=True),
        )

        # conv3
        self.conv3 = nn.Sequential(
            nn.MaxPool2d(2, stride=2, ceil_mode=True),  # 1/4
            nn.Conv2d(128, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1),
            nn.ReLU(inplace=True),
        )

        # conv4
        self.conv4 = nn.Sequential(
            nn.MaxPool2d(2, stride=2, ceil_mode=True),  # 1/8
            nn.Conv2d(256, 512, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1),
            nn.ReLU(inplace=True),
        )

        # conv5
        self.conv5 = nn.Sequential(
            nn.MaxPool2d(2, stride=2, ceil_mode=True),  # 1/16
            nn.Conv2d(512, 512, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1),
            nn.ReLU(inplace=True),
        )

        self.dsn1 = nn.Conv2d(64, 1, 1)
        self.dsn2 = nn.Conv2d(128, 1, 1)
        self.dsn3 = nn.Conv2d(256, 1, 1)
        self.dsn4 = nn.Conv2d(512, 1, 1)
        self.dsn5 = nn.Conv2d(512, 1, 1)
        self.fuse = nn.Conv2d(5, 1, 1)

    def forward(self, x):
        h = x.size(2)
        w = x.size(3)

        conv1 = self.conv1(x)
        conv2 = self.conv2(conv1)
        conv3 = self.conv3(conv2)
        conv4 = self.conv4(conv3)
        conv5 = self.conv5(conv4)

        ## side output
        d1 = self.dsn1(conv1)
        d2 = F.upsample_bilinear(self.dsn2(conv2), size=(h,w))
        d3 = F.upsample_bilinear(self.dsn3(conv3), size=(h,w))
        d4 = F.upsample_bilinear(self.dsn4(conv4), size=(h,w))
        d5 = F.upsample_bilinear(self.dsn5(conv5), size=(h,w))

        # dsn fusion output
        fuse = self.fuse(torch.cat((d1, d2, d3, d4, d5), 1))

        d1 = F.sigmoid(d1)
        d2 = F.sigmoid(d2)
        d3 = F.sigmoid(d3)
        d4 = F.sigmoid(d4)
        d5 = F.sigmoid(d5)
        fuse = F.sigmoid(fuse)

        return d1, d2, d3, d4, d5, fuse

I get the following error: RuntimeError: BoxConv2d: all parameters must have as many rows as there are input channels (box_convolution_forward at src/box_convolution_interface.cpp:30)

Can you help me with this?

shrubb commented 5 years ago

You are using the wrong values to initialize BoxConv2d. I've documented the constructor; please run help(BoxConv2d) to see the docstrings.

Flock1 commented 5 years ago

Hi,

Here's what it says:

Input : `(batch_size) x (in_planes) x (h) x (w)`

As you can see above, the box convolution layer has the same number of channels as convolution layer, and the batch size is one (as in the mnist code). I've tried different values for w and h, but I get the same error.

What do you suggest?

shrubb commented 5 years ago

> the box convolution layer has the same number of channels as convolution layer

> batch size is one (as in the mnist code)

You seem to be confusing batch size, number of input channels and number of output channels.

I urge you to read the rest of help(BoxConv2d) carefully. It describes all constructor parameters; in_planes and num_filters are strictly defined there.
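For readers following along, here is a minimal pure-Python sketch of the channel rule as it emerges from this thread: in_planes has to equal the number of channels arriving at the layer, and the layer outputs in_planes * num_filters channels. The helper name box_conv_out_channels is hypothetical and only models the bookkeeping; it is not part of the box-convolutions library and does not reproduce its actual check.

```python
def box_conv_out_channels(incoming_channels, in_planes, num_filters):
    """Model the channel bookkeeping of a box convolution layer.

    Hypothetical helper (assumption): it only mirrors the rule discussed in
    this thread and is not code from the box-convolutions library.
    """
    if incoming_channels != in_planes:
        # Same situation as the RuntimeError in the first post:
        # the layer was built for a different number of input channels.
        raise ValueError(
            f"in_planes={in_planes} but the input has {incoming_channels} channels"
        )
    # Each input channel gets num_filters box filters.
    return in_planes * num_filters


# BoxConv2d(1, 64, 5, 5) placed right after Conv2d(3, 64, ...):
# 64 channels arrive, but the layer was built with in_planes=1.
try:
    box_conv_out_channels(incoming_channels=64, in_planes=1, num_filters=64)
    first_attempt = "ok"
except ValueError as err:
    first_attempt = str(err)

# in_planes=64, num_filters=1 keeps the channel count unchanged.
unchanged = box_conv_out_channels(incoming_channels=64, in_planes=64, num_filters=1)

# in_planes=64, num_filters=3 triples it.
tripled = box_conv_out_channels(incoming_channels=64, in_planes=64, num_filters=3)

print(first_attempt, unchanged, tripled)
```

Under this rule, the first two constructor arguments are the only ones that affect the channel count.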

Flock1 commented 5 years ago

Hi,

it seems to have started working when I set the following parameters:

nn.Conv2d(3, 64, 3, padding=1),
BoxConv2d(64, 1, 3, 3)

According to the constructor, the following are the arguments:

in_planes: int
    Number of channels in the input image (as in Conv2d).
num_filters: int
    Number of filters to apply per channel (as in depthwise Conv2d).

According to the architecture, the Conv2d has 3 input channels and 64 output channels. So is this why the number of input channels of the box convolution layer is 64? Since we have added it to the nn.Sequential, the output of the Conv2d goes into the BoxConv2d.

Am I thinking correctly here?

Also, here are the other things I have tried:

1) I tried to change the arguments for BoxConv2d to (64, 3, 3, 3). I get the following: RuntimeError: Given groups=1, weight of size [64, 64, 3, 3], expected input[1, 192, 105, 252] to have 64 channels, but got 192 channels instead

This is probably because the BoxConv2d layer outputs 192 channels but the next Conv2d layer expects 64 channels.

2) Something like your mnist.py

        # conv1
        self.conv_B = BoxConv2d(64, 1, 3, 3)
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1),
            #BoxConv2d(1, 64, 28, 28),
            nn.ReLU(inplace=True),
        )

And then I call conv1 = self.conv1(self.conv_B(x)), and I again get the error related to the parameters. So is there any way to set the parameters so that it works for two convolution layers? Or do I have to set the parameters for every layer?

Flock1 commented 5 years ago

Here's one more thing I've tried:

        # conv1
        self.conv_B = BoxConv2d(3, 64, 3, 3)
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1),
            #BoxConv2d(1, 64, 28, 28),
            nn.ReLU(inplace=True),
        )

I get the following: RuntimeError: Given groups=1, weight of size [64, 3, 3, 3], expected input[1, 192, 252, 105] to have 3 channels, but got 192 channels instead

shrubb commented 5 years ago

Maybe you mistakenly switched the order and actually meant to do this: conv1 = self.conv_B(self.conv1(x))

Flock1 commented 5 years ago

I tried that as well. I'm still getting the same parameters error.

shrubb commented 5 years ago

Sorry, but the number of channels is really trivial to balance. You only have to remember that in_planes is the same as in Conv2d, and num_filters is how many times the number of channels will grow.
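To make the balancing concrete, here is a rough sketch that walks a stack of layers and tracks the channel count after each one, using only the rule stated above. The function trace_channels and the tuple-based layer specs are hypothetical illustrations, not library code; a spec is ("conv", in_channels, out_channels) or ("box", in_planes, num_filters).

```python
def trace_channels(input_channels, layers):
    """Return the channel count after each layer, or raise on a mismatch.

    Hypothetical helper (assumption): it models channel bookkeeping only.
    """
    channels = input_channels
    history = []
    for kind, expected_in, second in layers:
        if kind not in ("conv", "box"):
            raise ValueError(f"unknown layer kind: {kind}")
        if channels != expected_in:
            raise ValueError(
                f"{kind} layer expects {expected_in} channels, got {channels}"
            )
        # conv: 'second' is out_channels; box: 'second' is num_filters.
        channels = second if kind == "conv" else expected_in * second
        history.append(channels)
    return history


# Conv2d(3, 64) -> BoxConv2d(64, 1, ...) -> Conv2d(64, 64): channels stay balanced.
balanced = trace_channels(3, [("conv", 3, 64), ("box", 64, 1), ("conv", 64, 64)])

# BoxConv2d(3, 64, ...) applied to the RGB input, then Conv2d(3, 64):
# the box layer turns 3 channels into 192, which the conv rejects.
try:
    trace_channels(3, [("box", 3, 64), ("conv", 3, 64)])
    box_first = "ok"
except ValueError as err:
    box_first = str(err)

# Conv2d(3, 64) -> BoxConv2d(64, 3, ...) -> Conv2d(64, 64):
# 64 * 3 = 192 channels reach a conv that expects 64.
try:
    trace_channels(3, [("conv", 3, 64), ("box", 64, 3), ("conv", 64, 64)])
    tripled_mid = "ok"
except ValueError as err:
    tripled_mid = str(err)

print(balanced, box_first, tripled_mid)
```

If the channel count should grow (num_filters greater than 1), the following Conv2d would need its input channel count raised to match.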

Flock1 commented 5 years ago

Thanks. Also, am I understanding the functioning properly here?

shrubb commented 5 years ago

If you mean this, then yes, it's correct:

> According to the architecture, the Conv2d has 3 input channels and 64 output channels. So is this why the number of input channels of the box convolution layer is 64? Since we have added it to the nn.Sequential, the output of the Conv2d goes into the BoxConv2d.
>
> Am I thinking correctly here?