nkolot / GraphCMR

Repository for the paper "Convolutional Mesh Regression for Single-Image Human Shape Reconstruction"
BSD 3-Clause "New" or "Revised" License

Regarding fully connected baseline #34

Closed · mehtadushy closed this 4 years ago

mehtadushy commented 4 years ago

Hi

I was unable to find the details of the fully connected mesh regression baseline in the paper. Will you make the code for it available so that the baseline can be examined further?

nkolot commented 4 years ago

Here is the code we used for the baseline.

```python
import torch.nn as nn

# Assumed local imports: resnet50 here is a ResNet-50 variant that returns
# 2048-d pooled features (torchvision's stock resnet50 ends in a 1000-way
# classifier), and FCBlock is a fully connected block (Linear + BatchNorm +
# ReLU) from this repository. Adjust the paths to match the local code layout.
from models.resnet import resnet50
from models.layers import FCBlock


class BaselineModel(nn.Module):

    def __init__(self, ref_vertices):
        super(BaselineModel, self).__init__()
        self.ref_vertices = ref_vertices
        self.resnet = resnet50(pretrained=True)
        # Shape head: one hidden FC block, then a linear layer that regresses
        # 3 coordinates for each vertex of the reference mesh.
        self.shape = nn.Sequential(FCBlock(2048, 2048),
                                   nn.Linear(2048, self.ref_vertices.shape[0] * 3))
        # Camera head: weak-perspective camera (scale + 2D translation).
        self.camera_fc = nn.Sequential(FCBlock(2048, 1024),
                                       nn.Linear(1024, 3))

    def forward(self, image):
        image_enc = self.resnet(image)                          # (B, 2048)
        shape = self.shape(image_enc)                           # (B, V * 3)
        shape = shape.view(-1, 3, self.ref_vertices.shape[0])   # (B, 3, V)
        cam = self.camera_fc(image_enc)                         # (B, 3)
        return shape, cam
```
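
For completeness, here is a minimal smoke test of the module above (not from the original code). The 1723-vertex template and the 224×224 input resolution are assumptions on my part, matching the downsampled SMPL mesh (6890 / 4 vertices) and crop size used elsewhere in this thread:

```python
import torch

# Hypothetical reference vertices: 1723 vertices of the downsampled SMPL
# template, 3 coordinates each. In practice, load the actual template mesh.
ref_vertices = torch.rand(1723, 3)
model = BaselineModel(ref_vertices)
model.eval()

with torch.no_grad():
    images = torch.rand(2, 3, 224, 224)  # batch of 2 RGB crops (assumed size)
    shape, cam = model(images)

print(shape.shape)  # torch.Size([2, 3, 1723])
print(cam.shape)    # torch.Size([2, 3])
```
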
mehtadushy commented 4 years ago

Hi

Another follow-up on this: do you think this is a fair baseline, considering it only has 2 layers and 1 nonlinearity, while the GraphCNN has multiple (15-20?) layers plus access to additional information through the mesh? Or am I missing something?

nkolot commented 4 years ago

Indeed, the fully connected baseline is not as deep as the GraphCNN, but for our comparison we took the number of parameters into consideration. For the shape regression, the FC baseline includes about 15 million parameters (2048×2048 + 2048×(6890/4)×3), compared to about 2.5 million parameters for our Graph-CNN (2051×512 + 5 blocks × (512×256 + 256×256 + 256×512) + 64×32 + 32×3). You can make the fully connected baseline deeper, but this would come at the cost of significantly increasing the number of required parameters.
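
A quick sketch that reproduces this arithmetic, following the per-term breakdown quoted above (weights only, biases ignored):

```python
# FC baseline shape head: FCBlock(2048, 2048) + Linear(2048, V * 3),
# with V = 6890 / 4 vertices of the downsampled mesh.
fc_params = 2048 * 2048 + 2048 * (6890 / 4) * 3
print(f"FC baseline: ~{fc_params / 1e6:.1f}M parameters")  # ~14.8M

# Graph-CNN: input projection + 5 residual blocks + output layers.
gcn_params = (2051 * 512
              + 5 * (512 * 256 + 256 * 256 + 256 * 512)
              + 64 * 32
              + 32 * 3)
print(f"Graph-CNN:   ~{gcn_params / 1e6:.1f}M parameters")  # ~2.7M by this count
```
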

mehtadushy commented 4 years ago

Yes, but the number of parameters is not the same as representational capacity, nor is it the same as computational cost (or inference time). FC layers, despite their huge parameter counts, have a significantly smaller compute cost than, say, convolution layers operating on 2D grids. Further, there are many ways to trivially or non-trivially compress fully connected networks, which makes the parameter-count argument moot. The sketch below illustrates the first point.
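
A hypothetical comparison (layer sizes are my own choices, purely for illustration): a single large FC layer against a single 3×3 convolution on a 56×56 feature map, with multiply-accumulate counts computed analytically from the weights:

```python
# FC layer: 2048 -> 2048
fc_params = 2048 * 2048
fc_macs = fc_params                  # each weight is used once per sample

# 3x3 conv: 256 -> 256 channels on a 56x56 feature map
conv_params = 3 * 3 * 256 * 256
conv_macs = conv_params * 56 * 56    # each weight is reused at every spatial position

print(f"FC:   {fc_params / 1e6:5.1f}M params, {fc_macs / 1e6:7.1f}M MACs")
print(f"Conv: {conv_params / 1e6:5.1f}M params, {conv_macs / 1e6:7.1f}M MACs")
# FC:     4.2M params,     4.2M MACs
# Conv:   0.6M params,  1849.7M MACs
```
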

In my opinion, and perhaps you and others would agree, baselines should be comparable in terms of representational capacity, inference time, and compute cost to be meaningful.