About the detail of the ResNet50 backbone and FPN

zyainfal / One-Shot-Face-Swapping-on-Megapixels

One Shot Face Swapping on Megapixels.

About the detail of the ResNet50 backbone and FPN #9

Closed HardboiledHu closed 3 years ago

HardboiledHu commented 3 years ago

Thanks for your work! Your inference code works, and I have generated excellent fake faces using your code and model. After reading the article and the code in megafs.py, I have some questions.

The number of residual blocks is [7,6,3,2] in the article, but it is [3, 4, 6, 3, 2] in megafs.py, which is actually exactly the same as the ResNet50 network.
About the outputs of the FPN, which are p4, p3, p2, p1 from top to bottom: in the article, 3 outputs accept features from the upper layer, but in the code the number is 2.
What is the meaning of omega_only?
Do you directly resize the 1024x1024 input to 256?
What is the meaning of CelebAMask-HQ? I didn't see any information about CelebAMask-HQ in the article.

zyainfal commented 3 years ago

[3, 4, 6, 3, 2] is the ResNet50 setting, as we used the pretrained model to initialize the backbone. [7,6,3,2] indicates how many blocks come before each of c1, c2, c3, and c4, counted from the previous output.
I see the difference. The code is the correct version.
omega_only distinguishes between the latent space W+ (omega only) and W++ (omega plus the constant input of StyleGAN2).
Yes, this saves a lot of GPU memory.
CelebAMask-HQ provides the labeled masks of faces in CelebA-HQ, which are used during inference. We don't use masks in training, so it is left out of the article.
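
As a rough illustration of why resizing the input saves memory, the sketch below compares the number of input values at the two resolutions. It assumes a square 256x256 target and 3 color channels, neither of which is stated in the thread, and the helper name spatial_reduction is hypothetical. Actual GPU savings depend on the network, so the factor is only an indication of how spatial size scales.

```python
def spatial_reduction(src_side, dst_side, channels=3):
    """Return (source values, target values, reduction factor) for square inputs.

    Assumption: both images are square and have `channels` channels.
    """
    src_values = channels * src_side * src_side
    dst_values = channels * dst_side * dst_side
    return src_values, dst_values, src_values // dst_values


src, dst, factor = spatial_reduction(1024, 256)
print(src, dst, factor)  # 3145728 196608 16
```

Under these assumptions, each feature map computed from the resized input covers 16 times fewer spatial positions than it would at full resolution.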

HardboiledHu commented 3 years ago

But I printed the HieRFE model in inference.py and found that the blocks are [3,4,6,3,2]. Shouldn't it be [7,6,3,2]?

zyainfal commented 3 years ago

It should be [3,4,6,3,2] because it is the definition of ResNet50.

For inference, please see the following code in megafs.py (L119-L213):

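def _forward_impl(self, x):
    # See note [TorchScript super()]
    # Stem: convolution, batch norm, ReLU, and max pooling.
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)
    x = self.layer1(x)    # 3 blocks
    c1 = self.layer2(x)   # 4 blocks, so 3 + 4 = 7 blocks before c1
    c2 = self.layer3(c1)  # 6 blocks
    c3 = self.layer4(c2)  # 3 blocks
    if not self.omega_only:
        # The extra stage is only run when c4 is needed.
        c4 = self.layer5(c3)  # 2 blocks
        return c1, c2, c3, c4
    return c1, c2, c3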

where [7,6,3,2] means [3+4, 6, 3, 2], corresponding to [self.layer1+self.layer2, self.layer3, self.layer4, self.layer5].
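
The mapping between the two block lists can be sketched as a small calculation. This is only an illustration of the arithmetic described above, not code from megafs.py; the helper name feature_block_counts is hypothetical, and it assumes, as in the forward pass shown, that the first two stages are merged before c1 and that the last stage is skipped when omega_only is set.

```python
def feature_block_counts(layer_blocks, omega_only=False):
    """Group per-stage block counts by the feature map they produce.

    layer_blocks lists the blocks in layer1..layer5, e.g. [3, 4, 6, 3, 2].
    layer1 and layer2 both run before c1, so their counts are summed.
    """
    merged_first = layer_blocks[0] + layer_blocks[1]
    counts = [merged_first] + list(layer_blocks[2:])
    if omega_only:
        # layer5 is not run, so there is no c4 entry.
        counts = counts[:-1]
    return counts


print(feature_block_counts([3, 4, 6, 3, 2]))                   # [7, 6, 3, 2]
print(feature_block_counts([3, 4, 6, 3, 2], omega_only=True))  # [7, 6, 3]
```

The first call reproduces the [7,6,3,2] list from the article. The second shows the counts for the three outputs returned when omega_only is set.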

HardboiledHu commented 3 years ago

OK, I got it. The first group of residual blocks is layer1 + layer2 = 3 + 4 = 7 blocks.