About training details - Githubissues

Hi, I am trying to re-implement your paper but can not get good results on both image and text path. So I would like to verify some implementation details:

Below is my implementation of Modulation Module inside Mapper(in pytorch):


import torch
import torch.nn as nn
import torch.nn.functional as F
from models.stylegan2.model import EqualLinear

class MapperBlock(nn.Module): def init(self, channels=512): super(MapperBlock, self).init() self.fc = EqualLinear(channels,channels) self.f_gamma = nn.Sequential( EqualLinear(channels,channels), nn.LayerNorm(channels), nn.LeakyReLU(0.2), EqualLinear(channels,channels) ) self.f_beta = nn.Sequential( EqualLinear(channels,channels), nn.LayerNorm(channels), nn.LeakyReLU(0.2), EqualLinear(channels,channels) ) self.act = nn.LeakyReLU(0.2)

def modulation(self, x, e):
    gamma = self.f_gamma(e)
    beta = self.f_beta(e)

    # norm x
    x = F.layer_norm(x, (x.shape[-1],))

    # modulation
    return (1.0 + gamma) * x + beta

def forward(self, x, e):
    x = self.fc(x)
    x = self.modulation(x, e)
    return self.act(x)


Is it correct?

2. According to your paper, the reference style/text is randomly set to image or text. My understanding is the image/text manipulation loss is only calculated when image/text reference is used, but the total loss value range is vary in different condition. Does the loss weights always keep the same in all condition or need to adjust for different condition?

3. In your paper: "we also generated several edited images using our text-guided hair editing method to augment the diversity of the
reference image set." Could you elaborate more details about your method? Or any other reference paper?

Thanks for your help.

wty-ustc / HairCLIP

About training details #4