mchong6 / JoJoGAN

Official PyTorch repo for JoJoGAN: One Shot Face Stylization
MIT License

style image pairs num #13

Closed nzhang258 closed 2 years ago

nzhang258 commented 2 years ago

Hi, I have some questions about the number of finetuning data pairs. In the "Finetune StyleGAN" part of stylize.ipynb, I noticed that the variable `random_alpha` is never used. If I use only one reference style image, do I only have one pair to finetune StyleGAN with? Could you please tell me what I am doing wrong? Thanks a lot.

mchong6 commented 2 years ago

`random_alpha` was left in the code unintentionally and is not currently used. If you set alpha to 1, the line `alpha = 1 - alpha` makes the effective alpha a constant 0. Style mixing will then occur and you will have more than one pair of data.
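For anyone reading along, a minimal sketch of what that alpha handling amounts to (variable names follow the notebook; the surrounding code is paraphrased, not quoted):

```python
alpha = 1.0        # value the user sets in stylize.ipynb
alpha = 1 - alpha  # the line mentioned above: the effective weight becomes 0

# With alpha == 0, the mixing step reduces to taking mean_w entirely
# on the swapped layers, i.e. full style mixing:
# in_latent[:, id_swap] = 0 * latents[:, id_swap] + 1 * mean_w[:, id_swap]
```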

nzhang258 commented 2 years ago

Thanks for your reply. There is a line of code about style mixing:

```python
in_latent[:, id_swap] = alpha * latents[:, id_swap] + (1 - alpha) * mean_w[:, id_swap]
img = generator(in_latent, input_is_latent=True)
```

Here, if I have one style image, the shape of `latents` is (1, 18, 512) and the shape of `img` is (1, 3, 1024, 1024), so I still have only one pair of data. Do you mean that if I want more than one pair of data, I need to change the `id_swap` list?

mchong6 commented 2 years ago

Not sure if you have figured it out since you closed this, but I figure I will leave an explanation.

`id_swap` refers to M in the paper: it defines which layers we style-mix, so `id_swap` is fixed. We perform style mixing in the line you quoted, and for the layers indicated by `id_swap` we mix the latent with `mean_w` (this is bad naming; `mean_w` is essentially a randomly sampled vector). So `in_latent` is a style-mixed version of the original latent code, and each iteration you get a different `in_latent`, hence a different training pair.
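To make the per-iteration behavior concrete, here is a self-contained sketch of the mixing step. Shapes follow the discussion above; `mean_w` is simulated here with `torch.randn`, whereas the notebook derives it from the generator's mapping network, and the generator call itself is only indicated in a comment:

```python
import torch

n_latent, latent_dim = 18, 512                  # StyleGAN2 at 1024x1024 has 18 style layers
latents = torch.randn(1, n_latent, latent_dim)  # stand-in for the inverted style-image code
id_swap = list(range(7, n_latent))              # "M" in the paper: layers chosen for mixing

alpha = 1.0
alpha = 1 - alpha                               # effective mixing weight is 0

for step in range(3):                           # every iteration yields a new training pair
    # "mean_w": a freshly sampled random style, broadcast across all layers
    mean_w = torch.randn(1, 1, latent_dim).repeat(1, n_latent, 1)
    in_latent = latents.clone()
    # with alpha == 0, the swapped layers come entirely from the random style
    in_latent[:, id_swap] = alpha * latents[:, id_swap] + (1 - alpha) * mean_w[:, id_swap]
    # img = generator(in_latent, input_is_latent=True)  # pair (img, style image) differs per step
    print(step, in_latent.shape)                # torch.Size([1, 18, 512]) each time, new content
```

Each pass through the loop produces a different `in_latent`, so even with a single reference style image the finetuning sees a fresh (generated image, style image) pair every iteration.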

nzhang258 commented 2 years ago

> Not sure if you have figured it out since you closed this, but I figure I will leave an explanation.
>
> `id_swap` refers to M in the paper: it defines which layers we style-mix, so `id_swap` is fixed. We perform style mixing in the line you quoted, and for the layers indicated by `id_swap` we mix the latent with `mean_w` (this is bad naming; `mean_w` is essentially a randomly sampled vector). So `in_latent` is a style-mixed version of the original latent code, and each iteration you get a different `in_latent`, hence a different training pair.

OK, I think I understand what you mean. Thanks very much!