mchong6 / JoJoGAN

Official PyTorch repo for JoJoGAN: One Shot Face Stylization
MIT License
1.42k stars 206 forks

Apply JoJoGAN on car #27

Open HOKINGLOK opened 2 years ago

HOKINGLOK commented 2 years ago

Hi all, has anyone tried applying JoJoGAN to car images (or other kinds of images)? I replaced both the e4e pretrained weights and the StyleGAN2 pretrained weights with car-specific ones and then fine-tuned the StyleGAN generator, but the results were not good. It seems that the generator was not fine-tuned at all...

mchong6 commented 2 years ago

Can you share the results? One potential source of bugs is that the face StyleGAN generates at 1024 resolution, while I believe the car model is at 512. So you might have to change some variables to ensure the right model is loaded.

I have tried it on churches and the results are fine so I don't see why it wouldn't work on cars.
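One quick way to sanity-check the resolution point above is to confirm that the inverted W+ code has the number of style vectors the generator expects. The sketch below assumes the common StyleGAN2 layout, in which a generator at resolution `size` consumes `2 * log2(size) - 2` style vectors (18 at 1024, 16 at 512). The function names and the 512-dimensional style size are illustrative assumptions, not code from this repo.

```python
import math


def n_latent_for_resolution(size: int) -> int:
    """Number of W+ style vectors for a StyleGAN2 generator at size x size.

    Assumes the common StyleGAN2 layout (2 * log2(size) - 2 style inputs).
    """
    log_size = int(math.log2(size))
    if size < 4 or 2 ** log_size != size:
        raise ValueError(f"size must be a power of two >= 4, got {size}")
    return 2 * log_size - 2


def latent_matches_generator(latent_shape, size, style_dim=512):
    """Return True if a W+ code of shape (..., n_latent, style_dim) fits the generator.

    `style_dim=512` is an assumed default; adjust it to your checkpoint.
    """
    expected = (n_latent_for_resolution(size), style_dim)
    return tuple(latent_shape[-2:]) == expected


if __name__ == "__main__":
    for size in (256, 512, 1024):
        print(size, n_latent_for_resolution(size))
    # A face-sized code (18 style vectors) does not fit a 512 generator.
    print(latent_matches_generator((1, 18, 512), size=512))
```

If the shape of the code returned by the encoder and the count expected by the generator disagree, the encoder and generator checkpoints were probably built for different resolutions.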

HOKINGLOK commented 2 years ago

Thanks for your reply. Yes, I think the model and the corresponding weights are loaded without any bug. Below is one of my results: the first image is the output of the fine-tuned generator, and the second is the output of the original generator (without fine-tuning) given the same inversion latent code. The change between them is very slight, and its direction seems wrong (only the background and the ground change). I also noticed that the loss fluctuates a lot during fine-tuning, even after I decreased the learning rate to 2e-4. Sometimes the loss drops to a very low level very quickly, which I think may be one reason the generator was not fine-tuned correctly. [image] [image]

mchong6 commented 2 years ago

What do the figures mean? I assume the first image is your style reference; what about the others? What does the inversion of the style reference look like?

HOKINGLOK commented 2 years ago

Yes, the first figure is the style reference, the second is the test input, and the last is the result of feeding the inversion code of the test input into the fine-tuned generator / original generator. I think we have solved the problem by adjusting the discriminator's structure, since there were some conflicts between the model and the weights file. But we also found that fine-tuning a car-specific JoJoGAN is harder than fine-tuning one for human faces (it usually takes more iterations to reach the same loss level). Moreover, the content of the test input is often lost in the inference result. Does this have to do with the StyleGAN generator?

mchong6 commented 2 years ago

Ah, that is right. The discriminator loss function assumes that the image is 1024; I forgot about that, good catch. It seems like the inversion is really bad in your case, not sure why. The inverted car looks nothing like the input car. Poor GAN inversion could be the reason why it takes longer to train, and it might lead to poorer results.
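To illustrate why a discriminator built for 1024 conflicts with weights trained at 512, here is a minimal sketch of how the layer plan changes with image size. It assumes the channel table and block layout of the widely used rosinality-style StyleGAN2 discriminator, which may differ from the code you are running. `discriminator_plan` is a hypothetical helper that only lists layer shapes and builds no model.

```python
import math


def discriminator_plan(size: int, channel_multiplier: int = 2):
    """List (name, in_channels, out_channels) for each discriminator stage.

    Assumed layout: one from-RGB conv at full resolution, then one residual
    downsampling block per halving of resolution until 4x4 is reached.
    """
    # Assumed channel widths per feature-map resolution.
    channels = {
        4: 512, 8: 512, 16: 512, 32: 512,
        64: 256 * channel_multiplier,
        128: 128 * channel_multiplier,
        256: 64 * channel_multiplier,
        512: 32 * channel_multiplier,
        1024: 16 * channel_multiplier,
    }
    if size not in channels:
        raise ValueError(f"unsupported size: {size}")
    log_size = int(math.log2(size))
    plan = [("from_rgb", 3, channels[size])]
    in_ch = channels[size]
    for i in range(log_size, 2, -1):
        out_ch = channels[2 ** (i - 1)]
        plan.append((f"resblock_{2 ** i}_to_{2 ** (i - 1)}", in_ch, out_ch))
        in_ch = out_ch
    return plan


if __name__ == "__main__":
    for size in (512, 1024):
        plan = discriminator_plan(size)
        print(size, len(plan), plan[0])
```

Under these assumptions, the 512 plan has one fewer residual block and a wider first convolution than the 1024 plan, so the layers of a 512 checkpoint do not line up with a discriminator constructed for 1024. The size passed to the discriminator constructor therefore has to match the checkpoint.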

HOKINGLOK commented 2 years ago

Understood, thank you very much!

dongyun-kim-arch commented 2 years ago

@mchong6 Hi! I am also trying to use a face model at 512x512, but there seem to be some problems I couldn't track down. Could you point out which parts I should change to run a 512x512 model? Thank you!