williamyang1991 / DualStyleGAN

[CVPR 2022] Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer

Custom Dataset Experiment #18

Closed. Abhishekvats1997 closed this issue 1 year ago.

Abhishekvats1997 commented 2 years ago

Hi, while training all the components of DualStyleGAN on my custom dataset, I noticed that the script destylize.py, like the other scripts, does not have options to pass in custom paths for the encoder and checkpoint, or the output size of the GAN. I did manage to change this for my case, but I still have a doubt: my pSp encoder was trained with a MoCo loss rather than the ID loss, since mine is a non-human domain. Should I change that to MoCo here as well, or will I obtain reasonable style codes with the ID loss too?

williamyang1991 commented 2 years ago

I noticed that the script destylize.py, like the other scripts, does not have options to pass in custom paths for the encoder and checkpoint

You can specify the encoder path by modifying args.model_path and 'encoder.pt': https://github.com/williamyang1991/DualStyleGAN/blob/96e4b2b148fef53d1ba70f1dcfaa5917bd5316f8/destylize.py#L110

You can specify the StyleGAN path by modifying args.model_path and 'stylegan2-ffhq-config-f.pt': https://github.com/williamyang1991/DualStyleGAN/blob/96e4b2b148fef53d1ba70f1dcfaa5917bd5316f8/destylize.py#L106
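For example, the two loading sites can be pointed at custom weights roughly like this (a sketch only: the filenames my-stylegan2-pets.pt and my-psp-encoder.pt are placeholders, and the exact surrounding code in destylize.py differs slightly):

import os
import torch

# Hypothetical custom checkpoint names; substitute your own files.
ckpt_path = os.path.join(args.model_path, 'my-stylegan2-pets.pt')   # instead of 'stylegan2-ffhq-config-f.pt'
encoder_path = os.path.join(args.model_path, 'my-psp-encoder.pt')   # instead of 'encoder.pt'

# StyleGAN2 checkpoints trained with rosinality's repo keep the EMA
# generator weights under the 'g_ema' key.
ckpt = torch.load(ckpt_path, map_location='cpu')
generator.load_state_dict(ckpt['g_ema'])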


Should I change that to MoCo here as well, or will I obtain reasonable style codes with the ID loss too?

If your dataset is not about human faces, you can set the weight of the ID loss to 0 and use your own losses.
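As a minimal sketch of the idea (the names below are illustrative and do not match the variables in destylize.py):

import torch

def total_loss(loss_perc: torch.Tensor, loss_custom: torch.Tensor,
               loss_id: torch.Tensor, id_weight: float = 0.0) -> torch.Tensor:
    # For a non-human domain, call with id_weight=0 so the face-identity
    # term drops out; loss_custom stands for your own domain-appropriate
    # term (e.g. a MoCo-feature similarity loss).
    loss = loss_perc + loss_custom
    if id_weight > 0:
        loss = loss + id_weight * loss_id
    return loss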

It is OK to use your own pSp encoder, but my implementation uses the Z+ space, while the original pSp encoder uses the W+ space. I'm not sure my code can be directly applied to a W+ pSp encoder without modification. One option is to carefully check the code wherever I use pSp style codes (sometimes it is in Z+ space, sometimes in W+ space) and change them all to W+ space. Another option is to train your own pSp encoder in Z+ space. You may refer to https://github.com/williamyang1991/DualStyleGAN/issues/10#issuecomment-1084361451 on how I train the pSp encoder in Z+ space.
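For context, the difference shows up in how the style codes are fed to the generator; a rough sketch of the two conventions (the argument names follow this repo's generator, but check the actual call sites):

# Z+ codes (my modified pSp encoder): each 512-d vector is still passed
# through the mapping network inside the generator.
img_gen, _ = generator([instyle], exstyle, z_plus_latent=True, use_res=True)

# W+ codes (the original pSp encoder): the vectors are used directly as
# layer-wise W latents, bypassing the mapping network.
img_gen, _ = generator([instyle], exstyle, input_is_latent=True, z_plus_latent=False, use_res=True)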

Abhishekvats1997 commented 2 years ago

I'll try option 1 first. So all I have to do is set z_plus_latent to False wherever I see it, right? Do I also then set input_is_latent to True in those places?

williamyang1991 commented 2 years ago

Yes. For a W+ intrinsic style code instyle and extrinsic style code exstyle, you need to use

img_gen, _ = generator([instyle], exstyle, input_is_latent=True, z_plus_latent=False, use_res=True)

Note that DualStyleGAN accepts exstyle as a W+ latent code by default, so my implementation transforms exstyle from Z+ to W+ in some places of the code, for example: https://github.com/williamyang1991/DualStyleGAN/blob/96e4b2b148fef53d1ba70f1dcfaa5917bd5316f8/finetune_dualstylegan.py#L524-L527 Since your exstyle is already in W+ space, you need to comment out those lines.
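The conversion at those lines amounts to pushing each Z+ vector through the StyleGAN mapping network; a rough sketch of its shape (illustrative, not the verbatim source):

# exstyle: (batch, n_latent, 512) Z+ codes from the Z+ pSp encoder.
# generator.generator.style is the mapping network of the inner
# StyleGAN2 generator (rosinality's implementation).
B, N, D = exstyle.shape
exstyle = generator.generator.style(exstyle.reshape(B * N, D)).reshape(B, N, D)
# A W+ encoder already outputs codes in W+ space, so skip (comment out)
# this mapping in your case.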

Abhishekvats1997 commented 2 years ago

Since I did not make any changes to destylize.py with regard to z_plus_latent, should I also regenerate my style codes? If yes, what changes should I make? Set z_plus_latent to False everywhere?

williamyang1991 commented 2 years ago

Since I modified the pSp encoder to support Z+, you may also need to change the pSp encoder back to its original version in DualStyleGAN/model/encoder/.

Abhishekvats1997 commented 2 years ago

Hi, I did manage to train a Z+ pSp encoder and go on with the process. In the final fine-tuning step, since my domain is pets and not humans, I tried the following:

1) turning off the ID loss
2) turning on the ID loss
3) using the MoCo loss instead of the ID loss

but could not get good results with any of them. I would be really grateful if you could give me any advice on how to go about it.

williamyang1991 commented 2 years ago

What are your source and target domains? Human faces to pet faces? Maybe human faces and pet faces are too different to find the mappings between them. You could try turning off the ID loss and using a very small learning rate. Could you provide intermediate images like Fig. 6(b) in my paper, so that it is possible to figure out at which step the problem occurs?

Abhishekvats1997 commented 2 years ago

Source domain: real pets (AFHQ dogs). Target domain: toon pets. Example image: [image]

The domains are definitely not that misaligned.

When you told me to look at Fig. 6(b), I realised that maybe I did something wrong in the process; I think we need repeated training after each stage or something. The steps I followed:

1) Train a Z+ pSp encoder that gives good results on my AFHQ dogs dataset.
2) Finetune StyleGAN on my toon pets dataset (face-cropped and pre-aligned), starting from a checkpoint I trained on AFHQ with rosinality's repo.
3) Train the destylizer.
4) Pretrain DualStyleGAN on AFHQ dogs for 3000 iterations.
5) Finetune this DualStyleGAN on my toon dataset.

Sorry for bothering you with this, but I would be grateful if you could give me some pointers on how to train this and get results. Some samples from the steps I followed:

pSp encoder: [image]
Finetuned StyleGAN: [3 screenshots]
Style codes obtained by running destylize: [2 images]
Pretrained DualStyleGAN (on AFHQ): [image]
Final finetuned DualStyleGAN: [image]

williamyang1991 commented 2 years ago

Style codes obtained by running destylize: [2 images]

I think this result reflects some problems. How many iterations did you use to finetune StyleGAN? It seems you finetuned StyleGAN on your cartoon pet dataset for too many iterations, so there is little correspondence left between the original pet domain and the cartoon pet domain, which is why the destylization results have no correspondence with the input cartoon pets. I finetune StyleGAN for only 600 iterations:

python -m torch.distributed.launch --nproc_per_node=8 --master_port=8765 finetune_stylegan.py --iter 600 --batch 4 --ckpt ./checkpoint/stylegan2-ffhq-config-f.pt --style cartoon --augment ./data/cartoon/lmdb/


Final finetuned DualStyleGAN: [image]

Since pet faces and human faces are very different, the parameters tuned for human faces may not work for pet faces. You may need to try different combinations of --style_loss, --CX_loss, --perc_loss, --id_loss, --L2_reg_loss, and --lr to obtain a good result.
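As a purely hypothetical starting point (the loss-weight and learning-rate flags are the ones named above; the remaining arguments mirror the finetune_stylegan.py command, and the values and the dataset name toonpets are placeholders to experiment from, not tested settings):

python -m torch.distributed.launch --nproc_per_node=8 --master_port=8765 finetune_dualstylegan.py --iter 1500 --batch 4 --id_loss 0 --lr 0.0005 --style_loss 0.25 --CX_loss 0.25 --perc_loss 1 --L2_reg_loss 0.015 --style toonpets --augment ./data/toonpets/lmdb/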