williamyang1991 / VToonify

[SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer

Training VToonify with different weights #30

Closed dongyun-kim-arch closed 1 year ago

dongyun-kim-arch commented 1 year ago

Hello!

[screenshot: grid of DualStyleGAN results with different style modifications]

In DualStyleGAN, it is interesting to get diverse images with style modifications. If I want to pick one specific result in the grid, for example the image at the 3x3 position, and I know the weights of the 18 layers, for instance [0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 1 1 1 1 1 1 1], is it possible to train VToonify to create such a model?

https://github.com/williamyang1991/DualStyleGAN/blob/96e4b2b148fef53d1ba70f1dcfaa5917bd5316f8/destylize.py#L110

I briefly tried to finetune VToonify with different weights, such as 0.3 and 0.5, by changing the --weight parameter, but it seems it only learned the extrinsic style from the style image...

williamyang1991 commented 1 year ago

Please carefully read the README.md and our paper before you train your desired model. The answers to most of your previous issues can easily be found in the paper and README of this project. To avoid running into more problems, the best way is to try to understand the method first.


but it seems it only learnt extrinsic style from the style image...

I don't understand "it only learnt extrinsic style". What other styles do you want to learn?

dongyun-kim-arch commented 1 year ago

Sorry for bothering you several times, but since it is a very recent model, it is hard to find additional materials that explain the architecture and details for those who, like me, do not have a CS background.

My question is: in DualStyleGAN, I can get diverse combinations of style and content images with weight adjustment, like the grid visualization above. I would like to get the same results in video format (while maintaining the original video proportion, i.e. not square). So what I ultimately want to achieve is to generate the video with different weights, such as [0.3,0.3,0.3,0.8,0.8,0.8,1.0,1.0,1.0,1.0,1.0,0.5,0.5,0.5,0.5,0.5,0.5,0.5] or [0.8,0.8,0.8,0.8,0.8,0.8,1.0,1.0,1.0,1.0,1.0,0.2,0.2,0.2,0.7,0.7,0.7,0.7].

What I did was to train DualStyleGAN first and pretrain VToonify using the --pretrain flag, but I am not sure what the next step should be... I can train the model with a certain style id (--style_id), but I am not sure how to specify this weight, e.g. [0.8,0.8,0.8,0.8,0.8,0.8,1.0,1.0,1.0,1.0,1.0,0.2,0.2,0.2,0.7,0.7,0.7,0.7].

[how to find an ideal model] we can first train a versatile model VToonify-Dsd, and navigate around different styles and degrees. After finding the ideal setting, we can then train the model specialized in that setting for high-quality stylization.

In the training tips, does the versatile model refer to the vtoonify_s_d.pt model, which is created with the --fix_color flag?

+++ I think I get some of it... the VToonify-Dsd model means VToonify based on DualStyleGAN, trained with a fixed style and a fixed degree...

williamyang1991 commented 1 year ago

I see. You can train the model without --fix_degree and --fix_style. This will lead to a versatile model, VToonify-Dsd. Then you can test with different style_id values and structural styles. (I don't recommend training without --fix_color, but you can try it to enable color transfer.)

In the code, you can manipulate s_w and args.style_degree.

https://github.com/williamyang1991/VToonify/blob/cf993aac7943b74ade4b84645edc771171be6d32/style_transfer.py#L226

You can use s_w[:,7:] = s_w[:,7:] * w + exstyle[:,7:] * (1-w) to blend the color styles (e.g., w = [1.0,1.0,1.0,1.0,0.2,0.2,0.2,0.7,0.7,0.7,0.7]).

https://github.com/williamyang1991/VToonify/blob/cf993aac7943b74ade4b84645edc771171be6d32/style_transfer.py#L213-L216
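
For reference, here is a minimal sketch of that per-layer blend, assuming s_w and exstyle are W+ latents of shape [1, 18, 512] as used in style_transfer.py (the weight values below are just an example):

```python
import torch

# Example per-layer blend weights for the 11 color layers (indices 7..17).
# 1.0 keeps the content's color code for that layer, 0.0 takes the style's.
w = torch.tensor([1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 0.2, 0.7, 0.7, 0.7, 0.7],
                 device=s_w.device).view(1, -1, 1)  # broadcast over the 512-dim codes

# Blend the color-related layers of the content latent (s_w) with the
# extrinsic style latent (exstyle); the structure layers 0..6 are untouched here.
s_w[:, 7:] = s_w[:, 7:] * w + exstyle[:, 7:] * (1 - w)
```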

Finally, the VToonify-Dsd model means VToonify based on DualStyleGAN, trained with different styles and degrees.

dongyun-kim-arch commented 1 year ago

Hi William,

Thank you for writing out the details. I am still trying to understand how it works, but your description is very helpful. Please correct me if I am wrong in some parts.

Based on my code exploration, when training DualStyleGAN, the extrinsic style of each training image is extracted and saved in exstyle_code.npy. We can then pick one of the training images to utilize its style and get its style code from the saved npy file.
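
For my own reference, a rough sketch of how I understand that lookup works, assuming exstyle_code.npy stores a dict mapping training-image filenames to extrinsic style codes (as DualStyleGAN's destylization step saves it); the path and index below are placeholders, and the later mapping of this code into W+ space done in style_transfer.py is omitted:

```python
import numpy as np
import torch

# exstyle_code.npy is a pickled dict: {training image filename -> extrinsic style code}
exstyles = np.load('checkpoint/cartoon/exstyle_code.npy', allow_pickle=True).item()

# Pick one training image by index (this is what --style_id selects) or by filename.
stylename = list(exstyles.keys())[26]          # e.g. style_id = 26
exstyle = torch.tensor(exstyles[stylename])    # its saved extrinsic style code
```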

The tricky part for me is that there are two input(?) images, content and style, and two styles, extrinsic and intrinsic. What I want to do is to mix the extrinsic style from the style image and the intrinsic style from the content image with weight adjustment, or to mix the extrinsic style from the content image and the intrinsic style from the style image (as shown in the grid visualization in the inference_notebook of DualStyleGAN).

To do that, a style vector of length 18 is required, which consists of two parts: layers 0-7 for the color-related (intrinsic??) part and layers 8-18 for the extrinsic style layers.

So your suggestion, s_w[:,8:] = s_w[:,8:] * w + exstyle[:,8:] * (1-w), means: do not control the color code, since s_w[:, :7] is the color-related layers, but control the extrinsic styles.

Finally, to mix extrinsic and intrinsic styles, do these lines need to be added at line 216 in style_transfer.py?
https://github.com/williamyang1991/VToonify/blob/cf993aac7943b74ade4b84645edc771171be6d32/style_transfer.py#L213-L216
s_w[:,:7] = exstyle[:,:7]
s_w[:,8:] = s_w[:,8:] * w + exstyle[:,8:] * (1-w)

It seems like I don't fully understand the meaning of the 18 layers... if that's the case, could you also explain the details I haven't caught?

williamyang1991 commented 1 year ago

- If you want to preserve your content image's color, you use s_w[:,7:].
- If you want to mix the color of your content image and the style image, you use s_w[:,7:] * w + exstyle[:,7:] * (1-w).
- If you want to mix the structure of your content image and the style image, you use s_w[:,:7] * w + exstyle[:,:7] * (1-w).
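
Putting those three cases together, a minimal sketch (same assumptions as above: s_w and exstyle are [1, 18, 512] W+ latents; the function name and weight lists are just illustrative):

```python
import torch

def blend_styles(s_w, exstyle, w_structure, w_color):
    """Blend content (s_w) and style (exstyle) latents per layer.

    Layers 0..6 control structure, layers 7..17 control color.
    A weight of 1.0 keeps the content code for that layer, 0.0 takes the style code.
    """
    w_s = torch.tensor(w_structure, device=s_w.device).view(1, -1, 1)  # 7 values
    w_c = torch.tensor(w_color, device=s_w.device).view(1, -1, 1)      # 11 values

    blended = s_w.clone()
    blended[:, :7] = s_w[:, :7] * w_s + exstyle[:, :7] * (1 - w_s)  # mix structure
    blended[:, 7:] = s_w[:, 7:] * w_c + exstyle[:, 7:] * (1 - w_c)  # mix color
    return blended

# Example: keep the content's structure, take most of the style's color.
# blended = blend_styles(s_w, exstyle, w_structure=[1.0] * 7, w_color=[0.2] * 11)
```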

dongyun-kim-arch commented 1 year ago

Awesome, that summarizes what I was missing here. So if I want to mix the color and structure of both images, it is s_w * w + exstyle * (1-w).

Thanks a lot! All the best.