yuval-alaluf / restyle-encoder

Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" (ICCV 2021) https://arxiv.org/abs/2104.02699
https://yuval-alaluf.github.io/restyle-encoder/
MIT License
1.03k stars 155 forks

Improving toonification result #6

Closed Nerdyvedi closed 3 years ago

Nerdyvedi commented 3 years ago

Hi, I was wondering what I can do to improve the toonification result. I tested the encoder bootstrapping method using the following command:

python scripts/encoder_bootstrapping_inference.py --exp_dir=./toonify --model_1_checkpoint_path=./pretrained/restyle_psp_ffhq_encode.pt --model_2_checkpoint_path=./pretrained/restyle_psp_toonify.pt --data_path=./test/test_A --test_batch_size=1 --test_workers=1 --n_iters_per_batch=1

I get decent results, but I would like the output to look more like the input image.

A sample of the results I am getting:

emma_stone

yuval-alaluf commented 3 years ago

Hi @Nerdyvedi, if you want the output to look more like the input image, you can try adding more inference steps by increasing n_iters_per_batch.
For example, if you take a look at the images linked in the README, you can see that each additional step moves the output closer to the input image (e.g., adding facial hair, changing head shape, etc).

Nerdyvedi commented 3 years ago

Thanks @yuval-alaluf. I increased n_iters_per_batch to 5, but the result keeps looking worse :)

emma_stone

yuval-alaluf commented 3 years ago

I would say the later results look more like the original input, which is what you wanted. For example, you can see that the eyes and head shape change to be more like the input.

In my opinion, however, when performing toonification you do want to see some changes between the input and output images. For example, I like how the first result has big eyes, which I think is a nice feature here. You can see that it's not perfect though: the encoder tries to add an outline of a shirt that isn't there in the input. Overall, I think the results are reasonable. If I may, what other aspects of the results do you find poor?

yuval-alaluf commented 3 years ago

I am unsure where you got the above output (I assume you got it using some other method).
On this particular input, it seems like the image you attached above is better than the one generated by ReStyle. On some inputs, ReStyle may perform better than other methods and on other inputs, it may perform worse. It appears that on this particular input, ReStyle simply performs worse. Does that make sense?

Nerdyvedi commented 3 years ago

Okay, that makes sense. The output above is actually from https://github.com/eladrich/pixel2style2pixel. Are there any hyperparameters I could change to get a result like this?

photo5798812157704517461

yuval-alaluf commented 3 years ago

There aren't really any parameters in inference that I can think of that will change the result. Since the result you got above was from pSp, what you could try doing is replacing the restyle_psp_ffhq_encode.pt model that is used to initialize the encoder bootstrapping with the psp_ffhq_encode.pt model that is used in pSp. My thinking here is that if we use a different initialization for the bootstrapping, we may get different results. I am not entirely sure if this will change anything, but it is interesting to see.

Note that the current code does not support the change above. However, it is easy to add. Currently, the initialization is performed using the following lines: https://github.com/yuval-alaluf/restyle-encoder/blob/c6453b216ee48afe247ea042fe307dfc82a10d92/scripts/encoder_bootstrapping_inference.py#L120-L127

I believe that all you need to do is change the above lines to:

y_hat, latent = net1.forward(inputs, randomize_noise=False, return_latents=True, resize=opts.resize_outputs)

where now net1 is the pSp encoder. Let me know if something above doesn't make sense.
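To make the flow concrete, here is a minimal sketch of the encoder bootstrapping loop with the suggested pSp initialization. The function names and signatures below are stand-ins, not the repo's actual API; the real forward calls take the extra flags shown above (randomize_noise, return_latents, resize).

```python
# Hedged sketch of encoder bootstrapping: net1 (the FFHQ encoder, here pSp)
# provides the initial image and latent, and net2 (the toonify encoder)
# iteratively refines from there. The callables are placeholders.
def bootstrap(net1_forward, net2_forward, x, n_iters_per_batch):
    # Step 0: a single pass through the FFHQ encoder initializes y_hat/latent.
    y_hat, latent = net1_forward(x)
    results = []
    # Remaining steps: the toonify encoder refines, conditioned on the
    # previous output and latent.
    for _ in range(n_iters_per_batch):
        y_hat, latent = net2_forward(x, y_hat, latent)
        results.append(y_hat)
    return results, latent
```

Swapping net1 from the ReStyle FFHQ encoder to pSp only changes step 0; the refinement loop is untouched.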

Nerdyvedi commented 3 years ago

@yuval-alaluf Made the changes. I'm getting the following error:

File "./models/psp.py", line 19, in __init__
    self.n_styles = int(math.log(self.opts.output_size, 2)) * 2 - 2
AttributeError: 'Namespace' object has no attribute 'output_size'

yuval-alaluf commented 3 years ago

When we load the model we have the following lines: https://github.com/yuval-alaluf/restyle-encoder/blob/c6453b216ee48afe247ea042fe307dfc82a10d92/scripts/encoder_bootstrapping_inference.py#L31-L36

Before line 36, try adding:

if 'output_size' not in opts:
    opts['output_size'] = 1024

and let me know what you get. There may be some other small changes needed :)
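For context, output_size matters because pSp derives the number of style vectors from it, and older pSp checkpoints saved their opts dict without that key. A small sketch of the defaulting plus the derived value (the helper names here are mine, not the repo's):

```python
import math
from argparse import Namespace

def build_opts(ckpt_opts: dict, output_size: int = 1024) -> Namespace:
    # Older pSp checkpoints lack 'output_size'; default to 1024 (FFHQ).
    opts = dict(ckpt_opts)
    if 'output_size' not in opts:
        opts['output_size'] = output_size
    return Namespace(**opts)

def n_styles(opts: Namespace) -> int:
    # Same formula as models/psp.py line 19: one w vector per StyleGAN2 layer.
    return int(math.log(opts.output_size, 2)) * 2 - 2

opts = build_opts({})
print(n_styles(opts))  # 18 for a 1024x1024 generator
```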

Nerdyvedi commented 3 years ago

@yuval-alaluf Okay, now it is able to load the checkpoints, but I'm getting a size mismatch error:

size mismatch for input_layer.0.weight: copying a param with shape torch.Size([64, 6, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 3, 3, 3]).

yuval-alaluf commented 3 years ago

Ok this makes sense because pSp uses input_nc of 3 and restyle uses 6. You should play around with how you load net1 and net2 and try to match the parameters accordingly. I apologize, but I will need to come back to this at a later time. If you wish, you can continue playing with it or wait a bit and hopefully I can come back to this soon.
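The mismatch is exactly that channel-count difference: ReStyle feeds the encoder the input image concatenated with the previous output, so its first conv expects 6 channels, while plain pSp takes the 3-channel image alone. A tiny illustrative helper (not from the repo) that makes the arithmetic explicit:

```python
def expected_input_nc(uses_restyle_feedback: bool, img_channels: int = 3) -> int:
    # ReStyle concatenates [x, y_prev] along the channel axis -> 2 * 3 = 6;
    # pSp encodes x alone -> 3. This matches the checkpoint shapes
    # torch.Size([64, 6, 3, 3]) vs torch.Size([64, 3, 3, 3]) in the error.
    return img_channels * 2 if uses_restyle_feedback else img_channels

print(expected_input_nc(True), expected_input_nc(False))  # 6 3
```

So when loading net1 as pSp, its options need input_nc=3, while net2 (ReStyle) keeps input_nc=6.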

yuval-alaluf commented 3 years ago

I had a few minutes to play around with the code and I was able to make the changes. Since this is a quick hack, I'll upload the file here so you can take a look. We were pretty much missing one line of code. I was curious how initializing with pSp would change the result, so I ran it on your input. Here is the result: stone

I hope the results are more what you're looking for (the middle image is the toonified result). I'd say that this looks better than what ReStyle came up with so it's nice to see that a small change can lead to some improvements on particular inputs. The result is similar to what pSp returned, but I think the results here are more colorful.

Here is the code: encoder_bootstrap_with_psp.txt

P.S. I am not particularly surprised by the results. As we mentioned in the paper, one step of pSp is typically better than one step of ReStyle. Therefore, pSp here seems to provide a better initialization than what we get with ReStyle's FFHQ encoder. I'll consider adding support for both models so people have more flexibility in the initialization.

qo4on commented 3 years ago

> (the middle image is the toonified result). I'd say that this looks better than what ReStyle came up with

I think it depends on the input. I tried psp (psp_ffhq_toonify), restyle (restyle_psp_ffhq_encode, restyle_psp_toonify), and restyle with psp (psp_ffhq_encode, restyle_psp_toonify) on the same input. psp and restyle with psp sometimes generate horrible results. Most often restyle does better, but it has a reddish spot that leads to different eye colors. Does anybody know a better solution for face toonification? image