For the toonification experiments, I used the pre-trained generator from Justin Pinkney and Doron Adler. They wrote a short paper on how they performed the fine-tuning: https://arxiv.org/abs/2010.05334. Because they use a particular trick for fine-tuning, called layer swapping, the two latent spaces remain aligned. Therefore, a latent sampled in the FFHQ latent space corresponds to a similar-looking image when decoded by the toonify generator. I don't have the code for generating Figure 17, but it more or less boils down to this idea:
```python
latent = generate_random_latent()    # sample a single latent code
ffhq_image = ffhq_generator(latent)  # decode it with the original FFHQ generator
toon_image = toon_generator(latent)  # decode the same latent with the toonified generator
```
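Here is a slightly fuller sketch, assuming both checkpoints are in the rosinality stylegan2-pytorch format; the file names, the `g_ema` key, and the truncation settings below are placeholders, not the exact setup used for Figure 17:

```python
import torch
from model import Generator  # rosinality stylegan2-pytorch Generator

device = "cuda"

# Two generators with identical architectures; only the weights differ.
ffhq_generator = Generator(1024, 512, 8).to(device).eval()
toon_generator = Generator(1024, 512, 8).to(device).eval()
ffhq_generator.load_state_dict(torch.load("ffhq.pt")["g_ema"], strict=False)
toon_generator.load_state_dict(torch.load("toonify.pt")["g_ema"], strict=False)

with torch.no_grad():
    latent = torch.randn(1, 512, device=device)        # one random z
    ffhq_mean_w = ffhq_generator.mean_latent(4096)      # truncation anchors
    toon_mean_w = toon_generator.mean_latent(4096)

    # Same latent, two generators: aligned latent spaces give matching pairs.
    ffhq_image, _ = ffhq_generator([latent], truncation=0.7, truncation_latent=ffhq_mean_w)
    toon_image, _ = toon_generator([latent], truncation=0.7, truncation_latent=toon_mean_w)
```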
Hope this helps
Okay, thank you! I didn't realize that the fine-tuning had happened via "layer swapping". Does this mean that, in general, one shouldn't expect encoder bootstrapping to work unless the other generator has been fine-tuned via layer swapping?
Good question. The encoder bootstrapping should work best when the two domains are similar, which is the case when you do layer swapping. In general, the technique will only work if the two domains are aligned. This could also be achieved with other forms of fine-tuning, for example StyleGAN-NADA (although I haven't tried this myself). The bootstrapping wouldn't work on something like faces to animals, because a latent in one generator doesn't correspond to a similar image in the other generator.
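To make the bootstrapping idea more concrete, here is a rough, simplified sketch of the flow: invert the real face once with the FFHQ encoder, then use that latent (instead of the toon generator's average latent/image) to initialize the iterative refinement with the toonify encoder and generator. The function names are hypothetical placeholders, and the real code structures the steps somewhat differently:

```python
import torch

def toonify_with_bootstrapping(x, ffhq_encoder, toon_encoder,
                               ffhq_generator, toon_generator, n_iters=5):
    """Hypothetical sketch of encoder bootstrapping across two aligned generators."""
    # Step 1 (bootstrap): invert the real face once with the FFHQ encoder.
    w = ffhq_encoder(x)                                   # predicted latent for x
    y_hat, _ = ffhq_generator([w], input_is_latent=True)  # inversion in the FFHQ domain

    # Step 2: iterative refinement in the toonify domain.
    # Because the latent spaces are aligned, w is a sensible starting point.
    for _ in range(n_iters):
        delta = toon_encoder(torch.cat([x, y_hat], dim=1))  # residual update from (input, current output)
        w = w + delta
        y_hat, _ = toon_generator([w], input_is_latent=True)
    return y_hat, w
```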
Would you be able to provide a few additional details about how the toonify StyleGAN2 generator was fine-tuned to get the output in Figure 17? For example, did the latent spaces of both generators seem "aligned" for all fine-tuning checkpoints you checked? Would you also be able to share the code used for creating Figure 17? Thanks!