rosasalberto / StyleGAN2-TensorFlow-2.x

Unofficial implementation of StyleGAN2 using TensorFlow 2.x.

Converting Image to Latent Z Space #3

Open bitanath opened 4 years ago

bitanath commented 4 years ago

Hi @rosasalberto !

Thanks for this awesome library. I was looking through the implementation and, a little like this stylegan encoder, was experimenting with building my own encoder to embed images into latent space. While the original code does this on W latents, I was wondering if it might be even easier (given the more disentangled Z space) to convert to Z latents instead.

Any hints or experiments along these lines?

rosasalberto commented 4 years ago

Hi @bitanath !

Actually, the latent space W is the more disentangled one. Personally, I think there are 2 options here:

  1. Embed the images to W1 (the first layer of the latent space W) and then broadcast the result to the other 17 layers (see the sketch below).
  2. Embed directly to W18 (all 18 layers of the latent space W)
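
For option 1, the broadcast step is just a tile across the layer axis. A minimal sketch, assuming a 1024x1024 model with 18 style layers of width 512 (`w1` stands in for an encoder's prediction):

```python
import tensorflow as tf

# Option 1 sketch: predict a single 512-d w vector (W1), then broadcast
# it across all 18 style layers before feeding the synthesis network.
w1 = tf.random.normal([1, 512])                   # stand-in for an encoder's W1 output
w18 = tf.tile(w1[:, tf.newaxis, :], [1, 18, 1])   # shape (1, 18, 512)
# w18 can now replace mapping(z) as input to the synthesis network.
```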
bitanath commented 4 years ago

Thanks a lot for the suggestion @rosasalberto ,

I'm in the process of retraining a custom VGG19 Keras model to predict the latent vector W1 on top of your code, and will submit a PR if that succeeds.

Current results are bad: an MAE of 0.8 and an MSE of 0.99, which is poor considering the W1 vector appears to be normally distributed with sigma ~ 1.0.
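
For reference, the setup is roughly this (a simplified sketch, not the exact model; the head layers and the (image, W1) training pairs are the parts I'm still tuning):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG19

# Sketch of the VGG19 regressor: image in, 512-d W1 vector out.
# Training pairs come from sampling z, mapping to w, and synthesizing images.
base = VGG19(include_top=False, pooling='avg', input_shape=(224, 224, 3))
x = layers.Dense(1024, activation='relu')(base.output)
w1_pred = layers.Dense(512)(x)            # regress the 512-d W1 vector
regressor = Model(base.input, w1_pred)
regressor.compile(optimizer='adam', loss='mae', metrics=['mse'])
# regressor.fit(images, w1_targets, epochs=...)
```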

Thanks again for the awesome implementation! And please do let me know if you have any hints or suggestions :)

rosasalberto commented 4 years ago

Thanks @bitanath !

If you want to build an initializer for W1, I recommend 2 options:

  1. Inverse mapping from X to W1
  2. Use an encoder-decoder architecture (see the sketch below; you can also do W18 here)
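
For option 2, a minimal sketch of what I mean, with the frozen synthesis network acting as the decoder (the conv stack and the `synthesize` callable are placeholders):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Option 2 sketch: an encoder that maps images to W18; the frozen
# StyleGAN2 synthesis network acts as the decoder during training.
def build_encoder(res=256, n_layers=18, w_dim=512):
    inp = layers.Input((res, res, 3))
    x = inp
    for filters in [64, 128, 256, 512]:   # simple conv downsampling stack
        x = layers.Conv2D(filters, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.GlobalAveragePooling2D()(x)
    w = layers.Dense(n_layers * w_dim)(x)
    w = layers.Reshape((n_layers, w_dim))(w)
    return Model(inp, w)

# Training idea: minimize reconstruction(synthesize(encoder(img)), img),
# updating only the encoder's weights.
```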
bitanath commented 4 years ago

Thanks again. I'll try and figure out your solutions!

bitanath commented 4 years ago

Hey @rosasalberto, I am reopening this since I finally figured it out! (Whew)

  1. I used the default Keras VGG16 model with the last couple of layers removed to generate a vector representing the style of the image.
  2. I then used the FFHQ dataset image aligner from https://github.com/Puzer/stylegan-encoder to produce an aligned version of the input image (this is very important and improves the output by an order of magnitude).
  3. I then used the mapping network to go from W1 to a random image and took that as the starting point, with an optimizer, a decayed learning rate, and mean plus log-cosh losses to measure the differences in style (this is very similar to the Puzer implementation, except that I couldn't follow most of the code there, so I reimplemented it in a much simpler and cleaner way).
  4. After a lot of tweaking I finally figured out how to update gradients on the W1 space using GradientTape (man, the documentation really sucks!) and just ran an Adam optimizer, woot. A stripped-down sketch follows this list.
  5. I then ran a loop for a bunch of steps until I found the closest latent and stored it. I also used imageio to generate a GIF of the process.
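
The core of it looks roughly like this (a simplified sketch, not the PR code; `synthesize`, `w_init`, and `target_image` are placeholders for the repo's synthesis network, the broadcast W1 starting point, and the aligned target image):

```python
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# Step 1: VGG16 with the classification head removed, as a style/feature extractor.
vgg = VGG16(include_top=False, input_shape=(224, 224, 3))
feature_net = Model(vgg.input, vgg.get_layer('block3_conv3').output)

def features(img):                           # img in [0, 1], shape (1, H, W, 3)
    img = tf.image.resize(img, (224, 224)) * 255.0
    return feature_net(preprocess_input(img))

target_feats = features(target_image)        # aligned target image
w = tf.Variable(w_init)                      # (1, 18, 512), broadcast from W1

# Steps 4-5: optimize w directly with GradientTape, Adam, and a decayed LR.
lr = tf.keras.optimizers.schedules.ExponentialDecay(0.02, decay_steps=100, decay_rate=0.9)
opt = tf.keras.optimizers.Adam(learning_rate=lr)

for step in range(500):
    with tf.GradientTape() as tape:
        generated = synthesize(w)            # (1, H, W, 3) in [0, 1]
        style_loss = tf.reduce_mean(tf.keras.losses.log_cosh(target_feats, features(generated)))
        pixel_loss = tf.reduce_mean(tf.abs(target_image - generated))
        loss = style_loss + pixel_loss
    grads = tape.gradient(loss, [w])
    opt.apply_gradients(zip(grads, [w]))
```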

LMK if this sounds interesting and I will submit a PR for the code. I would also appreciate a code review, since I am not technically a coder in my day job :). I ran the optimization on my local system and found the images converge in about 500 epochs.

Thanks again for the awesome library, it is really simple and clean.

rosasalberto commented 4 years ago

Hi @bitanath, thanks! It sounds like you have been working hard on this initializer. Open the PR and I will take a look at the code before merging :)

bitanath commented 4 years ago

Yeah, I've submitted the PR. The picture quality isn't all there yet, so I have been experimenting with a few more settings. Do let me know what you think @rosasalberto

oss-roettger commented 4 years ago

Hi Alberto and @bitanath,

I have also developed an encoder for Alberto's outstanding & concise StyleGAN2 implementation: https://github.com/oss-roettger/HR_Encoder

Like @bitanath, I experimented with the VGG model first to preserve high-level image features during encoding, with little success. So I chose a completely new approach: W18 optimization, intermediate layers of the StyleGAN2 discriminator model itself(!), and w-regularization. The results look natural and sharp, and the algorithm is short and clear! I'm keen to compare it with @bitanath's approach.

Kind regards from Munich,
Hans
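
The core idea, in a rough sketch (`synthesize`, `disc_features`, and `w_avg` are placeholders for the synthesis network, a list of intermediate discriminator activations, and the average latent; this is not the HR_Encoder code itself):

```python
import tensorflow as tf

# Perceptual loss built from the StyleGAN2 discriminator's own intermediate
# activations, plus a regularizer keeping w(18) close to the mean latent.
def encoder_loss(w, target_image, lam=1e-3):
    generated = synthesize(w)                       # (1, H, W, 3)
    feats_t = disc_features(target_image)           # list of intermediate activations
    feats_g = disc_features(generated)
    perceptual = tf.add_n([tf.reduce_mean(tf.abs(a - b))
                           for a, b in zip(feats_t, feats_g)])
    w_reg = tf.reduce_mean(tf.square(w - w_avg))    # w-regularization
    return perceptual + lam * w_reg
```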