rinongal / StyleGAN-nada

http://stylegan-nada.github.io/
MIT License

StyleGAN-nada in Kaggle (2x-3x more speed up from free Colab) #37

Closed ratthachat closed 2 years ago

ratthachat commented 2 years ago

Thanks for the great work!

For anyone interested, StyleGAN-nada can now be played on Kaggle notebook with P100 GPU, around 2x-3x speed-up compared to free Colab notebook.

Visit the annotated notebook here: https://www.kaggle.com/ratthachat/stylegan-nada-playground

@rinongal BTW, we also have a Kaggle badge :)

rinongal commented 2 years ago

Hey!

Thanks a lot for setting this up! I'll add the badge to the readme.

I had a quick look and noticed two things that might be worth updating:

1) I see you've got the option to authenticate with PyDrive disabled as the default, and it's not clear from the notebook that this can be changed. I'm not sure if this login works from Kaggle notebooks, but there's a very high chance of running into Google Drive rate limits when downloading ReStyle models if you're not logged in. Might be worth at least mentioning the option.

2) ReStyle e4e tends to perform better than ReStyle pSp for face editing. I'd suggest swapping that to be the default.

ratthachat commented 2 years ago

Hi @rinongal !

About 1., I could not make self.authenticate() work in Kaggle, since it requires colab.auth(), which is not available on Kaggle. If anybody knows how to make it work, please let me know.

    from google.colab import auth
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive
    from oauth2client.client import GoogleCredentials

    def authenticate(self):
        auth.authenticate_user()
        gauth = GoogleAuth()
        gauth.credentials = GoogleCredentials.get_application_default()
        self.drive = GoogleDrive(gauth)

BTW, I've never hit the GDrive rate-limit problem myself, so I wasn't aware of it. Will other people be fine if it works for me?

EDIT: I've thought about it. Perhaps I will try to port all of the required model weights to a Kaggle public dataset so that we can eliminate this problem once and for all; hopefully, we also no longer need TF 1.x for weight conversion.

EDIT2: Success! The current Kaggle notebook no longer needs to download weights for either StyleGAN or ReStyle (except for CLIP, which still needs to be downloaded). The weights can be accessed from the Kaggle server directly: https://www.kaggle.com/ratthachat/stylegan-nada-restyle-weights

About 2., do I understand correctly that e4e is better for out-of-domain editing while pSp may be better for in-domain editing? If so, is there an intuitive explanation? (So that I can add it to the notebook.)
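For anyone adapting the notebook, loading weights from the attached Kaggle dataset instead of downloading from Google Drive looks roughly like this. The dataset slug comes from the URL above; the `*.pt` checkpoint extension and the file layout are assumptions, not verified against the actual dataset:

```python
from pathlib import Path

# In a Kaggle notebook, attached datasets are mounted read-only under
# /kaggle/input/<dataset-slug>. The slug below matches the dataset URL above;
# the *.pt glob is an assumption about the checkpoint format.
weights_dir = Path("/kaggle/input/stylegan-nada-restyle-weights")

if weights_dir.exists():
    checkpoints = sorted(weights_dir.glob("*.pt"))
    msg = f"Found {len(checkpoints)} checkpoint file(s) in {weights_dir}"
else:
    checkpoints = []
    msg = "Dataset not mounted; on Kaggle, attach it via the 'Add data' panel."

print(msg)
```

Reading from `/kaggle/input` avoids the anonymous-download rate limits entirely, since the files are served from Kaggle's own storage.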

rinongal commented 2 years ago

Hi,

About 1: Sounds great! That's a good way to solve it :) Rate limits usually happen towards the end of the day, when enough people access the drives from anonymous accounts (colabs etc.).

About 2: pSp is better for reconstruction quality (i.e. the inversion looks more like the original image) but performs worse under most forms of editing, including our out-of-domain editing.

The explanation for why this happens is a bit long. The TL;DR is that pSp inverts into more expressive regions of the latent space, but these regions don't behave well for editing.

The more in-depth explanation has to do with StyleGAN's multiple latent spaces. Basically, StyleGAN generation starts with a latent code z drawn from a normal distribution. This code is converted (using an MLP called a mapping network) into a new code w in some intermediate space W. This w code enters the network across all of its convolutional layers (18 layers in total for an FFHQ 1024x1024 model).

The codes in W are the easiest to edit, since W is more disentangled. Most inversion works, however, use an extended latent space called W+, where a different code from W is used as the input for every layer of the network. In this extended space, it's much easier to find a representation for a specific image, so reconstruction accuracy is better. On the other hand, codes in W+ weren't really observed by the generator during training, and the latent space around these regions doesn't behave as well as it does in 'denser' areas like W, which means editing isn't as good.

What e4e does is try to find codes in W+ that are close to real codes in W. This reduces expressivity, so reconstruction suffers, but you get codes in much more editable regions of the latent space.
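The shape difference between the two spaces can be sketched numerically. This is a toy stand-in, not StyleGAN itself: the mapping network here is a single random tanh layer (the real one is an 8-layer MLP), and only the tensor shapes and the W-vs-W+ distinction are meant to match the description above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for StyleGAN's mapping network (real one is an 8-layer MLP).
W_map = rng.standard_normal((512, 512)) / np.sqrt(512)

def mapping_network(z):
    return np.tanh(z @ W_map)

num_layers = 18  # an FFHQ 1024x1024 generator has 18 style inputs

z = rng.standard_normal(512)  # z drawn from a normal distribution
w = mapping_network(z)        # a single code in the intermediate space W

# W space: the SAME w code is fed to every generator layer.
w_space_styles = np.tile(w, (num_layers, 1))            # shape (18, 512)

# W+ space: a DIFFERENT code per layer, as used by pSp/e4e inversion.
w_plus_styles = np.stack([mapping_network(rng.standard_normal(512))
                          for _ in range(num_layers)])  # shape (18, 512)

print(w_space_styles.shape, w_plus_styles.shape)
```

Both tensors have the same shape, but W constrains all 18 rows to be identical, while W+ lets each row vary independently; that extra freedom is what buys pSp its reconstruction accuracy and costs it editability.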

Hope this is clear enough, but if you have more questions, feel free to ask!

ratthachat commented 2 years ago

Thanks @rinongal ! I have added the explanation in the notebook comment section. Also glad that you love the Kaggle dataset. I think we can close this issue :)

Talk to you later!