lucidrains / stylegan2-pytorch

Simplest working implementation of StyleGAN2, a state-of-the-art generative adversarial network, in PyTorch. Enabling everyone to experience disentanglement.
https://thispersondoesnotexist.com
MIT License

Generating faces #33

Open oliverguhr opened 4 years ago

oliverguhr commented 4 years ago

Hello, I tried to train a model on 70k images of the FFHQ thumbnail dataset. The model should generate 128x128 images of faces, but unfortunately the results are not very convincing. I left all the parameters at their default values and trained for over 500,000 iterations. After 530,000 iterations I stopped the training because the results started to degrade and the discriminator loss was at or close to 0.

Here are the results

What would be the best way to improve the results?

- Train on high-resolution images
- Use different training parameters
- Use more images

lucidrains commented 4 years ago

Hi Oliver,

Thanks for trying this replication. I made a small change to the data augmentation that may give you better results. https://github.com/lucidrains/stylegan2-pytorch/commit/f00ba1f50988ca1f7af3075223b0f404218d480a You should try it at a larger resolution if you can. The amount of data should be sufficient.

oliverguhr commented 4 years ago

Thank you very much! I downloaded the full-resolution images and started the training with your updates. I'll post the results as soon as they are ready (+33h).

oliverguhr commented 4 years ago

The results are looking a bit strange. Here is the model output after 195 epochs: on the left, the model trained on version 0.4.15 with the small images; on the right, the model trained on version 0.4.16 with the full-resolution images.

iter-195

Also, the model output didn't change much from epoch 50 to 195.

lucidrains commented 4 years ago

@oliverguhr Indeed, I tried doing a couple runs on my own dataset (which took an entire day) and confirmed that it stopped learning - perhaps the relaxed data augmentation made it too easy for the discriminator. I brought back the random aspect ratios for now, but a bit closer to a ratio of 1, so it should be less distorted than before. I am doing another training run at the moment to verify that learning is restored. Sorry about that, and I will continue to look into this issue to see how I can fully remove the random aspect ratios.
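
For anyone following along, the kind of augmentation being discussed can be pictured as a plain torchvision transform. This is only a sketch of the idea (random crops whose aspect ratio stays close to 1, so faces are barely distorted), not the repository's actual augmentation code:

```python
from torchvision import transforms

image_size = 128  # target resolution used in this thread

# Illustrative augmentation only: random crops rescaled back to the target size,
# with the aspect ratio constrained near 1 to limit distortion, plus a flip.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomResizedCrop(image_size, scale=(0.9, 1.0), ratio=(0.95, 1.05)),
    transforms.ToTensor(),
])
```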

oliverguhr commented 4 years ago

First of all, thank you for all the work you put into this! I played a bit with the parameters and started a new run with a batch size of 7 and a network capacity of 24. This is the maximum that fits in my 11 GB of VRAM. It's way slower and still running, but the losses are much more stable and the result is looking better. Here is the result after 123k iterations. 123-ema

I trained the previous model with the default batch size of 3. I think that these small batches could be the reason why the model had problems. I have had problems in the past where small batches led to unstable loss gradients. But since I also changed the network capacity, I am not sure which parameter led to the improvement.
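
If VRAM is the limiting factor, gradient accumulation is the usual way to emulate a larger effective batch; I believe the repo exposes a flag for this, but in plain PyTorch the generic pattern looks roughly like this (names here are illustrative, not the repo's internals):

```python
# Generic gradient-accumulation sketch: take several small backward passes
# before each optimizer step, so a small card can mimic an effective batch
# size of batch_size * accumulate_every.
def train_with_accumulation(model, loss_fn, optimizer, loader, accumulate_every=4):
    optimizer.zero_grad()
    for step, (images, targets) in enumerate(loader):
        loss = loss_fn(model(images), targets) / accumulate_every  # scale so gradients average out
        loss.backward()
        if (step + 1) % accumulate_every == 0:
            optimizer.step()
            optimizer.zero_grad()
```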

lucidrains commented 4 years ago

@oliverguhr oh my, that looks great! I have made some further changes, and fully removed the ratio data augmentation on the newest version. Yes, the network capacity linearly corresponds with the number of parameters, and as you know, with deep learning, the bigger the better lol

I will need to look into setting some new defaults for batch size. I agree they are probably not as big as they should be.

oliverguhr commented 4 years ago

I trained the model a while longer (200k iterations), with the best results at about 160k iterations.

161-ema

However, after that it only got worse, and there are still some artefacts in the images. Since I want to know which parameter leads to the improvement, I am currently running a second try with the default batch size of 3 and a network capacity of 32. I will post an update on that tomorrow, and then retrain with your latest patches.

oliverguhr commented 4 years ago

For me, there was no noticeable difference in the results between batch size 3 with network capacity 32 and batch size 7 with network capacity 24. But there is a difference with the newest 0.4.23 version. The model picks up the structure of the faces much quicker and produces better results with fewer iterations. Here is a preview of my current training results from 0 to 160,000 iterations in 10,000-iteration steps.

mr-0-full

lucidrains commented 4 years ago

@oliverguhr It is because, in the latest version, I introduced a hack that is used in BigGAN and StyleGAN, called truncation. What it does is bring the intermediate style vector closer to its average, cutting out the outlier distributions. This results in generally better image quality.
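
In pseudocode, the truncation trick is just a linear interpolation between each style vector and the running average style. A minimal sketch (variable names are mine, not the repo's; the inputs are torch tensors):

```python
def truncate_styles(w, w_avg, psi=0.7):
    """Truncation trick sketch: pull each style vector toward the average style.

    w     -- intermediate style vectors from the mapping network, shape (batch, dim)
    w_avg -- running average of style vectors seen during training, shape (dim,)
    psi   -- truncation strength: 1.0 means no truncation, 0.0 always uses the average
    """
    return w_avg + psi * (w - w_avg)
```

A smaller psi trades sample diversity for fidelity, which is why truncated samples look cleaner but more "average".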

yuanlunxi commented 4 years ago

Hi, what does the argument 'fp16' mean, and how do I use it?

yuanlunxi commented 4 years ago

Could you share a pretrained model for faces?

oliverguhr commented 4 years ago

Hi @yuanlunxi, here you can read more about FP16. I did not share my model because the results are not perfect yet. I don't know what to expect, but the results did not look as good as what Nvidia published.
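
Since the question came up: 'fp16' enables mixed-precision training, which roughly halves memory use and can speed things up on recent GPUs. The repo's fp16 path went through Apex at the time (as mentioned further down in this thread); purely as an illustration of the concept, here is what the equivalent looks like with PyTorch's built-in torch.cuda.amp:

```python
import torch

# Minimal mixed-precision sketch: run the forward pass in half precision where
# safe, and scale the loss so that small gradients don't underflow in FP16.
scaler = torch.cuda.amp.GradScaler()

def fp16_step(model, loss_fn, optimizer, images, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # ops run in FP16 where it is safe
        loss = loss_fn(model(images), targets)
    scaler.scale(loss).backward()             # backprop the scaled loss
    scaler.step(optimizer)                    # unscales gradients, then steps
    scaler.update()                           # adjust the loss scale for the next step
```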

crrrr30 commented 4 years ago

In the original implementation (as in https://github.com/nvlabs/stylegan2), the default is to train for 25,000 kimg, i.e. 25,000,000 real images shown to the discriminator. I believe the problem here is simply insufficient training. After all, the paper claims to have trained on 8 V100s for as long as a week to yield superior results.
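
For concreteness, kimg counts thousands of real images shown during training, so converting between iterations and kimg depends on the batch size. A quick sanity check using the default batch size of 3 mentioned earlier in this thread:

```python
# kimg = thousands of (real) images shown to the discriminator during training.
batch_size = 3                              # default batch size used in the first run above
iterations = 500_000
images_shown = iterations * batch_size      # 1,500,000 images
kimg = images_shown / 1000                  # 1,500 kimg

print(f"{kimg:.0f} kimg of the 25,000 kimg full schedule")  # about 6% of a full run
```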

Johnson-yue commented 4 years ago

@oliverguhr Hi, if you have some better results, please share them here. Which version of the code did you try? Only 0.4.23?

Did you try the newest version, like 0.14.1?

oliverguhr commented 4 years ago

@Johnson-yue I started a new training run with the latest version of the code and it looks promising. I am using two attention layers and a resolution of 128x128.

This is a sample after 472,000 iterations. Still a long way to go to 25 million images.

472-ema

Unfortunately, I was not able to start the training using FP16. Apex is running, but at some point, the script fails with a null exception.

Johnson-yue commented 4 years ago

@oliverguhr good result!!

oliverguhr commented 4 years ago

I don't know what happened, but by iteration 682k the results had gotten worse: 682-ema

one(!) iteration later the image looked like this:

683-ema

And after some more iterations, the images went completely dark.

@lucidrains Do you have any idea what happened here? I can provide the models and results if this helps.

gordicaleksa commented 4 years ago

Could anyone provide us with a pre-trained PyTorch model? I assume most people won't bother training their own models and you'd also help save this planet by not allowing everybody to train a model for a week on 1313432 V100 GPUs.

oliverguhr commented 4 years ago

Sorry for the late response. Here is a list of trained models (and some sample results) that you can download:

.config.json

- model_203.pt / model_203.jpg
- model_300.pt / model_300.jpg
- model_400.pt / model_400.jpg
- model_500.pt / model_500.jpg
- model_550.pt / model_550.jpg
- model_600.pt / model_600.jpg
- model_650.pt / model_650.jpg
- model_700.pt / model_700.jpg
- model_757.pt / model_757.jpg

jomach commented 3 years ago

@oliverguhr which commit were you using to train? I'm trying to load the model you provided, but I'm not able to load it into the GAN. Some keys are missing when loading the module: "..._blocks.1.1.fn.fn.2.weight", "D_aug.D.attn_blocks.1.1.fn.fn.2.bias", "D_aug.D.final_conv.weight", "D_aug.D.final_conv.bias", "D_aug.D.to_logit.weight", "D_aug.D.to_logit.bias" ..."

oliverguhr commented 3 years ago

@jomach Version 1.2.3. I wonder if this should be part of the config.json.
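
When a checkpoint refuses to load with missing or unexpected keys, it usually means the installed package version builds a slightly different architecture than the one the checkpoint was saved from, so matching the package version (1.2.3 here) is the reliable fix. As a generic way to see what a downloaded checkpoint actually contains (the key layout varies between versions, so treat this as a sketch):

```python
import torch

# Peek inside a downloaded checkpoint without building the GAN first.
# "model_757.pt" is one of the files shared above; the top-level layout
# differs between stylegan2_pytorch versions, so print the keys rather
# than assuming them.
ckpt = torch.load("model_757.pt", map_location="cpu")

if isinstance(ckpt, dict):
    for key, value in ckpt.items():
        size = len(value) if isinstance(value, dict) else type(value).__name__
        print(key, "->", size)
```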

jomach commented 3 years ago

I think this comes from saving only the state dictionary instead of the full model...

@jomach Version 1.2.3. I wonder if this should be part of the config.json.

My bad. Never mind.

WoshiBoluo commented 3 years ago

@jomach Version 1.2.3. I wonder if this should be part of the config.json.

Excuse me, what dataset are you using for training? I am using FFHQ, but the training is really slow...

woctezuma commented 3 years ago

Excuse me, what dataset are you using for training? I am using FFHQ, but the training is really slow...

It is in the first post.

Hello, I tried to train a model on 70k images of the FFHQ thumbnail dataset. The model should generate 128x128 images of faces,

WoshiBoluo commented 3 years ago

Excuse me, what dataset are you using for training? I am using FFHQ, but the training is really slow...

It is in the first post.

Hello, I tried to train a model on 70k images of the FFHQ thumbnail dataset. The model should generate 128x128 images of faces,

Are you using multiple GPUs? How long do I have to run it to achieve a good result? I trained on 1,000 pictures from FFHQ on Colab and it took 120 hours for 150,000 iterations. Is this normal?

woctezuma commented 3 years ago

You can find expected training times for StyleGAN2 here: https://github.com/NVlabs/stylegan2-ada-pytorch#expected-training-time

For 128x128 resolution, with only 1 GPU, you should expect 13 seconds per kimg of training. For full training with the recommended 25000 kimg, that is about 4 days of training (with 24h/day, which you cannot have on Colab).
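
The arithmetic behind that estimate, for anyone who wants to plug in their own numbers:

```python
# NVIDIA's table lists ~13 s/kimg at 128x128 on a single GPU,
# and a full run is 25,000 kimg.
seconds_per_kimg = 13
total_kimg = 25_000

total_seconds = seconds_per_kimg * total_kimg   # 325,000 s
total_hours = total_seconds / 3600              # ~90 h
total_days = total_hours / 24                   # ~3.8 days of uninterrupted training

print(f"{total_hours:.0f} h ≈ {total_days:.1f} days")
```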

Moreover, you won't have the same GPU every time on Colab. So if you end up with a bad one, that is more training time.

Finally, it is hard to judge your 150,000 iterations, because you don't mention the batch size, or the kimg/iteration. If you have parameters similar to the ones mentioned in this post, I guess you should have similar results: https://github.com/lucidrains/stylegan2-pytorch/issues/33#issuecomment-604885302

MationPlays commented 2 years ago

Excuse me, what dataset are you using for training? I am using FFHQ, but the training is really slow...

It is in the first post.

Hello, I tried to train a model on 70k images of the FFHQ thumbnail dataset. The model should generate 128x128 images of faces,

Are you using multiple GPUs? How long do I have to run it to achieve a good result? I trained on 1,000 pictures from FFHQ on Colab and it took 120 hours for 150,000 iterations. Is this normal?

Do you mean 1 kimg for the 1,000 pictures? As I understood it, 1 kimg would be 1,000 fake-image iterations. Is this true?

Nils306 commented 1 month ago

Sorry for the late response. Here is a list of trained models (and some sample results) that you can download:

.config.json

- model_203.pt / model_203.jpg
- model_300.pt / model_300.jpg
- model_400.pt / model_400.jpg
- model_500.pt / model_500.jpg
- model_550.pt / model_550.jpg
- model_600.pt / model_600.jpg
- model_650.pt / model_650.jpg
- model_700.pt / model_700.jpg
- model_757.pt / model_757.jpg

Hello, would it still be possible to get your models?

oliverguhr commented 1 month ago

Nope - I deleted these checkpoints.