Open oliverguhr opened 4 years ago
Hi Oliver,
Thanks for trying this replication. I made a small change to the data augmentation that may give you better results. https://github.com/lucidrains/stylegan2-pytorch/commit/f00ba1f50988ca1f7af3075223b0f404218d480a You should try it at a larger resolution if you can. The amount of data should be sufficient.
Thank you very much! I downloaded the full-resolution images and started the training with your updates. I'll post the results as soon as they're ready (+33h).
The results look a bit strange. Here is the model output after 195 epochs: on the left, the model trained on version 0.4.15 with the small images; on the right, the model trained on version 0.4.16 with the full-resolution images.
Also, the model output didn't change much from epoch 50 to 195.
@oliverguhr Indeed, I tried doing a couple runs on my own dataset (which took an entire day) and confirmed that it stopped learning - perhaps the relaxed data augmentation made it too easy for the discriminator. I brought back the random aspect ratios for now, but a bit closer to a ratio of 1, so it should be less distorted than before. I am doing another training run at the moment to verify that learning is restored. Sorry about that, and I will continue to look into this issue to see how I can fully remove the random aspect ratios.
First of all, thank you for all the work you put into this! I played a bit with the parameters and started a new run with a batch size of 7 and a network capacity of 24. This is the maximum that fits in my 11 GB of VRAM. It's way slower and still running, but the losses are much more stable and the result looks better. Here is the result after 123k iterations.
I trained the previous model with the default batch size of 3. I think these small batches could be the reason why the model had problems. I've had problems in the past where small batches led to unstable loss gradients. But since I also changed the network capacity, I'm not sure which parameter led to the improvement.
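Small batches do make loss gradients noisy. One common workaround when VRAM is the limit is gradient accumulation: sum the gradients of several small batches and apply a single optimizer step, emulating a larger effective batch. Below is a minimal numpy sketch of the idea on a toy linear model; it is illustrative only and not the repo's actual training loop (if I recall correctly, the repo exposes something similar through a gradient-accumulation option).

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(8)                 # toy linear model weights (illustrative only)
accumulate_every = 4            # effective batch = 3 * 4 = 12
grad = np.zeros_like(w)

for _ in range(accumulate_every):
    x = rng.normal(size=(3, 8))     # a small batch of 3, as in the thread
    y = rng.normal(size=3)
    err = x @ w - y                 # mean-squared-error residual
    # scale each contribution so the sum averages over the effective batch
    grad += (x.T @ err) / len(err) / accumulate_every

w -= 0.01 * grad                    # one update from an effective batch of 12
```

The update direction is then averaged over 12 samples instead of 3, which smooths the gradient without needing more memory per step.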
@oliverguhr oh my, that looks great! I have made some further changes, and fully removed the ratio data augmentation on the newest version. Yes, the network capacity linearly corresponds with the number of parameters, and as you know, with deep learning, the bigger the better lol
I will need to look into setting some new defaults for batch size. I agree they are probably not as big as they should be.
I trained the model a while longer (200k iterations), with the best results at about 160k iterations.
However, after that, it only got worse, and there are still some artefacts. Since I want to know which parameter leads to the improvement, I am currently running a second try with the default batch size of 3 and a network capacity of 32. Will post an update on that tomorrow, and then retrain with your latest patches.
For me, there was no noticeable difference in the results between batch size 3 with network capacity 32 and batch size 7 with network capacity 24. But there is a difference with the newest 0.4.23 version: the model picks up the structure of the faces much quicker and produces better results with fewer iterations. Here is a preview of my current training results from 0 to 160,000 iterations, in steps of 10,000 iterations.
@oliverguhr it is because, in the latest version, I introduced a trick used in BigGAN and StyleGAN called truncation. It brings the intermediate style vector closer to its average, cutting off the outliers of the distribution. This generally results in better image quality.
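For reference, the truncation trick amounts to a linear interpolation between a style vector and the running average style vector. This is a minimal numpy sketch, not the repo's actual code; the function name and `psi` default are my own:

```python
import numpy as np

def truncate(w, w_avg, psi=0.7):
    # psi = 1.0 leaves w unchanged; psi = 0.0 collapses w to the average.
    # Values in between trade sample diversity for image quality.
    return w_avg + psi * (w - w_avg)

w_avg = np.zeros(4)                     # average intermediate style vector
w = np.array([2.0, -2.0, 4.0, 0.0])     # an "outlier" style vector
print(truncate(w, w_avg, psi=0.5))      # [ 1. -1.  2.  0.]
```

With `psi=0.5`, every component is pulled halfway toward the average, so extreme (often low-quality) regions of the style space are never sampled.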
Hi, what does the argument 'fp16' mean, and how do I use it?
Could you share a pretrained model for faces?
Hi @yuanlunxi, here you can read more about FP16. I did not share my model because the results are not perfect yet. I don't know what I can expect, but the results did not look as good as what Nvidia published.
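For context: fp16 (half precision) stores each value in 2 bytes instead of the 4 bytes of the default float32, roughly halving memory use and speeding up training on GPUs with tensor cores, at the cost of precision. The repo used Apex for mixed-precision training at the time; the numpy snippet below only illustrates the trade-off itself:

```python
import numpy as np

# fp16 uses half the memory per value...
print(np.float32(0).itemsize, np.float16(0).itemsize)  # 4 2

# ...but keeps only ~3 decimal digits, so small differences vanish:
print(np.float16(1.0001))  # 1.0 -- the extra 0.0001 is rounded away
```

This loss of precision is why mixed-precision training keeps a float32 master copy of the weights and scales the loss, rather than running everything in fp16.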
In the original implementation (as in https://github.com/nvlabs/stylegan2), the default is to train for 25,000 kimg, i.e. 25,000,000 images shown to the discriminator. I believe this is due to the lack of training. After all, the paper claims to have trained on 8 V100s for as long as a week to yield superior results.
@oliverguhr Hi, if you have some better results, please share them here. Which version of the code did you try? Only 0.4.23?
Have you tried the newest version, like 0.14.1?
@Johnson-yue I started a new training run with the latest version of the code and it looks promising. I am using two attention layers and a resolution of 128x128.
This is a sample after 472,000 iterations. Still a long way to go until 25 million.
Unfortunately, I was not able to start the training using FP16. Apex is running, but at some point, the script fails with a null exception.
@oliverguhr good result!!
I don't know what happened, but up to iteration 682k the results only got worse:
one(!) iteration later the image looked like this:
And after some more iterations, the images went completely dark.
@lucidrains Do you have any idea what happened here? I can provide the models and results if this helps.
Could anyone provide us with a pre-trained PyTorch model? I assume most people won't bother training their own models and you'd also help save this planet by not allowing everybody to train a model for a week on 1313432 V100 GPUs.
Sorry for the late response. Here is a list of trained models (and some sample results) that you can download:
model_203.pt model_300.pt model_400.pt model_500.pt model_550.pt model_600.pt model_650.pt model_700.pt model_757.pt
@oliverguhr which commit were you using to train? I'm trying to load the model you provided but I'm not able to load it into the GAN. Missing some keys on loading the module... "..._blocks.1.1.fn.fn.2.weight", "D_aug.D.attn_blocks.1.1.fn.fn.2.bias", "D_aug.D.final_conv.weight", "D_aug.D.final_conv.bias", "D_aug.D.to_logit.weight", "D_aug.D.to_logit.bias" ..."
@jomach Version 1.2.3 I wonder if this should be part of the config.json.
I think this comes from saving only the dictionary instead of the full model...
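The "missing keys" error usually means the checkpoint was saved from a different code version, so some of the current model's layers have no stored weights. A minimal pure-Python sketch of diagnosing this by filtering a state dict to the shared keys (the key names below are illustrative, taken loosely from the error message above; in PyTorch you would typically pass `strict=False` to `load_state_dict` to achieve the same effect):

```python
# state dict from an older code version vs. the current model's parameters
checkpoint = {"G.conv.weight": 1.0, "D_aug.D.to_logit.weight": 2.0}
model_keys = {"G.conv.weight", "D_aug.D.final_conv.weight"}

filtered = {k: v for k, v in checkpoint.items() if k in model_keys}
missing = sorted(model_keys - filtered.keys())

print(filtered)  # {'G.conv.weight': 1.0}
print(missing)   # ['D_aug.D.final_conv.weight'] -- layers added after saving
```

Loading only the shared keys lets the rest initialize randomly, but the cleanest fix remains checking out the commit the checkpoint was trained with.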
My bad. Never mind.
excuse me, what dataset are you using for training, I use ffhq, but the training is really too slow...
It is in the first post.
Are you using multiple GPUs? How long do I need to train to achieve a good result? I ran the 1,000 pictures from FFHQ on Colab and it took 120 hours for 150,000 iterations. Is this normal?
You can find expected training times for StyleGAN2 here: https://github.com/NVlabs/stylegan2-ada-pytorch#expected-training-time
For 128x128 resolution, with only 1 GPU, you should expect 13 seconds per kimg of training. For full training with the recommended 25000 kimg, that is about 4 days of training (with 24h/day, which you cannot have on Colab).
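The arithmetic behind that estimate, using the numbers from the comment above:

```python
sec_per_kimg = 13            # 128x128 resolution, single GPU, per the NVIDIA table
total_kimg = 25_000          # recommended full training length
days = sec_per_kimg * total_kimg / 86_400   # 86,400 seconds per day
print(f"{days:.1f} days")    # 3.8 days of uninterrupted training
```

Since Colab sessions are limited to a few hours at a time, the wall-clock time stretches out far beyond those ~4 GPU-days.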
Moreover, you won't have the same GPU every time on Colab. So if you end up with a bad one, that is more training time.
Finally, it is hard to judge your 150,000 iterations, because you don't mention the batch size, or the kimg/iteration. If you have parameters similar to the ones mentioned in this post, I guess you should have similar results: https://github.com/lucidrains/stylegan2-pytorch/issues/33#issuecomment-604885302
Do you mean that your 1,000 pictures equal 1 kimg? As I understood it, 1 kimg means 1,000 images shown during training, is that right?
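To relate iterations to kimg: 1 kimg means 1,000 real images shown to the discriminator, so the conversion depends on the batch size. A short sketch (the helper name is mine, not the repo's):

```python
def iterations_to_kimg(iterations, batch_size):
    # 1 kimg = 1,000 real images shown to the discriminator
    return iterations * batch_size / 1000

# e.g. the 150,000 iterations mentioned above at the default batch size of 3:
print(iterations_to_kimg(150_000, 3))  # 450.0 kimg -- far short of 25,000 kimg
```

By this measure, 150k iterations at batch size 3 covers well under 2% of the recommended 25,000 kimg, which would explain the underwhelming samples.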
Hello, would it still be possible to get your models?
Nope - I deleted those checkpoints.
Hello, I tried to train a model on the 70k images of the FFHQ thumbnail dataset. The model should generate 128x128 images of faces; unfortunately, the results are not very convincing. I left all the parameters at their default values and trained for over 500,000 iterations. After 530,000 iterations I stopped the training because the results started to degrade and the discriminator loss was at or close to 0.
Here are the results
What would be the best way to improve the results?
- Train on high-resolution images
- Use different training parameters
- Use more images